The largest reported data leakage as of January 2024 was the Cam4 data breach in March 2020, which exposed more than 10 billion data records. The second-largest data breach in history so far, the Yahoo data breach, occurred in 2013. The company initially reported about one billion exposed data records, but after an investigation it updated the figure, revealing that three billion accounts were affected. The National Public Data breach was announced in August 2024. The incident became public when personally identifiable information of individuals became available for sale on the dark web. Overall, security professionals estimate that nearly three billion personal records were leaked. The next significant data leakage was the March 2018 security breach of India's national ID database, Aadhaar, with over 1.1 billion records exposed. This included biometric information such as identification numbers and fingerprint scans, which could be used to open bank accounts and receive financial aid, among other government services.
Cybercrime - the dark side of digitalization
As the world continues its journey into the digital age, corporations and governments across the globe have been increasing their reliance on technology to collect, analyze, and store personal data. This, in turn, has led to a rise in the number of cybercrimes, ranging from minor breaches to global-scale attacks impacting billions of users, as in the case of Yahoo. Within the U.S. alone, 1,802 cases of data compromise were reported in 2022, a marked increase from the 447 cases reported a decade prior.
The high price of data protection
As of 2022, the average cost of a single data breach across all industries worldwide stood at around 4.35 million U.S. dollars. Breaches were most costly in the healthcare sector, with each leak reported to have cost the affected party a hefty 10.1 million U.S. dollars. The financial sector followed closely behind: there, each breach resulted in a loss of approximately 6 million U.S. dollars, about 1.5 million more than the global average.
In 2023, the number of data compromises in the United States stood at 3,205 cases. Meanwhile, over 353 million individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.
Industries most vulnerable to data breaches
Some industry sectors usually see more significant cases of private data violations than others, depending on the type and volume of personal information that organizations in these sectors store. In 2022, healthcare, financial services, and manufacturing were the three industry sectors that recorded the most data breaches. The number of healthcare data breaches in the United States has gradually increased over the past few years. In the financial sector, data compromises nearly doubled between 2020 and 2022, while manufacturing saw data compromise incidents more than triple.
Largest data exposures worldwide
In 2020, the adult streaming website CAM4 experienced a leakage of nearly 11 billion records, by far the most extensive reported data leakage. This case, though, is unique in that cybersecurity researchers found the vulnerability before cybercriminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, revised the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India's national identification database, Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
With the surge in data collection and analytics, concerns are raised with regard to the privacy of the individuals represented by the data. In settings where the data is distributed over several data holders, federated learning offers an alternative that learns from the data without the need to centralize it in the first place. This is achieved by exchanging only model parameters learned locally at each data holder. This greatly limits the amount of data to be transferred, reduces the impact of data breaches, and helps to preserve the individual's privacy. Federated learning thus becomes a viable alternative in IoT and Edge Computing settings, especially if the data collected is sensitive. However, risks of data or information leaks still persist if information can be inferred from the exchanged models, for example through membership inference attacks. In this paper, we investigate how successful such attacks are in the setting of sequential federated learning. The cyclic nature of model learning and exchange might give attackers more information by letting them observe the dynamics of the learning process, and thus allow a more powerful attack.
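To make the threat concrete, here is a minimal, self-contained sketch of a confidence-based membership inference attack in Python. It is not the attack or the sequential federated learning setup studied in the paper; the synthetic data, the unpruned decision tree standing in for an exchanged model, and the 0.9 confidence threshold are all illustrative assumptions.

```python
# Minimal sketch of a confidence-based membership inference attack.
# Illustrative only; not the attack or federated setup from the paper above.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic "sensitive" records: a member (training) set held by one data
# holder and a non-member (held-out) set of comparable records.
def make_data(n=300, d=10):
    X = rng.normal(size=(n, d))
    y = (X[:, 0] + rng.normal(scale=1.0, size=n) > 0).astype(int)  # noisy labels
    return X, y

X_members, y_members = make_data()
X_nonmembers, y_nonmembers = make_data()

# Stand-in for a model whose parameters are exchanged during federated
# learning; an unpruned tree overfits, which is what the attacker exploits.
target_model = DecisionTreeClassifier().fit(X_members, y_members)

def infer_membership(model, X, y, threshold=0.9):
    """Guess 'member' when the model is highly confident in the true label."""
    conf = model.predict_proba(X)[np.arange(len(y)), y]
    return conf >= threshold

tpr = infer_membership(target_model, X_members, y_members).mean()
fpr = infer_membership(target_model, X_nonmembers, y_nonmembers).mean()
print(f"fraction of members flagged as members:     {tpr:.2f}")
print(f"fraction of non-members flagged as members: {fpr:.2f}")
```

The gap between the two printed rates is the membership signal; in a sequential federated setting, an attacker observing several rounds of exchanged models could potentially strengthen such a signal.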
Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine
Mark Schwabacher, NASA Ames Research Center
Robert Aguilar, Pratt & Whitney Rocketdyne
Fernando Figueroa, NASA Stennis Space Center
Abstract
The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by searching the space of possible decision trees for one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise and a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks at each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
Introduction
The J-2X rocket engine will be tested on Test Stand A-1 at NASA Stennis Space Center (SSC) in Mississippi. A team including people from SSC, NASA Ames Research Center (ARC), and Pratt & Whitney Rocketdyne (PWR) is developing a prototype end-to-end integrated systems health management (ISHM) system that will be used to monitor the test stand and the engine while the engine is on the test stand [1]. The prototype will use several different methods for detecting and diagnosing faults in the test stand and the engine, including rule-based, model-based, and data-driven approaches. SSC is currently using the G2 tool (http://www.gensym.com) to develop rule-based and model-based fault detection and diagnosis capabilities for the A-1 test stand. This paper describes preliminary results in applying the data-driven approach to detecting and diagnosing faults in the J-2X engine. The conventional approach to detecting and diagnosing faults in complex engineered systems such as rocket engines and test stands is to use large numbers of human experts. Test controllers watch the data in near-real time during each engine test. Engineers study the data after each test. These experts are aided by limit checks that signal when a particular variable goes outside of a predetermined range. The conventional approach is very labor intensive. Also, humans may not be able to recognize faults that involve the relationships among large numbers of variables.
Further, some potential faults could happen too quickly for humans to detect them and react before they become catastrophic. Automated fault detection and diagnosis is therefore needed. One approach to automation is to encode human knowledge into rules or models. Another approach is to use data-driven methods to automatically learn models from historical data or simulated data. Our prototype will combine the data-driven approach with the model-based and rule-based approaches.
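For readers who want to experiment with the data-driven step described above, the following is a minimal sketch of training a decision tree on labeled simulation-style data and classifying unseen samples by leak location. The original work used C4.5 on DRTM simulations; scikit-learn's CART-based DecisionTreeClassifier is used here only as a stand-in, and the synthetic sensors, leak signatures, and labels are made up for illustration.

```python
# Minimal sketch of the data-driven approach: train a decision tree on labeled
# data and classify unseen samples as nominal or as one of five leak locations.
# The synthetic data and CART tree below are stand-ins, not the DRTM data or C4.5.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(42)

# Fake "simulated sensor" snapshots: rows are time samples, columns are sensors.
n_samples, n_sensors = 600, 8
X_train = rng.normal(0.0, 1.0, size=(n_samples, n_sensors))
# Labels: 0 = nominal operation, 1..5 = leak at one of five leak locations.
y_train = rng.integers(0, 6, size=n_samples)
# Inject a crude signature so each leak class perturbs one sensor channel.
for leak_class in range(1, 6):
    X_train[y_train == leak_class, leak_class] += 3.0

tree = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)

# Classify "unseen" data, analogous to testing on the held-out simulations.
X_test = rng.normal(0.0, 1.0, size=(5, n_sensors))
X_test[0, 2] += 3.0  # this sample looks like a leak at location 2
print(tree.predict(X_test))
print(export_text(tree, max_depth=2))  # the learned tree is human-readable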
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
2,768 global export and import shipment records for helium leak detectors, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article aimed to summarize the information on "machine learning for water leak detection". To this end, a search was carried out in the Scopus, Web of Science, ScienceDirect, and Springer Link databases, and inclusion and exclusion criteria yielded 50 articles. The main findings were: which technologies are currently used for detecting leaks in drinking water pipes, addressed by 51% of the articles; which machine learning techniques are most effective for detecting leaks in drinking water distribution systems; which factors influence the accuracy of machine learning models for detecting leaks in water networks; which countries are currently leading research in the application of machine learning for leak detection in drinking water infrastructure; and, finally, which key terms and concepts usually appear in research on the use of machine learning in the management of leaks in drinking water systems. In conclusion, current ML technologies for detecting pipe leaks remain underdeveloped in many countries, and in some there is no such system at all; more research is needed in the countries where such leaks occur most frequently.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
264 global export and import shipment records for leak detectors under HSN code 3403, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
20 to 30% of drinking water produced is lost due to leaks in water distribution pipes. In times of water scarcity, losing so much treated water comes at a significant cost, both environmentally and economically. In this paper, we propose a hybrid leak localization approach combining model-based and data-driven modeling. Pressure heads of leak scenarios are simulated using a hydraulic model and then used to train a machine-learning-based leak localization model. A key element of our approach is that discrepancies between simulated and measured pressures are accounted for using a dynamically calculated bias correction based on historical pressure measurements. Data from in-field leak experiments in operational water distribution networks were produced to evaluate our approach on realistic test data. Two problematic settings for leak localization were examined. In the first setting, an uncalibrated hydraulic model was used. In the second setting, an extended version of the water distribution network was considered, where large parts of the network were insensitive to leaks. Our results show that the leak localization model is able to reduce the leak search region in parts of the network where leaks induce detectable drops in pressure. When this is not the case, the model still localizes the leak but is able to indicate a higher level of uncertainty with respect to its leak predictions.
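The following is a minimal Python sketch of the hybrid idea described above: a classifier trained on simulated pressure heads, combined with a bias correction estimated from historical (leak-free) measurements before classifying a new measured pressure vector. The random "hydraulic model", the sensor and node counts, and the random-forest classifier are illustrative assumptions, not the paper's model, network, or data.

```python
# Minimal sketch: classifier trained on simulated pressures + bias correction
# estimated from historical measurements. All numbers here are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_nodes, n_sensors = 20, 6          # candidate leak locations, pressure sensors

# Pretend hydraulic model: nominal pressures plus a leak-specific drop pattern.
nominal = rng.uniform(40.0, 60.0, size=n_sensors)
leak_signatures = rng.uniform(0.5, 3.0, size=(n_nodes, n_sensors))

def simulate(leak_node):
    return nominal - leak_signatures[leak_node] + rng.normal(0, 0.2, n_sensors)

X_sim = np.array([simulate(node) for node in range(n_nodes) for _ in range(50)])
y_sim = np.repeat(np.arange(n_nodes), 50)
clf = RandomForestClassifier(n_estimators=100).fit(X_sim, y_sim)

# Field measurements differ from the simulator by a systematic offset; estimate
# it from leak-free historical data and remove it before prediction.
model_bias = rng.normal(1.5, 0.1, n_sensors)           # unknown in practice
historical_measured = nominal + model_bias + rng.normal(0, 0.2, (100, n_sensors))
bias_correction = historical_measured.mean(axis=0) - nominal

measured_leak = simulate(leak_node=7) + model_bias      # a new, measured leak
probs = clf.predict_proba([measured_leak - bias_correction])[0]
print("most likely leak nodes:", np.argsort(probs)[::-1][:3])
```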
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Pipelines transport natural gas (NG) in all stages between production and the end user. The NG composition, pipeline depth, and pressure vary significantly between extraction and consumption. As methane (CH4), the primary component of NG, is both explosive and a potent greenhouse gas, NG leaks from underground pipelines pose both a safety and an environmental threat. Leaks are typically found when an observer detects a CH4 enhancement as they pass through the downwind above-ground NG plume. The likelihood of detecting a plume depends, in part, on the size of the plume, which is contingent on both environmental conditions and intrinsic characteristics of the leak. To investigate the effects of leak characteristics, this study uses controlled NG release experiments to observe how the above-ground plume width changes with changes in the gas composition of the NG, leak rate, and depth of the subsurface emission. Results show that plume width generally decreases when heavier hydrocarbons are present.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
About
The Nauru Files contain the largest set of documents published from inside Australia's immigration detention system. Leaked to The Guardian in 2016, they include nearly 2,000 incident reports from the Nauru detention centre, which were written by guards, caseworkers and teachers on the remote Pacific island.
Summary
Examples of events include assaults, injuries, abuse and other forms of violence reported at the detention centre between 2013 and 2015. As noted by The Guardian, as well as academic research, Australia has privatised its immigration detention centres and exported detention of asylum seekers offshore to places such as Nauru and Manus Island in Papua New Guinea. This strategy is part of a wider "Pacific Solution" implemented by the Government of Australia since the early 2000s as a hardline deterrent to "stop the boats." Effectively, asylum seekers intercepted and detained on Nauru are removed from access to Australia's asylum system.
Data Structure
These data are composed of incident reports. An incident report is a short summary of an event in the Nauru detention centre written by staff there. Some of the details found in the files may be triggering; we therefore advise caution when reading and analysing these data. According to The Guardian, these reports form part of the Government of Australia's requirements to document what is happening within its detention system. Each report holds detailed information on the incident at the detention centre along with a "summary log". Working with The Guardian, we have organised these data into two forms: a PDF of each incident report, sorted by name at the time of leak, and a CSV/JSON of all incident reports (see "nauru_files.csv/json"), which structures key details into variables within its columns. Examples of variables include time, incident type, severity and description. Combined, these form a structured database linking each incident report to these variables; a minimal loading sketch follows this description.
Data Source
The Guardian has modified the original, leaked data to remove any personally identifying information within them. To achieve this, a stringent approach to redaction has been implemented to remove names of asylum seekers and staff, personal identification numbers of asylum seekers, signatures of detention staff, nationalities within small population groups and residential tent numbers, among other things. There are also a large number of acronyms used in these data. For your convenience, we have provided an RTF document with a listing of these acronyms and their meanings. If you use these data, please cite the original source at The Guardian: The Guardian. (10 August 2016). The Nauru Files: The lives of asylum seekers in detention detailed in a unique database. Retrieved from https://www.theguardian.com/australia-news/ng-interactive/2016/aug/10/the-nauru-files-the-lives-of-asylum-seekers-in-detention-detailed-in-a-unique-database-interactive. Should you have any comments, questions or requested edits or extensions to the Nauru files, please contact Haven at kira.williams@utoronto.ca. For more articles from The Guardian on these data, see: The Nauru files: cache of 2,000 leaked reports reveal scale of abuse of children in Australian offshore detention. A short history of Nauru, Australia's dumping ground for refugees. 'I want death': Nauru files chronicle despair of asylum seeker children.
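As a convenience for working with the structured form of these data, the following is a minimal loading sketch using pandas. The file name comes from the description above; the column names used in the grouping example ("incident_type", "severity") are assumptions and may differ from the actual variable names in the released CSV.

```python
# Minimal sketch of loading the structured incident reports with pandas.
# Column names below are assumptions; check the real names after loading.
import pandas as pd

reports = pd.read_csv("nauru_files.csv")
print(reports.columns.tolist())            # inspect the actual variable names

# Example: count reports by incident type and severity, if those columns exist.
if {"incident_type", "severity"} <= set(reports.columns):
    print(reports.groupby(["incident_type", "severity"]).size())
```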
http://opensource.org/licenses/BSD-2-Clause
Python code (for Python 3.9 and Pandas 1.3.2) to generate the results used in "Compromised through Compression: Privacy Implications of Smart Meter Traffic Analysis".
Smart metering comes with risks to privacy. One concern is the possibility of an attacker seeing the traffic that reports the energy use of a household and deriving private information from that. Encryption helps to mask the actual energy measurements, but is not sufficient to cover all risks. One aspect which has so far gone unexplored, and where encryption does not help, is traffic analysis, i.e. whether the length of messages communicating energy measurements can leak privacy-sensitive information to an observer. In this paper we examine whether using encodings or compression for smart metering data could potentially leak information about household energy use. Our analysis is based on the real-world energy use data of approximately 80 Dutch households.
We find that traffic analysis could reveal information about the energy use of individual households if compression is used. As a result, when messages are sent daily, an attacker performing traffic analysis would be able to determine when all the members of a household are away or not using electricity for an entire day. We demonstrate this issue by recognizing when households from our dataset were on holiday. If messages are sent more often, more granular living patterns could likely be determined.
We propose a method of encoding the data that is nearly as effective as compression at reducing message size, but does not leak the information that compression leaks. By not requiring compression to achieve the best possible data savings, the risk of traffic analysis is eliminated.
This code operates on the relative energy measurements from the "Zonnedael dataset" from Liander N.V. This dataset needs to be obtained separately; see the instructions accompanying the code. The code transforms the dataset into absolute measurements such as would be taken by a smart meter. It then generates batch messages covering 24-hour periods starting at midnight, similar to how the Dutch infrastructure batches daily meter readings, in the different possible encodings with and without compression applied. For an explanation of the different encodings, see the paper. The code will then provide statistics on the efficiency of encoding and compression for the entire dataset, and attempt to find the periods of multi-day absences for each household. It will also generate the graphs in the style used in the paper and presentation.
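To illustrate the underlying issue, the following is a minimal sketch of how compressed message length alone can separate an "away all day" household from an active one. The quarter-hourly readings, JSON encoding, and zlib compression below are made up for illustration and are not the encodings, batching, or dataset handling implemented in the released code.

```python
# Minimal sketch of the traffic-analysis risk: the size of a compressed daily
# batch of meter readings reveals whether the meter moved at all that day.
# Readings and message format are illustrative, not the paper's encodings.
import json
import random
import zlib

random.seed(0)

# 96 quarter-hourly absolute meter readings (Wh) for one day.
away_day = [1_234_500] * 96                        # meter never moves all day
reading = 1_234_500
active_day = []
for _ in range(96):
    reading += random.randint(0, 250)              # varying household usage
    active_day.append(reading)

def message_length(readings):
    payload = json.dumps(readings).encode()
    return len(payload), len(zlib.compress(payload))

for label, day in [("away", away_day), ("active", active_day)]:
    raw, compressed = message_length(day)
    print(f"{label:6s} raw={raw:4d} bytes  compressed={compressed:4d} bytes")
# The away day compresses far better, so an observer who only sees message
# sizes can tell the two apart even though the payload itself is encrypted.
```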
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Search Strategy for the MEDLINE/PubMed Database.
Replication Data and Code for "Incentives and Information in Methane Leak Detection and Repair"
Abstract: Capturing leaked methane can be a win for both firms and the environment. However, uncertainty about leakage volumes can be a barrier inhibiting leak repair. We study an experiment at oil and gas production sites which randomized whether site operators were informed of methane leakage volumes. At sites with high baseline leakage, we estimate a negative but imprecise effect of information on endline emissions. But at sites with zero measured leakage, giving firms information about methane leakage increased emissions at endline. Our results suggest that giving firms news of low leakage disincentivizes maintenance effort, thereby increasing the likelihood of future leaks. The package includes data from the Wang et al. (2024) RCT as well as IEA data on estimated methane emissions and methane abatement costs. It also includes code for replication.
These data files are related to the work titled "A cooperative model to lower cost and increase the efficiency of methane leak inspections at oil and gas sites." The abstract of the work: Methane is a potent greenhouse gas that tends to leak from equipment at oil and gas (O&G) sites. The process of locating and repairing fugitive methane emissions is known as leak detection and repair (LDAR). Conventional LDAR methods are labor intensive and costly because they involve time-consuming close-range, component-level inspections at each site. This has prompted duty holders to examine new methods and strategies that could be more cost-effective. We examined a cooperative model in which multiple duty holders of O&G sites in a region use shared services to complete leak inspections. This approach was hypothesized to be more efficient and cost-effective than independent inspection programs run by each duty holder in the region. To test this hypothesis, we developed a geospatial simulation model using empirical data from 11 O&G-producing regions in Canada and the USA. We used the model to compare labor cost, transit time, mileage, vehicle emissions, and driving risk between independent and co-op leak inspection programs. The results indicate that co-op leak inspection programs can generate relative savings in labor costs (1.8–34.2%), transit time (0.6–38.6%), mileage (0.2–43.1%), vehicle emissions (0.01–4.0 tCO2), and driving risk (1.9–31.9%). The largest relative savings and efficiency gains from co-op leak inspection programs were in regions with a high diversity of duty holders, which was confirmed with simulations of artificial O&G sites and road networks spanning diverse conditions. We also found that reducing leak inspection time by 75% with streamlined methods can additionally reduce labor cost by 8.8–41.1%, transit time by 5.6–20.2%, and mileage by 2.6–34.3% in co-op leak inspection programs. Overall, this study demonstrates that co-op leak inspection programs can be more efficient and cost-effective, particularly in regions with a large diversity of O&G duty holders, and that methods to reduce leak inspection time can create additional savings.
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.