In 2023, there were 3,205 data compromises in the United States, affecting more than 353 million individuals. Data compromises include data breaches, data leakage, and data exposure. Although these are three different events, they have one thing in common: in each, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches

Some industry sectors see more cases of private data violations than others, depending on the type and volume of personal information that organizations in those sectors store. In 2022, healthcare, financial services, and manufacturing were the three sectors that recorded the most data breaches. The number of healthcare data breaches in the United States has increased gradually over the past few years. In the financial sector, data compromises nearly doubled between 2020 and 2022, while in manufacturing they more than tripled.

Largest data exposures worldwide

In 2020, the adult streaming website CAM4 experienced a leak of nearly 11 billion records, by far the largest data leakage ever reported. The case is unusual, though, because cybersecurity researchers found the vulnerability before cybercriminals did. The second-largest data breach is the Yahoo breach, dating back to 2013. The company first reported about one billion exposed records, then in 2017 revised the figure to three billion. The third-largest breach occurred in March 2018 and involved Aadhaar, India’s national identification database; over 1.1 billion records were exposed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains results of a genome-wide association study (GWAS) of back pain. Two files contain association summary statistics: one for the discovery GWAS, based on 350,000 white British individuals from the UK Biobank, and one for the meta-analysis GWAS, which combines the same 350,000 individuals with an additional 103,862 individuals of European ancestry from the UK Biobank (total N = 453,862). The back pain phenotype was defined by participants' answers to the UK Biobank question "Pain type(s) experienced in last month". Participants who reported “Back pain” were considered cases, and all others were considered controls. Participants who did not reply, or who replied "Prefer not to answer" or "Pain all over the body", were excluded. This dataset is also available for graphical exploration in the genomic context at http://gwasarchive.org.
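The case/control definition above can be expressed as a small coding function. This is a minimal sketch under assumptions: each participant's reply is taken to be a list of selected answer strings (None or empty for no reply), and only the answer texts quoted in the description are used. It is not the study's actual phenotyping pipeline.

```python
from typing import List, Optional

# Answer texts quoted in the dataset description; how they are encoded in
# the raw UK Biobank export is an assumption not covered here.
CASE_ANSWER = "Back pain"
EXCLUDING_ANSWERS = {"Prefer not to answer", "Pain all over the body"}


def classify_back_pain(answers: Optional[List[str]]) -> Optional[str]:
    """Return 'case', 'control', or None (excluded) for one participant.

    `answers` holds the pain types selected for "Pain type(s) experienced
    in last month"; None or an empty list is treated as no reply.
    """
    if not answers:
        return None  # did not reply: excluded
    if any(a in EXCLUDING_ANSWERS for a in answers):
        return None  # explicitly excluded responses
    return "case" if CASE_ANSWER in answers else "control"


print(classify_back_pain(["Back pain", "Headache"]))  # case
print(classify_back_pain(["Headache"]))               # control
print(classify_back_pain(["Prefer not to answer"]))   # None
```

Checking exclusions before the case rule mirrors the description, where those responses remove a participant from the analysis entirely rather than making them a control.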
The data are provided on an "AS-IS" basis, without warranty of any type, expressed or implied, including but not limited to any warranty as to their performance, merchantability, or fitness for any particular purpose. If investigators use these data, any and all consequences are entirely their responsibility. By downloading and using these data, you agree that you will cite the appropriate publication in any communications or publications arising directly or indirectly from these data; that, for use of data available prior to publication, you will respect the responsibilities of resource users set out in the 2003 Fort Lauderdale principles; and that you will never attempt to identify any participant. This research has been conducted using the UK Biobank Resource, and the use of the data is guided by the principles formulated by the UK Biobank.
When using downloaded data, please cite the corresponding paper and this repository:
Funding:
This study was supported by PainOmics, a project funded by the European Community’s Seventh Framework Programme (Grant agreement # 602736).
The research has been conducted using the UK Biobank Resource (project # 18219).
The development of the software implementing the SMR/HEIDI test and of the database for GWAS results was supported by the Russian Ministry of Science and Education under the 5-100 Excellence Program.
Dr. Suri’s time for this work was supported by VA Career Development Award # 1IK2RX001515 from the United States (U.S.) Department of Veterans Affairs Rehabilitation Research and Development Service. The contents of this work do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.
Dr. Tsepilov’s time for this work was supported in part by the Russian Ministry of Science and Education under the 5-100 Excellence Program.
Column headers - discovery (350K)
Column headers - meta-analysis (450K)
Update September 20, 2021: The data and overview were updated to reflect the data used in the Sept. 15 story "Over Half of States Have Rolled Back Public Health Powers in Pandemic." The dataset includes 303 state or local public health leaders who resigned, retired or were fired between April 1, 2020, and Sept. 12, 2021. Previous versions of this dataset reflected the data used in the December 2020 and April 2021 stories.
Across the U.S., state and local public health officials have found themselves at the center of a political storm as they combat the worst pandemic in a century. Amid a fractured federal response, the usually invisible army of workers charged with preventing the spread of infectious disease has become a public punching bag.
In the midst of the coronavirus pandemic, at least 303 state or local public health leaders in 41 states have resigned, retired or been fired since April 1, 2020, according to an ongoing investigation by The Associated Press and KHN.
According to experts, that is the largest exodus of public health leaders in American history.
Many left due to political blowback or pandemic pressure, as they became the target of groups that have coalesced around a common goal: fighting and even threatening officials over mask orders and well-established public health activities like quarantines and contact tracing. Some left to take higher-profile positions or due to health concerns. Others were fired for poor performance. Dozens retired. An untold number of lower-level staffers have also left.
The result is a further erosion of the nation’s already fragile public health infrastructure, which KHN and the AP documented beginning in 2020 in the Underfunded and Under Threat project.
The AP and KHN found that:
To get total numbers of exits by state, broken down by state and local departments, use this query
KHN and the AP counted how many state and local public health leaders left their jobs between April 1, 2020, and Sept. 12, 2021.
The government tasks public health workers with improving the health of the general population by encouraging healthy living and preventing infectious disease. To that end, public health officials do everything from inspecting water and food safety to testing the nation’s babies for metabolic diseases and contact tracing cases of syphilis.
Many parts of the country have a health officer and a health director/administrator by statute. The analysis counted both of those positions if they existed. For state-level departments, the count tracks people in the top and second-highest-ranking job.
The analysis includes exits of top department officials regardless of reason, because each departure left a vacancy at the top of a health agency during the pandemic. Reasons for departures include political pressure, health concerns and poor performance. Others left to take higher-profile positions or to retire. Some departments had multiple top officials exit over the course of the pandemic; each exit is included in the analysis.
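The query for state-by-state totals is not reproduced here. As a rough illustration of the counting rules described above (every exit counts, including repeat exits from the same department), the sketch below tallies exits per state and splits them into state and local departments. The record layout, with hypothetical `state` and `level` keys, is an assumption and not the AP/KHN schema.

```python
from collections import Counter, defaultdict


def exits_by_state(exits):
    """Count exits per state, split by department level.

    `exits` is an iterable of dicts with hypothetical keys 'state' and
    'level' ('state' or 'local'). Each departure is one record, so a
    department with several exits contributes several rows.
    """
    totals = defaultdict(Counter)
    for row in exits:
        totals[row["state"]][row["level"]] += 1
    # Convert to plain dicts and add a per-state grand total.
    return {
        state: {**dict(counts), "total": sum(counts.values())}
        for state, counts in totals.items()
    }


sample = [
    {"state": "OH", "level": "local"},
    {"state": "OH", "level": "local"},
    {"state": "OH", "level": "state"},
]
print(exits_by_state(sample)["OH"]["total"])  # 3
```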
Reporters compiled the exit list by reaching out to public health associations and experts in every state and interviewing hundreds of public health employees. They also received information from the National Association of County and City Health Officials and combed through news reports and records.
Public health departments exist at multiple levels of government. Each state has a department that handles these tasks, and most states also have local departments that operate under either local or state control. The population served by each local health department is calculated from the U.S. Census Bureau 2019 Population Estimates for that department’s jurisdiction.
KHN and the AP have worked since the spring on a series of stories documenting the funding, staffing and problems around public health. A previous data distribution detailed a decade's worth of cuts to state and local spending and staffing on public health. That data can be found here.
Findings and the data should be cited as: "According to a KHN and Associated Press report."
If you know of a public health official in your state or area who has left that position between April 1, 2020 and Sept. 12, 2021 and isn't currently in our dataset, please contact authors Anna Maria Barry-Jester annab@kff.org, Hannah Recht hrecht@kff.org, Michelle Smith mrsmith@ap.org and Lauren Weber laurenw@kff.org.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Starting on March 7, 2024, the Los Angeles Police Department (LAPD) will adopt a new Records Management System for reporting crimes and arrests. The new system is being implemented to comply with the FBI's mandate to collect NIBRS-only data (NIBRS: https://www.fbi.gov/how-we-can-help-you/more-fbi-services-and-information/ucr/nibrs). During the transition, users will temporarily see only incidents reported in the retiring system. Meanwhile, the LAPD is actively working on generating new NIBRS datasets to ensure a smoother and more efficient reporting system.
Update 1/18/2024: LAPD is facing issues with posting the crime data, but we are taking immediate action to resolve the problem. We understand the importance of providing reliable and up-to-date information and are committed to delivering it.
As we work through the issues, we have temporarily reduced our updates from weekly to every two weeks to ensure that we provide accurate information. Our team is actively working to identify and resolve these issues promptly.
We apologize for any inconvenience this may cause and appreciate your understanding. We are doing everything we can to fix the problem and return to weekly updates as soon as possible.
This dataset reflects incidents of crime in the City of Los Angeles dating back to 2020. The data are transcribed from original crime reports that are typed on paper, so there may be some inaccuracies. Some location fields with missing data are recorded as (0°, 0°). Address fields are provided only to the nearest hundred block to protect privacy. The data are only as accurate as the records in the source database. Please post questions or concerns in the comments.
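Because missing locations are stored as (0°, 0°) rather than left blank, plotting the raw coordinates places those incidents in the Atlantic off West Africa. A minimal sketch for setting them aside before mapping, assuming hypothetical `LAT`/`LON` column names and rows already parsed into dicts (for example with `csv.DictReader`):

```python
def split_missing_locations(rows, lat_key="LAT", lon_key="LON"):
    """Split rows into (located, missing), treating (0, 0) as missing.

    lat_key/lon_key are assumed column names; adjust to the actual export.
    """
    located, missing = [], []
    for row in rows:
        lat = float(row[lat_key])
        lon = float(row[lon_key])
        if lat == 0.0 and lon == 0.0:
            missing.append(row)  # placeholder for an unknown location
        else:
            located.append(row)
    return located, missing


rows = [
    {"LAT": "34.0522", "LON": "-118.2437"},
    {"LAT": "0", "LON": "0"},
]
located, missing = split_missing_locations(rows)
print(len(located), len(missing))  # 1 1
```

Only the exact (0, 0) pair is treated as missing, so a record with one legitimate zero coordinate is kept.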
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
A previous study by this research group developed several classification techniques for predicting central neuropathic pain (CNP) in people with subacute spinal cord injury (SCI) from electroencephalography (EEG) data. CNP typically develops within 1 year post-injury in around 50% of patients (Finnerup, 2013), which allowed us to identify EEG features that differ between those with and without CNP. After collecting data from a cohort of SCI participants, we trained and tested classifiers on labelled EEG data using discriminatory features from participants who did and did not develop CNP (Vuckovic et al., 2018). Now, as part of a project that aims to further optimize and validate a prediction method for CNP using a larger dataset, we are recruiting subacute SCI participants to take part in the same experimental protocol originally used to develop the classifiers. We would therefore like to test the accuracy of the original classifiers by passing the newly acquired data through each classifier to obtain predictions of whether each participant will develop pain. Part of this clinical study involves following up with participants approximately 6 months after the first session, at which point we will learn whether each participant has developed CNP, and therefore whether the predictions made with these preliminary classifiers were successful.
It is common practice in machine learning to estimate the generalization performance of a predictive model through leave-one-out cross-validation on the training dataset. This approach has two substantial disadvantages. First, the estimates of generalization performance are optimistically biased because of strong spurious correlations in unobserved variables across training datapoints (e.g., the same experimenter, the same hardware devices, non-independent samples from the population). Second, the designs of classification pipelines become overfit to specific datasets when repeated cross-validation on the same data is used to choose hyper-parameters. The purpose of pre-registration is to be able to prove that a record of the specific classifiers, their hyper-parameters, training procedure, and predictions was created before the events to be predicted are knowable, i.e., before the predicted outcome is determined at follow-up with participants. With this approach, we aim to strengthen the empirical analysis of predictive models by ruling out the possibility that any knowledge about outcomes could influence the design of the evaluation protocol, assuming a causal arrow of time.
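The description does not say how the record is shown to predate the outcomes. One standard technique, assumed here rather than taken from the study, is a hash commitment: publish the SHA-256 digest of a canonical serialization of the record (classifiers, hyper-parameters, training procedure, predictions) before follow-up, then reveal the record afterwards so anyone can recompute the digest. The record fields and identifiers below are hypothetical.

```python
import hashlib
import json


def commit(record: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON form of `record`.

    sort_keys and fixed separators make the serialization independent of
    dict insertion order, so the same content always yields the same digest.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(record: dict, published_digest: str) -> bool:
    """Check that a revealed record matches the digest published earlier."""
    return commit(record) == published_digest


record = {
    "classifier": "example-svm",  # hypothetical identifiers
    "hyperparameters": {"C": 1.0},
    "predictions": {"P01": 1, "P02": 0},
}
digest = commit(record)
print(verify(record, digest))  # True
```

Because a digest of a handful of binary predictions could be guessed by brute force, adding a long random nonce field to the record before hashing keeps the predictions hidden until they are revealed; the digest still binds the record's content either way.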
The Motor Vehicle Collisions person table contains details about people involved in crashes. Each row represents a person (driver, occupant, pedestrian, bicyclist, etc.) involved in a crash. The data in this table go back to April 2016, when crash reporting switched to an electronic system. The Motor Vehicle Collisions data tables contain information from all police-reported motor vehicle collisions in NYC. The police report (MV-104AN) is required to be filled out for collisions where someone is injured or killed, or where there is at least $1,000 worth of damage (https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/documents/ny_overlay_mv-104an_rev05_2004.pdf). Note that the data are preliminary and subject to change when MV-104AN forms are amended based on revised crash details.

Due to the success of the CompStat program, NYPD began to ask how to apply CompStat principles to other problems. Apart from homicides, fatal traffic collisions are the fatal incidents that bring police into the most contact with the public. Therefore, in April 1998, the Department implemented TrafficStat, which uses the CompStat model to work toward improving traffic safety. Police officers complete form MV-104AN for all vehicle collisions. The MV-104AN is a New York State form that records all of the details of a traffic collision. Before TrafficStat was implemented, there was no uniform traffic safety data collection procedure across NYPD precincts. Therefore, the Police Department implemented the Traffic Accident Management System (TAMS) in July 1999 to collect traffic data uniformly across the City. TAMS required precincts to manually enter a few selected MV-104AN fields to collect very basic intersection traffic crash statistics, including the number of accidents, injuries, and fatalities. As the years progressed, the need for additional traffic data grew so that more detailed analyses could be conducted.
The citywide traffic safety initiative Vision Zero started in 2014. Vision Zero further emphasized the need to collect more traffic data in order to work toward its goal of eliminating traffic fatalities. Therefore, in March 2016 the Department replaced TAMS with the new Finest Online Records Management System (FORMS). FORMS enables police officers to enter all of the MV-104AN data fields electronically, using a Department cellphone or computer, and stores them in the Department’s crime data warehouse. Because all of the MV-104AN data fields are now stored for each traffic collision, detailed traffic safety analyses can be conducted as needed.
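The MV-104AN reporting threshold stated in the description reduces to a simple predicate. The sketch below uses hypothetical parameter names and encodes only when the report is required; it does not model the separate statement that officers complete the form for all collisions.

```python
def mv104an_required(injured: int, killed: int, damage_usd: float) -> bool:
    """Return True if a collision meets the stated MV-104AN threshold.

    Required when anyone is injured or killed, or when damage is at
    least $1,000. Parameter names are hypothetical.
    """
    return injured > 0 or killed > 0 or damage_usd >= 1000


print(mv104an_required(injured=0, killed=0, damage_usd=1500.0))  # True
print(mv104an_required(injured=0, killed=0, damage_usd=250.0))   # False
```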
https://www.icpsr.umich.edu/web/ICPSR/studies/23263/terms
The National Health Measurement Study (NHMS) surveyed older United States adults with a suite of health-related quality of life (HRQoL) indices to allow comparison and cross-calibration of these instruments. The design oversampled African Americans and older individuals to allow subgroup analyses. Several preference-weighted indices measuring self-reported generic HRQoL are used widely in population surveys and clinical studies in the United States and around the world. These indices are used to evaluate individual and population health. Because they have been developed using econometric methods to elicit utility weights for their scoring systems, they are generally accepted for use in cost-effectiveness analyses of health interventions. Each index uses a multidimensional representation of health, but each covers the dimensions of health (e.g., physical function, mental function, social function, pain, other symptoms) differently and uses questionnaires with different psychometric properties. Each index is scored so that perfect health is represented as 1.0 and dead as 0.0, but the indices are known to have different scaling properties. Two or more of these instruments have rarely been included in the same population survey, so there have been few opportunities to directly compare how they describe and measure health using multi-instrument data. In this study, respondents indicated whether they had been diagnosed with coronary heart disease, stroke, diabetes, arthritis, eye disease, sleep disorder, chronic respiratory disease, clinical depression or anxiety disorder, gastrointestinal ulcer, thyroid disorder, and/or severe chronic back pain. Census tracts are not identified; however, the race composition, education levels, economic factors, and urbanicity of each respondent's census tract of residence are included as contextual variables. Demographic, socioeconomic, and additional health data were also elicited.
Respondents are characterized by census region of residence, age, gender, marital status, race, ethnicity, education, household income and assets, health insurance, weight, height, smoking status, psychological well-being scales, and everyday and lifetime discrimination items. The data were de-identified, and extensive documentation was developed. The NHMS collected data on 3,844 adults in the continental United States (1,641 males and 2,203 females, 1,086 African Americans).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Neuromyelitis optica spectrum disorder (NMOSD) is an inflammatory disorder of the central nervous system whose common symptoms include rapid onset of eye pain, loss of vision, neck/back pain, paralysis, bowel and bladder dysfunction, and heat sensitivity. The rare, unpredictable, and debilitating nature of NMOSD places a unique psychological burden on patients and their caregivers, the specific nature and extent of which is not yet known. This mixed-methods study, informed by quantitative and qualitative data collected via self-report measures, focus groups, and in-depth interviews, aims to investigate and understand the psychological burden on patients with NMOSD and their caregivers/loved ones in order to inform a specialized intervention. Thirty-one adults living with NMOSD and 22 caregivers of people with NMOSD in the United States and Canada, recruited from NMOSD patient advocacy groups, social media groups, and word of mouth from other participants, completed a battery of standardized self-report measures of anxiety, depression, trauma, cognitive fusion, valued living, and coping styles. Semi-structured focus group sessions were conducted via HIPAA-compliant Zoom with the 31 patients, and separate focus groups were conducted with the 22 caregivers. A subset of these samples, comprising 16 patients and 11 caregivers, participated in individual semi-structured interviews, with priority given to including diverse perspectives. Descriptive statistics and bivariate correlations were computed on the quantitative self-report data using SPSS [Version 28.0.1]; data were stored in REDCap. Reflexive thematic analysis was applied to the qualitative individual interview data. The majority of patients reported experiencing anxiety, depression, cognitive fusion, over-controlled coping, and a lack of values-based living. Caregivers also reported heightened anxiety, cognitive fusion, and over-controlled coping, although they did not endorse clinically significant depression.
Patients' and caregivers' levels of anxiety were strongly positively correlated, as were their levels of over-controlled coping, which likely affects how both parties manage NMOSD-related stressors, individually and as a dyad. Compared with caregivers, patients reported more anxiety, depression, psychological inflexibility, and lack of values-based living. Narrative themes from patients and caregivers included mistrust of medical professionals, lack of support immediately following diagnosis, changes in relationships, deviation from values-based living, internalization of feelings, and avoidant coping strategies used to manage the psychological burden of NMOSD. A novel mental health intervention targeting the specific psychological burden of life with NMOSD is proposed.