https://cdla.io/sharing-1-0/
Investigating Healthcare Fraud using Benford’s Law
Introduction: The outbreak of COVID-19 in 2020 resulted in a surge in demand for COVID-19 testing, leading to the emergence of many new healthcare testing companies. However, not all of these companies may have been operating legitimately, and some may have engaged in fraudulent activities to inflate their revenue and profits. In this investigation, we will examine the data for three healthcare testing companies - Advance Genomic Diagnostics, Exelonixx Labs Inc., and California Molecular Labs Inc. - to determine if there is evidence of potential fraud or other wrongdoing. By analyzing the data using techniques such as Benford’s Law and correlation analysis, we hope to provide insights into the potential misconduct and recommend measures to prevent such activities in the future.

You are a data analyst who has been tasked with investigating three healthcare testing companies - Advance Genomic Diagnostics, Exelonixx Labs Inc., and California Molecular Testing Inc. These companies all went out of business in or before 2022 under suspicious circumstances. Your job is to analyze the data available for these companies and determine if there is evidence of potential fraud or other wrongdoing.

Data: The data provided for the three companies is broken down by date, total number of patients that day, and the summary of what was billed to the insurance company and/or Medicaid with insurance. The billed number is a summary of the charges claimed. The data was compiled from individual records, but the originals are unavailable. Some of the data may contain missing values, duplicates, or things like ‘na’ or ‘N/A’, so data cleaning is necessary.

Benford’s Law: Before analyzing the data, it is important to understand Benford’s Law and how it can be used to detect potential fraud. Benford’s Law states that in many naturally occurring datasets, the first digit is more likely to be a small number (such as 1) than a large number (such as 9). This law can be used to identify datasets that do not conform to this pattern, which may indicate that the data has been manipulated or fabricated.

Benford’s Law: P(d) = log10(1 + 1/d), where P(d) is the probability that the first digit of a number is d, and d is a single digit from 1 to 9. https://en.wikipedia.org/wiki/Benford%27s_law
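The cleaning step and the first-digit comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used for the dataset: the function names, the sample values, and the list of missing-value tokens beyond ‘na’ and ‘N/A’ are hypothetical, and the chi-square statistic is one common way to measure deviation from Benford’s Law rather than a test named in the description. Removing duplicate rows is left out because it depends on the full record (date, patients, billed), not on the billed amount alone.

```python
import math
from collections import Counter

# Tokens treated as missing. 'na' and 'N/A' come from the description;
# the others are assumptions for illustration.
MISSING = {"", "na", "n/a", "nan", "none"}


def benford_expected():
    """Expected first-digit probabilities, P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def clean_amounts(raw_values):
    """Drop missing markers and unparseable entries; keep positive finite floats."""
    cleaned = []
    for value in raw_values:
        text = str(value).strip().replace(",", "").replace("$", "")
        if text.lower() in MISSING:
            continue
        try:
            number = float(text)
        except ValueError:
            continue
        if number > 0 and math.isfinite(number):
            cleaned.append(number)
    return cleaned


def first_digit(number):
    """Leading significant digit of a positive number."""
    # Scientific notation puts the leading digit first, e.g. '9.87...e-03'.
    return int(f"{number:.15e}"[0])


def first_digit_counts(numbers):
    """Count how often each digit 1..9 appears as the leading digit."""
    counts = Counter(first_digit(n) for n in numbers)
    return {d: counts.get(d, 0) for d in range(1, 10)}


def chi_square_statistic(counts):
    """Chi-square distance between observed counts and Benford's expectation."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable values")
    expected = benford_expected()
    return sum(
        (counts[d] - total * expected[d]) ** 2 / (total * expected[d])
        for d in range(1, 10)
    )


# Hypothetical billed amounts, including the kinds of bad entries mentioned above.
raw = ["1200.50", "N/A", "na", "$3,400", "187", "", "950.00", "1020"]
amounts = clean_amounts(raw)
counts = first_digit_counts(amounts)
print(counts)
print(round(chi_square_statistic(counts), 3))
```

With nine digit categories the statistic has 8 degrees of freedom, so a value above roughly 15.51 would reject conformity at the 5% level. A sample as small as the one here is far too short for that comparison to mean anything; a deviation on real data is a reason to look closer, not proof of fraud.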
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Benford's law, also known as the first-digit law, has been widely used to test for anomalies in many kinds of data, including accounting records, stock prices, house prices, electricity bills, population numbers, natural phenomena, death rates and, more recently, COVID-19 case reports.
DataHub cryptocurrency datasets (https://datahub.io/cryptocurrency) were used as the source of daily aggregated data about all transactions on all crypto coin networks, from the first mined block on the Bitcoin network through the end of 2018.
The presented dataset is a collection of Benford's law conformity tests for all cryptocurrencies in the observed time-frame.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
P-values for tests of the null hypothesis of Benford’s Law for genomic data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The aim of this database is to provide researchers and scholars with a unified resource for examining potential data manipulation by 171 countries in their daily reported Covid-19 cases. The database shows three different tests used to determine whether the data reported by each country fit Benford’s Law.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pearson Correlation and p-values from Kolmogorov–Smirnov Tests between the distribution of first significant digits of friend/follower distributions of various social networks and values predicted by Benford’s Law.
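As a rough illustration of the two measures named above, the sketch below compares observed first-digit proportions with Benford’s predictions using a Pearson correlation and a Kolmogorov–Smirnov distance (the largest gap between the two cumulative distributions). It is a sketch under assumptions, not the dataset authors’ procedure: the function names are hypothetical, the sample is synthetic (powers of 2 standing in for follower counts), and p-value computation is omitted because the standard KS p-value assumes continuous data.

```python
import math


def benford_probs():
    """Benford's predicted proportions for first digits 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit_proportions(values):
    """Observed proportion of each leading digit 1..9 among positive values."""
    counts = [0] * 9
    for v in values:
        if v > 0:
            # Scientific notation puts the leading digit first.
            counts[int(f"{v:.15e}"[0]) - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no positive values")
    return [c / total for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ks_statistic(observed, expected):
    """Largest absolute gap between the two cumulative distributions."""
    cum_obs = cum_exp = 0.0
    largest = 0.0
    for o, e in zip(observed, expected):
        cum_obs += o
        cum_exp += e
        largest = max(largest, abs(cum_obs - cum_exp))
    return largest


# Synthetic stand-in for follower counts: powers of 2.
sample = [2 ** k for k in range(1, 200)]
observed = first_digit_proportions(sample)
print(pearson(observed, benford_probs()))
print(ks_statistic(observed, benford_probs()))
```

A correlation near 1 together with a small KS distance indicates that the observed digits track Benford’s curve closely. Correlation alone can stay high even when individual digits are noticeably off, which is one reason to report both.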
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset includes raw aggregated Slovenian researcher network data (available at http://www.sicris.si) and Benford's law distribution conformity tests.
Can data on government coercion and violence be trusted when the data are generated by the state itself? In this paper, we investigate the extent to which data from the California Department of Corrections and Rehabilitation (CDCR) regarding the use of force by corrections officers against prison inmates between 2008 and 2017 conform to Benford’s Law. Following a growing data forensics literature, we expect misreporting of the use of force in California state prisons to cause the observed data to deviate from Benford’s distribution. Statistical hypothesis tests and further investigation of CDCR data, which show both temporal and cross-sectional variance in conformity with Benford’s Law, are consistent with misreporting of the use of force by the CDCR. Our results suggest that data on government coercion generated by the state should be inspected carefully before being used to test hypotheses or make policy.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset includes raw aggregated Slovenian researcher network data (available at http://www.sicris.si) and Benford's law distribution compliance tests.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset includes raw aggregated data from the Slovenian researcher network (available at http://www.sicris.si) and Benford's law distribution conformity tests.
https://creativecommons.org/publicdomain/zero/1.0/
Russian elections are widely considered to be rigged. To test this hypothesis, we scraped data about the most recent presidential election.
This dataset contains data about the 2018 presidential election in the Russian Federation. The data was scraped from the websites of the regional election committees. This is the root website from which you can access any regional EC. Each row represents a polling station and holds various information about ballot papers and, of course, the number of votes cast for each candidate. The polling station id is not unique across the whole dataset, but it is unique within each region. The dataset is provided in two versions: English, with region names transliterated and columns translated, and Russian.
We were inspired by the following papers:
Is it true that this election was rigged? How does polling station proximity influence the voting process? Can outliers be detected using this proximity?