10 datasets found
  1. Detecting Healthcare Fraud using Benford's Law

    • kaggle.com
    zip
    Updated Apr 20, 2023
    Cite
    Darren Chess (2023). Detecting Healthcare Fraud using Benford's Law [Dataset]. https://www.kaggle.com/datasets/darrenchess/detecting-healthcare-fraud-using-benfords-law/discussion
    Explore at:
    Available download formats: zip (2015411 bytes)
    Dataset updated
    Apr 20, 2023
    Authors
    Darren Chess
    License

    https://cdla.io/sharing-1-0/

    Description

    Investigating Healthcare Fraud using Benford’s Law

    Introduction: The outbreak of COVID-19 in 2020 resulted in a surge in demand for COVID-19 testing, leading to the emergence of many new healthcare testing companies. However, not all of these companies may have been operating legitimately, and some may have engaged in fraudulent activities to inflate their revenue and profits. In this investigation, we will examine the data for three healthcare testing companies - Advance Genomic Diagnostics, Exelonixx Labs Inc., and California Molecular Labs Inc. - to determine if there is evidence of potential fraud or other wrongdoing. By analyzing the data using techniques such as Benford’s Law and correlation analysis, we hope to provide insights into the potential misconduct and recommend measures to prevent such activities in the future.

    You are a data analyst who has been tasked with investigating three healthcare testing companies - Advance Genomic Diagnostics, Exelonixx Labs Inc., and California Molecular Testing Inc. These companies all went out of business in or before 2022 under suspicious circumstances. Your job is to analyze the data available for these companies and determine if there is evidence of potential fraud or other wrongdoing.

    Data: The data provided for the three companies is broken down by date, total number of patients that day, and the summary of what was billed to the insurance company and/or Medicaid with insurance. The billed number is a summary of the charges claimed. The data was compiled from individual records, but the originals are unavailable. Some of the data may contain missing values, duplicates, or things like ‘na’ or ‘N/A’, so data cleaning is necessary.

    Benford’s Law: Before analyzing the data, it is important to understand Benford’s Law and how it can be used to detect potential fraud. Benford’s Law states that in many naturally occurring datasets, the first digit is more likely to be a small number (such as 1) than a large number (such as 9). This law can be used to identify datasets that do not conform to this pattern, which may indicate that the data has been manipulated or fabricated.

    Benford’s Law: P(d) = log10(1 + 1/d)
    where:
    P(d) is the probability that the first digit of a number is d
    d is a single-digit number from 1 to 9.
    https://en.wikipedia.org/wiki/Benford%27s_law
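The cleaning step and the first-digit check described above could be sketched roughly as follows. This is an illustration, not the dataset author's code. The function names are hypothetical, the list of missing-value markers beyond 'na' and 'N/A' is an assumption, and the chi-square goodness-of-fit statistic is one common way to compare observed digits with P(d) rather than a method the description prescribes.

```python
import math
from collections import Counter

# 'na' and 'N/A' come from the description; the other markers are assumptions.
MISSING = {"", "na", "n/a", "nan", "none", "null"}

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CHI2_CRITICAL_8DF_5PCT = 15.507


def clean_amounts(raw_values):
    """Drop missing markers and unparseable entries; keep positive finite numbers."""
    cleaned = []
    for value in raw_values:
        text = str(value).strip().replace(",", "").replace("$", "")
        if text.lower() in MISSING:
            continue
        try:
            number = float(text)
        except ValueError:
            continue
        if number > 0 and math.isfinite(number):
            cleaned.append(number)
    return cleaned


def first_digit(number):
    """Return the leading significant digit (1-9) of a positive number."""
    # Scientific notation always starts with the leading significant digit;
    # 17 significant digits avoids rounding 9.9999999 up to 1.0e+01.
    return int(f"{number:.16e}"[0])


def benford_expected():
    """P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def chi_square_statistic(amounts):
    """Chi-square distance between observed first digits and Benford's Law."""
    counts = Counter(first_digit(x) for x in amounts)
    n = len(amounts)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )
```

A statistic above `CHI2_CRITICAL_8DF_5PCT` suggests the first digits deviate from Benford's Law at the 5% level. That is a reason to look more closely, not proof of fraud on its own.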

  2. Data from: Benford's law conformity tests for all crypto currencies in the...

    • data.niaid.nih.gov
    Updated Apr 16, 2021
    Cite
    Jernej Vičič; Aleksandar Tošić (2021). Benford's law conformity tests for all crypto currencies in the time frame 2009 (START) - 2018 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4682975
    Explore at:
    Dataset updated
    Apr 16, 2021
    Dataset provided by
    University of Primorska, Andrej Marušič Institute, Muzejski trg 2, SI-6000 Koper, Slovenia
    Authors
    Jernej Vičič; Aleksandar Tošić
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Benford's law, also known as the first-digit law, has been widely used to test for anomalies in a wide variety of data, from accounting records (for fraud detection), stock prices, and house prices to electricity bills, population numbers, natural phenomena, death rates, and, more recently, COVID-19 case reports.

    The DataHub cryptocurrency datasets (https://datahub.io/cryptocurrency) were used as the source of daily aggregated data on all transactions on all crypto coin networks, from the first mined block on the Bitcoin network until the end of 2018.

    The presented dataset is a collection of Benford's law conformity tests for all cryptocurrencies in the observed time-frame.

  3. P-values for tests of the null hypothesis of Benford’s Law for genomic data....

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    M. Lesperance; W. J. Reed; M. A. Stephens; C. Tsao; B. Wilton (2023). P-values for tests of the null hypothesis of Benford’s Law for genomic data. [Dataset]. http://doi.org/10.1371/journal.pone.0151235.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    M. Lesperance; W. J. Reed; M. A. Stephens; C. Tsao; B. Wilton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    P-values for tests of the null hypothesis of Benford’s Law for genomic data.

  4. The Full Database of Countries with Potential COVID-19 Data Manipulation...

    • data.mendeley.com
    Updated Nov 23, 2020
    Cite
    Ahmad Kilani (2020). The Full Database of Countries with Potential COVID-19 Data Manipulation based on Benford’s Law [Dataset]. http://doi.org/10.17632/vccb4y2npf.1
    Explore at:
    Dataset updated
    Nov 23, 2020
    Authors
    Ahmad Kilani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The aim of this database is to provide researchers and scholars with a unified database of potential data manipulation by 171 countries in their daily reported COVID-19 cases. The database shows three different tests used to determine whether the data reported by each country fit Benford’s Law.

  5. Pearson Correlation and p-values from Kolmogorov–Smirnov Tests between the...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Jennifer Golbeck (2023). Pearson Correlation and p-values from Kolmogorov–Smirnov Tests between the distribution of first significant digits of friend/follower distributions of various social networks and values predicted by Benford’s Law. [Dataset]. http://doi.org/10.1371/journal.pone.0135169.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jennifer Golbeck
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pearson Correlation and p-values from Kolmogorov–Smirnov Tests between the distribution of first significant digits of friend/follower distributions of various social networks and values predicted by Benford’s Law.
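The two kinds of comparison named in this description, a Pearson correlation and a Kolmogorov–Smirnov-style distance between observed first-digit frequencies and Benford's predictions, might be computed along these lines. This is a rough sketch, not the author's code: the function names are hypothetical, the input is assumed to be a list of positive integer friend/follower counts, and only the KS distance is computed, not its p-value.

```python
import math
from collections import Counter

# Benford's predicted share of each leading digit 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def digit_frequencies(counts):
    """Observed share of each leading digit 1-9 among the positive integer counts."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    tally = Counter(digits)
    return [tally.get(d, 0) / len(digits) for d in range(1, 10)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ks_distance(observed, expected):
    """Largest gap between the two cumulative digit distributions."""
    gap = cum_obs = cum_exp = 0.0
    for obs, exp in zip(observed, expected):
        cum_obs += obs
        cum_exp += exp
        gap = max(gap, abs(cum_obs - cum_exp))
    return gap
```

With `observed = digit_frequencies(follower_counts)`, a value of `pearson(observed, BENFORD)` close to 1 together with a small `ks_distance(observed, BENFORD)` indicates that the counts follow Benford's distribution closely.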

  6. Aggregated researcher publications data for network for Slovenian network...

    • data.europa.eu
    unknown
    Cite
    Zenodo, Aggregated researcher publications data for network for Slovenian network and Benford law conformity [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-3935770?locale=sk
    Explore at:
    Available download formats: unknown (37311516)
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Slovenia
    Description

    The dataset includes raw aggregated data on the Slovenian researcher network (available at http://www.sicris.si) and Benford's law distribution conformity tests.

  7. Data from: Detecting Anomalies in Data on Government Violence

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 14, 2023
    Cite
    Kanisha D. Bond; Courtenay R. Conrad; Dylan Moses; Joel W. Simmons (2023). Detecting Anomalies in Data on Government Violence [Dataset]. http://doi.org/10.7910/DVN/IDD1GQ
    Explore at:
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Kanisha D. Bond; Courtenay R. Conrad; Dylan Moses; Joel W. Simmons
    Description

    Can data on government coercion and violence be trusted when the data are generated by the state itself? In this paper, we investigate the extent to which data from the California Department of Corrections and Rehabilitation (CDCR) regarding the use of force by corrections officers against prison inmates between 2008 and 2017 conform to Benford’s Law. Following a growing data forensics literature, we expect misreporting of the use of force in California state prisons to cause the observed data to deviate from Benford’s distribution. Statistical hypothesis tests and further investigation of CDCR data, which show both temporal and cross-sectional variance in conformity with Benford’s Law, are consistent with misreporting of the use of force by the CDCR. Our results suggest that data on government coercion generated by the state should be inspected carefully before being used to test hypotheses or make policy.

  8. Aggregated researcher publication data for the Slovenian network and Benford...

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Aggregated researcher publication data for the Slovenian network and Benford's law conformity [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-3935770?locale=hu
    Explore at:
    Available download formats: unknown (37311516)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset includes raw aggregated data on the Slovenian researcher network (available at http://www.sicris.si) and Benford's law distribution conformity tests.

  9. Aggregated researcher publication data for the network for the Slovenian...

    • data.europa.eu
    unknown
    Cite
    Zenodo, Aggregated researcher publication data for the Slovenian network and Benford's law conformity [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-3935770?locale=sv
    Explore at:
    Available download formats: unknown (37311516)
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset includes raw aggregated data from the Slovenian researcher network (available at http://www.sicris.si) and Benford's law distribution conformity tests.

  10. Russian Presidental Elections 2018 Voting Data

    • kaggle.com
    zip
    Updated Apr 30, 2018
    Cite
    Yuriy Gavrilin (2018). Russian Presidental Elections 2018 Voting Data [Dataset]. https://www.kaggle.com/valenzione/russian-presidental-elections-2018-voting-data
    Explore at:
    Available download formats: zip (6862837 bytes)
    Dataset updated
    Apr 30, 2018
    Authors
    Yuriy Gavrilin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Russia
    Description

    Context

    Russian elections are considered to be rigged. To test this hypothesis we scraped data about the last presidential elections.

    Content

    This dataset contains data about the 2018 presidential elections in the Russian Federation. Data was scraped from regional election committee websites. This is the root website from which you can access any regional EC. Each row represents a polling station and contains various information about ballot papers and the number of votes cast for each candidate. The polling station id is not unique across the whole dataset, but it is unique within each region. The dataset is provided in two versions: English, with region names transliterated and columns translated, and Russian.
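Because the polling station id repeats across regions, a row is only identified by the pair of region and station id. The sketch below shows one way to check such a composite key. The field names `region`, `station_id`, and `votes`, and the sample rows, are hypothetical and do not come from the dataset.

```python
from collections import Counter

# Hypothetical rows; the field names and values are illustrative assumptions,
# not the dataset's actual columns or data.
rows = [
    {"region": "Region A", "station_id": 1, "votes": 512},
    {"region": "Region A", "station_id": 2, "votes": 430},
    {"region": "Region B", "station_id": 1, "votes": 298},
]


def duplicate_keys(rows, key_fields):
    """Return the key tuples that identify more than one row."""
    keys = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return [key for key, count in keys.items() if count > 1]


# The station id alone collides across regions ...
collisions_by_id = duplicate_keys(rows, ["station_id"])
# ... while the (region, station id) pair identifies each row.
collisions_by_pair = duplicate_keys(rows, ["region", "station_id"])
```

Here `collisions_by_id` contains the repeated station id, while `collisions_by_pair` is empty. That is why joins or group-bys on this data should use both fields.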

    Acknowledgements

    We were inspired by following papers:

    • A Guide to Election Forensics by Hicken, Allen and Mebane Jr, Walter R
    • Testing for voter rigging in small polling stations by Jimenez, Raúl and Hidalgo, Manuel and Klimek, Peter
    • Election forensics: Vote counts and Benford’s law by Mebane Jr, Walter R
    • When Does the Second-Digit Benford’s Law-Test Signal an Election Fraud? by Shikano, Susumu and Mack, Verena
    • Statistical detection of systematic election irregularities by Klimek, Peter and Yegorov, Yuri and Hanel, Rudolf and Thurner, Stefan

    Inspiration

    Is it true that these elections were rigged? How does polling station proximity influence the voting process? Can outliers be detected using this proximity?


