30 datasets found
  1. Prolific observer bias in the life sciences: why we need blind data...

    • search.dataone.org
    • researchdata.edu.au
    Updated Apr 12, 2025
    Cite
    Luke Holman; Megan L. Head; Robert Lanfear; Michael D. Jennions (2025). Prolific observer bias in the life sciences: why we need blind data recording [Dataset]. http://doi.org/10.5061/dryad.hn40n
    Dataset updated
    Apr 12, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Luke Holman; Megan L. Head; Robert Lanfear; Michael D. Jennions
    Time period covered
    Jan 1, 2015
    Description

    Observer bias and other “experimenter effects” occur when researchers’ expectations influence study outcome. These biases are strongest when researchers expect a particular result, are measuring subjective variables, and have an incentive to produce data that confirm predictions. To minimize bias, it is good practice to work “blind,” meaning that experimenters are unaware of the identity or treatment group of their subjects while conducting research. Here, using text mining and a literature review, we find evidence that blind protocols are uncommon in the life sciences and that nonblind studies tend to report higher effect sizes and more significant p-values. We discuss methods to minimize bias and urge researchers, editors, and peer reviewers to keep blind protocols in mind.

  2. Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks

    • data.mendeley.com
    Updated Apr 2, 2024
    Cite
    Hyunggu Jung (2024). Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks [Dataset]. http://doi.org/10.17632/rnyrpzyw3h.2
    Dataset updated
    Apr 2, 2024
    Authors
    Hyunggu Jung
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of reviews collected from restaurants on a Korean delivery app platform running a review event. A total of 128,668 reviews were collected from 136 restaurants by crawling with the Selenium library in Python. Korean Reviews.csv contains the original reviews, not translated into English, and English Reviews.csv contains the same reviews translated into English. The 136 chosen restaurants run review events that ask customers to write reviews with 5 stars and photos. The data were therefore annotated by considering 1) whether the review gives a five-star rating, and 2) whether the review contains photo(s).

  3. Performance parameters.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Performance parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t007
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance parameters.

  4. A Multidimensional Dataset for Analyzing and Detecting News Bias based on...

    • zenodo.org
    zip
    Updated Jun 8, 2020
    Cite
    Michael Färber; Victoria Burkard; Adam Jatowt; Sora Lim (2020). A Multidimensional Dataset for Analyzing and Detecting News Bias based on Crowdsourcing [Dataset]. http://doi.org/10.5281/zenodo.3885351
    Available download formats: zip
    Dataset updated
    Jun 8, 2020
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Michael Färber; Victoria Burkard; Adam Jatowt; Sora Lim
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We provide a large data set consisting of 2,057 sentences from 90 news articles and annotations of crowdworkers with respect to bias itself and the following bias dimensions:

    1. hidden assumptions
    2. subjectivity
    3. representation tendencies

    Our data set contains 44,547 labels in total (43,197 sentence labels and 1,350 article labels).

    The news articles deal with the Ukraine crisis. They were published in 33 countries in total and were selected based on the data set of Cremisini et al. (Cremisini, A., Aguilar, D., & Finlayson, M. A. A Challenging Dataset for Bias Detection: The Case of the Crisis in the Ukraine, Proc. of SBP-BRiMS'19, pp. 173-183, 2019).

    Each sentence was annotated by 5 crowdworkers. In total, we spent $3,335 on the crowdworkers' annotations.

    More information can be found in our GitHub repository. A description of the file format used is given in the codebook attached to the dataset.

    Please cite our data set as follows:

    @unpublished{Faerber2020Bias,
     author = {Michael F{\"{a}}rber and Victoria Burkard and Adam Jatowt and Sora Lim},
     title = {{A Multidimensional Dataset for Analyzing and Detecting News Bias based on Crowdsourcing}},
     year  = {2020}
    }
  5. Previous works comparative table.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Previous works comparative table. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t001
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Previous works comparative table.

  6. Replication Data for: Exploring Gender Bias in Homicide Sentencing: An...

    • search.dataone.org
    Updated Dec 16, 2023
    Cite
    Zhuchkova, Svetlana; Kazun, Anton (2023). Replication Data for: Exploring Gender Bias in Homicide Sentencing: An Empirical Study of Russian Court Decisions Using Text Mining [Dataset]. http://doi.org/10.7910/DVN/0YEPAA
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Zhuchkova, Svetlana; Kazun, Anton
    Description

    Data and replication code for the paper "Exploring Gender Bias in Homicide Sentencing: An Empirical Study of Russian Court Decisions Using Text Mining"

  7. Adjective tags.

    • plos.figshare.com
    xls
    Updated Jun 6, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Adjective tags. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t005
    Available download formats: xls
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adjective tags.

  8. (Un)Fair Process Mining Event Logs

    • zenodo.org
    application/gzip
    Updated Jun 22, 2023
    Cite
    Berti, Alessandro; Pohl, Timo (2023). (Un)Fair Process Mining Event Logs [Dataset]. http://doi.org/10.5281/zenodo.8059489
    Available download formats: application/gzip
    Dataset updated
    Jun 22, 2023
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Berti, Alessandro; Pohl, Timo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    License: CC-BY-4.0

    Event Logs:

    We introduce a set of 12 distinct event logs, three for each of the four domains: hiring, healthcare, lending, and renting. These event logs have been carefully curated and simulated, each containing 10,000 cases, thereby providing an extensive resource for researchers focusing on fairness in process mining.

    In each of these domains, the three event logs represent varying degrees of discrimination, offering researchers an opportunity to explore the nuances and complexities that arise in diverse real-world scenarios. By presenting each log with a thorough description of the inherent processes and their respective attributes, we aim to provide a robust groundwork for understanding the potential sources of discrimination and addressing fairness in process mining.

    We have ensured that all the event logs are provided in the eXtensible Event Stream (XES) standard format. This adherence to a recognized standard not only ensures broad compatibility but also facilitates interoperability across a variety of process mining tools. By choosing this common format, we aim to encourage and simplify the utilization of these logs for researchers across different platforms.

    * Hiring

    The data describes a multifaceted recruitment process with diverse application pathways ranging from minimal processing to extensive multi-step procedures. The variability of these routes, largely dependent on numerous determinants, yields a spectrum of outcomes from instant rejection to successful job offers.

    The logs include attributes such as age, citizenship, German proficiency, gender, religion, and years of education. While these attributes may inform candidate profiles, their misuse could engender discrimination. Variables like age and education may signify experience and skills, citizenship and German language may address job logistics, but these should not unjustly eliminate applicants. Gender and religion, unrelated to job performance, must not sway hiring. Therefore, the use of these attributes must uphold fairness, avoiding any potential bias.

    * Hospital

    The data depicts a hospital treatment process that commences with registration at an Emergency Room or Family Department and advances through stages of examination, diagnosis, and treatment. Notably, unsuccessful treatments often entail repetitive diagnostic and treatment cycles, underscoring the iterative nature of healthcare provision.

    The logs incorporate patient attributes such as age, underlying condition, citizenship, German language proficiency, gender, and private insurance. These attributes, influencing the treatment process, may unveil potential discrimination. Factors like age and condition might affect case complexity and treatment path, while citizenship may highlight healthcare access disparities. German proficiency can impact provider-patient communication, thus affecting care quality. Gender could spotlight potential health disparities, while insurance status might indicate socio-economic influences on care quality or timeliness. Therefore, a comprehensive examination of these attributes vis-a-vis the treatment process could shed light on potential biases or disparities, fostering fairness in healthcare delivery.

    * Lending

    This data illustrates the steps within a loan application process. From an initial appointment request, the process navigates various stages, including information verification and underwriting, culminating in loan approval or denial. Additional steps may be required, such as co-signer enlistment or collateral assessment. Some cases experience outright appointment denial, indicating the process's variability, reflecting applicants' differing credit situations.

    The logs' attributes can aid in identifying influences on outcomes and detecting discrimination. Personal characteristics ('age', 'citizen', 'German speaking', and 'gender') and socio-economic indicators ('YearsOfEducation' and 'CreditScore') can impact the process. While 'yearsOfEducation' and 'CreditScore' can validly inform creditworthiness, 'age', 'citizen', 'language ability', and 'gender' should not bias loan decisions. Ensuring these attributes are used responsibly fosters equitable loan processes.

    * Renting

    The data represents a rental process. It begins with a prospective tenant applying to view a property. Subsequent steps include an initial screening phase, viewing, decision-making, and a potential extensive screening. The process ends with the acceptance or rejection of the prospective tenant. In some cases, a tenant may apply for viewing but be rejected without the viewing occurring.

    The logs contain attributes that can shed light on potential biases in the process. 'Age', 'citizen', 'German speaking', 'gender', 'religious affiliation', and 'yearsOfEducation' might influence the rental process, leading to potential discrimination. While some attributes may provide useful insights into a potential tenant's reliability, misuse could result in discrimination. Thus, fairness must be observed in utilizing these attributes to avoid potential biases and ensure equitable treatment.
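
    Because the logs are distributed in the XES format, which is XML-based, they can be inspected with a standard XML parser. The following is a minimal sketch in Python, not the dataset authors' tooling. It assumes that 'gender' is stored as a trace-level string attribute and that the last event of a case marks its outcome; the case IDs, activity names ("Apply", "Offer", "Rejected"), group values, and both function names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hand-written XES fragment. Case IDs, activity names, and group
# values are hypothetical; a real log from the dataset will differ.
SAMPLE_XES = """<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="case_1"/>
    <string key="gender" value="group_a"/>
    <event><string key="concept:name" value="Apply"/></event>
    <event><string key="concept:name" value="Offer"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case_2"/>
    <string key="gender" value="group_a"/>
    <event><string key="concept:name" value="Apply"/></event>
    <event><string key="concept:name" value="Rejected"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case_3"/>
    <string key="gender" value="group_b"/>
    <event><string key="concept:name" value="Apply"/></event>
    <event><string key="concept:name" value="Rejected"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case_4"/>
    <string key="gender" value="group_b"/>
    <event><string key="concept:name" value="Rejected"/></event>
  </trace>
</log>"""


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}trace'."""
    return tag.rsplit("}", 1)[-1]


def read_traces(xes_text):
    """Return a list of (trace attributes, ordered activity names)."""
    root = ET.fromstring(xes_text)
    traces = []
    for trace in root:
        if local(trace.tag) != "trace":
            continue
        attrs, activities = {}, []
        for child in trace:
            if local(child.tag) == "event":
                # Collect the activity name of each event, in order.
                for attr in child:
                    if attr.get("key") == "concept:name":
                        activities.append(attr.get("value"))
            elif child.get("key") is not None:
                # Any other keyed element is treated as a trace attribute.
                attrs[child.get("key")] = child.get("value")
        traces.append((attrs, activities))
    return traces


def rejection_rate_by(traces, attribute, rejected_activity):
    """Share of cases per attribute value whose last activity is a rejection."""
    totals, rejected = Counter(), Counter()
    for attrs, activities in traces:
        group = attrs.get(attribute, "unknown")
        totals[group] += 1
        if activities and activities[-1] == rejected_activity:
            rejected[group] += 1
    return {group: rejected[group] / totals[group] for group in totals}


traces = read_traces(SAMPLE_XES)
rates = rejection_rate_by(traces, "gender", "Rejected")
print(rates)
```

    Comparing outcome rates across values of a sensitive attribute is only a first descriptive check; a difference in rates does not by itself establish discrimination.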

  9. Categorization of doctoral theses.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Categorization of doctoral theses. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t003
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Categorization of doctoral theses.

  10. Artificial Intelligence (AI) Verticals Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 6, 2025
    Cite
    Archive Market Research (2025). Artificial Intelligence (AI) Verticals Report [Dataset]. https://www.archivemarketresearch.com/reports/artificial-intelligence-ai-verticals-48602
    Available download formats: doc, ppt, pdf
    Dataset updated
    Mar 6, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Artificial Intelligence (AI) market is experiencing explosive growth, driven by advancements in machine learning, data mining, and automatic driving technologies. While precise market size figures for 2025 aren't provided, considering the rapid expansion of AI across various sectors, a reasonable estimate for the total market size in 2025 is $500 billion, based on reports indicating substantial growth in recent years and projections for future expansion. Assuming a conservative Compound Annual Growth Rate (CAGR) of 25% for the forecast period (2025-2033), the market is projected to reach approximately $3.7 trillion by 2033. This significant expansion is fueled by several key factors. Firstly, the increasing availability and affordability of computing power allow for more complex AI models and applications. Secondly, the burgeoning volume of data generated across various industries provides rich fuel for AI algorithms. Thirdly, businesses across sectors, including healthcare, automotive, and manufacturing, are increasingly adopting AI to improve efficiency, optimize processes, and gain a competitive edge. The segments of Automatic Driving, Machine Learning and Data Mining are expected to be the key drivers of this growth, with applications in healthcare and automotive leading the charge. However, challenges remain. The high cost of AI development and implementation can pose a barrier to entry for smaller businesses. Concerns surrounding data privacy, algorithmic bias, and job displacement due to automation also represent potential restraints on market growth. Nevertheless, the overall trajectory indicates a sustained period of expansion, shaped by continuous innovation and widening adoption across diverse industries and geographical regions. Companies such as Uber, Airbnb, Salesforce, and others are at the forefront of this technological revolution, leveraging AI to enhance their services and operations. 
    The regional breakdown shows a significant market presence across North America, Europe, and Asia Pacific, with further expansion anticipated in emerging markets. The market's growth is expected to remain robust as AI continues to permeate various facets of our lives, transforming industries and creating new opportunities.
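
    The projection in this description follows the compound growth relation value = base × (1 + CAGR)^years. The sketch below, in Python, uses only the figures stated above ($500 billion in 2025, a 25% CAGR, a horizon of 2033). The function name and the choice between eight and nine compounding periods are assumptions made for illustration, since the description does not say how the periods are counted.

```python
def compound(base, rate, years):
    """Value after compounding `base` at annual `rate` for `years` periods."""
    return base * (1 + rate) ** years


BASE_2025_BN = 500.0  # stated 2025 estimate, in USD billion
CAGR = 0.25           # stated compound annual growth rate

# Assumption: 2025 -> 2033 counted as eight annual steps.
eight_periods = compound(BASE_2025_BN, CAGR, 8)
# Assumption: nine steps, as if 2025 were itself the first growth year.
nine_periods = compound(BASE_2025_BN, CAGR, 9)

print(f"8 periods: {eight_periods:,.0f} billion USD")
print(f"9 periods: {nine_periods:,.0f} billion USD")
```

    Eight periods give roughly $2.98 trillion and nine give roughly $3.73 trillion, so the stated figure of approximately $3.7 trillion appears to correspond to nine compounding periods rather than eight.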

  11. Key indicators.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Key indicators. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t002
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Key indicators.

  12. Data from: The extent and consequences of p-hacking in science

    • researchdata.edu.au
    • datadryad.org
    Updated Jun 11, 2022
    Cite
    Rob Lanfear; Michael D. Jennions; Megan L. Head; Luke Holman; Andrew T. Kahn (2022). Data from: The extent and consequences of p-hacking in science [Dataset]. http://doi.org/10.5061/DRYAD.79D43
    Dataset updated
    Jun 11, 2022
    Dataset provided by
    Macquarie University
    Authors
    Rob Lanfear; Michael D. Jennions; Megan L. Head; Luke Holman; Andrew T. Kahn
    Description

    A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.

    Usage Notes

    Data from: The extent and consequences of p-hacking in science

    This zip file consists of three parts:

    1. Data obtained from text-mining and associated analysis files.
    2. Data obtained from previously published meta-analyses and associated analysis files.
    3. Analysis files used to conduct meta-analyses of the data.

    Read me files are contained within this zip file.

    FILES_FOR_DRYAD.zip

  13. Data Analytics Market Analysis, Size, and Forecast 2025-2029: North America...

    • technavio.com
    Updated Jan 15, 2025
    Cite
    Technavio (2025). Data Analytics Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, and UK), Middle East and Africa (UAE), APAC (China, India, Japan, and South Korea), South America (Brazil), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/data-analytics-market-industry-analysis
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global
    Description

    Data Analytics Market Size 2025-2029

    The data analytics market size is forecast to increase by USD 288.7 billion, at a CAGR of 14.7% between 2024 and 2029.

    The market is driven by the extensive use of modern technology in company operations, enabling businesses to extract valuable insights from their data. The prevalence of the Internet and the increased use of linked and integrated technologies have facilitated the collection and analysis of vast amounts of data from various sources. This trend is expected to continue as companies seek to gain a competitive edge by making data-driven decisions. However, the integration of data from different sources poses significant challenges. Ensuring data accuracy, consistency, and security is crucial as companies deal with large volumes of data from various internal and external sources. Additionally, the complexity of data analytics tools and the need for specialized skills can hinder adoption, particularly for smaller organizations with limited resources. Companies must address these challenges by investing in robust data management systems, implementing rigorous data validation processes, and providing training and development opportunities for their employees. By doing so, they can effectively harness the power of data analytics to drive growth and improve operational efficiency.

    What will be the Size of the Data Analytics Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
    In the dynamic and ever-evolving market, entities such as explainable AI, time series analysis, data integration, data lakes, algorithm selection, feature engineering, marketing analytics, computer vision, data visualization, financial modeling, real-time analytics, data mining tools, and KPI dashboards continue to unfold and intertwine, shaping the industry's landscape. The application of these technologies spans various sectors, from risk management and fraud detection to conversion rate optimization and social media analytics. ETL processes, data warehousing, statistical software, data wrangling, and data storytelling are integral components of the data analytics ecosystem, enabling organizations to extract insights from their data. Cloud computing, deep learning, and data visualization tools further enhance the capabilities of data analytics platforms, allowing for advanced data-driven decision making and real-time analysis. Marketing analytics, clustering algorithms, and customer segmentation are essential for businesses seeking to optimize their marketing strategies and gain a competitive edge. Regression analysis, data visualization tools, and machine learning algorithms are instrumental in uncovering hidden patterns and trends, while predictive modeling and causal inference help organizations anticipate future outcomes and make informed decisions. Data governance, data quality, and bias detection are crucial aspects of the data analytics process, ensuring the accuracy, security, and ethical use of data. Supply chain analytics, healthcare analytics, and financial modeling are just a few examples of the diverse applications of data analytics, demonstrating the industry's far-reaching impact. Data pipelines, data mining, and model monitoring are essential for maintaining the continuous flow of data and ensuring the accuracy and reliability of analytics models. The integration of various data analytics tools and techniques continues to evolve, as the industry adapts to the ever-changing needs of businesses and consumers alike.

    How is this Data Analytics Industry segmented?

    The data analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Component: Services, Software, Hardware
    Deployment: Cloud, On-premises
    Type: Prescriptive Analytics, Predictive Analytics, Customer Analytics, Descriptive Analytics, Others
    Application: Supply Chain Management, Enterprise Resource Planning, Database Management, Human Resource Management, Others
    Geography: North America (US, Canada), Europe (France, Germany, UK), Middle East and Africa (UAE), APAC (China, India, Japan, South Korea), South America (Brazil), Rest of World (ROW)

    By Component Insights

    The services segment is estimated to witness significant growth during the forecast period. The market is experiencing significant growth as businesses increasingly rely on advanced technologies to gain insights from their data. Natural language processing is a key component of this trend, enabling more sophisticated analysis of unstructured data. Fraud detection and data security solutions are also in high demand, as companies seek to protect against threats and maintain customer trust. Data analytics platforms, including cloud-based offeri

  14. Noun tags.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Noun tags. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t004
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Noun tags.

  15. Stanford CS229 - Machine Learning - Andrew Ng

    • academictorrents.com
    bittorrent
    Updated Apr 24, 2015
    Cite
    Andrew Ng (2015). Stanford CS229 - Machine Learning - Andrew Ng [Dataset]. https://academictorrents.com/details/da90dedfb78190e5c62af1ad40a2413cb918457f
    Available download formats: bittorrent (4211379788)
    Dataset updated
    Apr 24, 2015
    Dataset authored and provided by
    Andrew Ng
    License

    https://academictorrents.com/nolicensespecified

    Description

    Course Description

    This course provides a broad introduction to machine learning and statistical pattern recognition. Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering, dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs; VC theory; large margins); reinforcement learning and adaptive control. The course will also discuss recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing.

    Prerequisites

    Students are expected to have the following background: Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program. Familiarity with the basic probability theory. (CS109 or Stat116 is sufficient but not necessary.) Familiarity with the basic l

  16. Mapping vectors to words.

    • figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Mapping vectors to words. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t006
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mapping vectors to words.

  17. Age of the author, and impact on non-inclusiveness.

    • plos.figshare.com
    xls
    Updated Jun 6, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Age of the author, and impact on non-inclusiveness. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t010
    Available download formats: xls
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Age of the author, and impact on non-inclusiveness.

  18. Performance of the algorithm.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Performance of the algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t008
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of the algorithm.

  19. Primary reporting of studies.

    • plos.figshare.com
    xlsx
    Updated Nov 26, 2024
    Cite
    Wolfgang Emanuel Zurrer; Amelia Elaine Cannon; Ewoud Ewing; David Brüschweiler; Julia Bugajska; Bernard Friedrich Hild; Marianna Rosso; Daniel Salo Reich; Benjamin Victor Ineichen (2024). Primary reporting of studies. [Dataset]. http://doi.org/10.1371/journal.pone.0311358.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Wolfgang Emanuel Zurrer; Amelia Elaine Cannon; Ewoud Ewing; David Brüschweiler; Julia Bugajska; Bernard Friedrich Hild; Marianna Rosso; Daniel Salo Reich; Benjamin Victor Ineichen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background and methods: Systematic reviews, i.e., research summaries that address focused questions in a structured and reproducible manner, are a cornerstone of evidence-based medicine and research. However, certain steps in systematic reviews, such as data extraction, are labour-intensive, which hampers their feasibility, especially with the rapidly expanding body of biomedical literature. To bridge this gap, we aimed to develop a data mining tool in the R programming environment to automate data extraction from neuroscience in vivo publications. The function was trained on a literature corpus (n = 45 publications) of animal motor neuron disease studies and tested in two validation corpora (motor neuron diseases, n = 31 publications; multiple sclerosis, n = 244 publications).

    Results: Our data mining tool, STEED (STructured Extraction of Experimental Data), successfully extracted key experimental parameters such as animal models and species, as well as risk of bias items like randomization or blinding, from in vivo studies. Sensitivity and specificity were over 85% and 80%, respectively, for most items in both validation corpora. Accuracy and F1-score were above 90% and 0.9 for most items in the validation corpora, respectively. Time savings were above 99%.

    Conclusions: Our text mining tool, STEED, can extract key experimental parameters and risk of bias items from the neuroscience in vivo literature. This enables the tool’s deployment for probing a field in a research improvement context or replacing one human reader during data extraction, resulting in substantial time savings and contributing towards the automation of systematic reviews.
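
    The four metrics quoted in this description (sensitivity, specificity, accuracy, F1-score) all derive from a single confusion matrix per extracted item. The sketch below uses the standard textbook definitions and is only an illustration: it is written in Python rather than R, it is not the STEED code, and the `ConfusionCounts` class and the example counts are hypothetical, not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container: tool output vs. human reader for one item."""
    tp: int  # item reported in the paper and detected by the tool
    fp: int  # item not reported but detected by the tool
    tn: int  # item not reported and not detected
    fn: int  # item reported but missed by the tool


def sensitivity(c: ConfusionCounts) -> float:
    # Share of truly reported items that the tool found.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of truly absent items that the tool correctly left unflagged.
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Share of all publications classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and sensitivity (recall).
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


# Made-up counts for one risk-of-bias item, for illustration only.
example = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
print(f"sensitivity={sensitivity(example):.3f}")
print(f"specificity={specificity(example):.3f}")
print(f"accuracy={accuracy(example):.3f}")
print(f"f1={f1_score(example):.3f}")
```

    Sensitivity and specificity are reported together because either one alone can look high on an imbalanced corpus, for example when an item such as blinding is rarely reported.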

  20. f

    Data from: Improving short-term grade block models: alternative for...

    • scielo.figshare.com
    jpeg
    Updated May 31, 2023
    Cite
    Cristina da Paixão Araújo; João Felipe Coimbra Leite Costa; Vanessa Cerqueira Koppe (2023). Improving short-term grade block models: alternative for correcting soft data [Dataset]. http://doi.org/10.6084/m9.figshare.5772303.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO journals
    Authors
    Cristina da Paixão Araújo; João Felipe Coimbra Leite Costa; Vanessa Cerqueira Koppe
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: Short-term mine planning typically relies on samples obtained from channels or other less accurate sampling methods. The results may include larger sampling errors than those derived from diamond drill hole core samples. The aim of this paper is to evaluate the impact of the sampling error on grade estimation and propose a method of correcting the imprecision and bias in the soft data. In addition, this paper evaluates the benefits of using soft data in mine planning. These concepts are illustrated via a gold mine case study, where two different data types are presented. The study used Au grades collected via diamond drilling (hard data) and channels (soft data). Four methodologies were considered for estimation of the Au grades of each block to be mined: ordinary kriging with hard and soft data pooled without considering differences in data quality; ordinary kriging with only hard data; standardized ordinary kriging with pooled hard and soft data; and standardized ordinary cokriging. The results show that even biased samples collected using poor sampling protocols improve the estimates more than a limited number of precise and unbiased samples. A well-designed estimation method corrects the biases embedded in the samples, mitigating their propagation to the block model.

