30 datasets found

Prolific observer bias in the life sciences: why we need blind data...
search.dataone.org
researchdata.edu.au
Updated Apr 12, 2025
Luke Holman; Megan L. Head; Robert Lanfear; Michael D. Jennions (2025). Prolific observer bias in the life sciences: why we need blind data recording [Dataset]. http://doi.org/10.5061/dryad.hn40n
Unique identifier
https://doi.org/10.5061/dryad.hn40n
Dataset updated
Apr 12, 2025
Dataset provided by
Dryad Digital Repository
Authors
Luke Holman; Megan L. Head; Robert Lanfear; Michael D. Jennions
Time period covered
Jan 1, 2015
Description
Observer bias and other “experimenter effects” occur when researchers’ expectations influence study outcome. These biases are strongest when researchers expect a particular result, are measuring subjective variables, and have an incentive to produce data that confirm predictions. To minimize bias, it is good practice to work “blind,” meaning that experimenters are unaware of the identity or treatment group of their subjects while conducting research. Here, using text mining and a literature review, we find evidence that blind protocols are uncommon in the life sciences and that nonblind studies tend to report higher effect sizes and more significant p-values. We discuss methods to minimize bias and urge researchers, editors, and peer reviewers to keep blind protocols in mind.
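One practical way to work blind, assumed here as an illustration rather than the authors' own protocol, is for a colleague to replace sample IDs with random codes before measurement and to hold the code-to-treatment key until all data are recorded. The function names and the six-character code format below are hypothetical.

```python
import random
import string


def blind_assign(treatments, seed=None, code_length=6):
    """Replace each sample ID with a random code.

    treatments maps sample_id -> treatment group. Returns the codes the
    observer sees (sorted, so their order says nothing about the original
    samples) and the key that a third party keeps until scoring is finished.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_uppercase + string.digits
    key = {}
    for sample_id, group in treatments.items():
        code = "".join(rng.choices(alphabet, k=code_length))
        while code in key:  # regenerate on the rare collision
            code = "".join(rng.choices(alphabet, k=code_length))
        key[code] = (sample_id, group)
    return sorted(key), key


def unblind(scores, key):
    """Group the observer's scores (code -> value) by treatment after scoring."""
    by_group = {}
    for code, value in scores.items():
        _, group = key[code]
        by_group.setdefault(group, []).append(value)
    return by_group


treatments = {"s1": "control", "s2": "treated", "s3": "control", "s4": "treated"}
sheet, key = blind_assign(treatments, seed=42)
scores = {code: 1.0 for code in sheet}  # placeholder values recorded blind
print(unblind(scores, key))
```

Only `sheet` goes to the person taking measurements; `key` is joined back in only after every score is recorded.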
Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks
data.mendeley.com
Updated Apr 2, 2024
Hyunggu Jung (2024). Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks [Dataset]. http://doi.org/10.17632/rnyrpzyw3h.2
Unique identifier
https://doi.org/10.17632/rnyrpzyw3h.2
Dataset updated
Apr 2, 2024
Authors
Hyunggu Jung
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset consists of reviews collected from restaurants on a Korean delivery app platform that were running a review event. A total of 128,668 reviews were collected from 136 restaurants by crawling with the Selenium library in Python. The file Korean Reviews.csv provides the reviews in the original Korean, and the file English Reviews.csv provides the reviews translated into English. All 136 restaurants ran review events that asked customers to write reviews with five stars and photos. The data were therefore annotated on two criteria: 1) whether the review gives a five-star rating, and 2) whether the review contains one or more photos.
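The two annotation criteria can be expressed as binary labels. The sketch below assumes each review exposes a star rating and a photo count; the field names `rating` and `photo_count` are hypothetical, so check the actual column headers in the CSV files before adapting it.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    rating: int       # star rating, 1-5 (hypothetical field name)
    photo_count: int  # number of attached photos (hypothetical field name)


def annotate(review):
    """Return the two binary labels used in the dataset's annotation scheme."""
    return {
        "five_star": review.rating == 5,
        "has_photo": review.photo_count > 0,
    }


def meets_event_demand(review):
    """True when a review satisfies both event conditions (five stars and a photo)."""
    labels = annotate(review)
    return labels["five_star"] and labels["has_photo"]


reviews = [Review("Great!", 5, 2), Review("Fine", 5, 0), Review("Late", 3, 1)]
print([meets_event_demand(r) for r in reviews])  # [True, False, False]
```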
Performance parameters.
plos.figshare.com
xls
Updated Jun 9, 2023
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Performance parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t007
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0257903.t007
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance parameters.
A Multidimensional Dataset for Analyzing and Detecting News Bias based on...
zenodo.org
zip
Updated Jun 8, 2020
Michael Färber; Victoria Burkard; Adam Jatowt; Sora Lim (2020). A Multidimensional Dataset for Analyzing and Detecting News Bias based on Crowdsourcing [Dataset]. http://doi.org/10.5281/zenodo.3885351
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3885351
Dataset updated
Jun 8, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Michael Färber; Victoria Burkard; Adam Jatowt; Sora Lim
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
We provide a large data set consisting of 2,057 sentences from 90 news articles and annotations of crowdworkers with respect to bias itself and the following bias dimensions:

hidden assumptions

subjectivity

representation tendencies

Our data set contains 44,547 labels in total (43,197 sentence labels and 1,350 article labels).

The news articles deal with the Ukraine crisis. They were published in 33 countries in total and were selected based on the data set of Cremisini et al. (Cremisini, A., Aguilar, D., & Finlayson, M. A. A Challenging Dataset for Bias Detection: The Case of the Crisis in the Ukraine, Proc. of SBP-BRiMS'19, pp. 173-183, 2019).

Each sentence was annotated by five crowdworkers. In total, we spent $3,335 on crowdworker annotations.
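With five annotators per sentence, a common way to collapse the votes into one label is a majority vote that also reports the share of agreeing workers. This is an assumed aggregation for illustration, not necessarily the authors' procedure; the label strings are hypothetical, and the dataset's codebook defines the real format.

```python
from collections import Counter


def majority_label(labels):
    """Return (most frequent label, share of annotators who chose it).

    Ties go to the label encountered first, because Counter.most_common
    lists equal counts in first-insertion order.
    """
    if not labels:
        raise ValueError("at least one label is required")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def aggregate(annotations):
    """annotations maps sentence_id -> list of per-worker labels."""
    return {sid: majority_label(votes) for sid, votes in annotations.items()}


votes = {"s1": ["biased", "biased", "neutral", "biased", "neutral"]}
print(aggregate(votes))  # {'s1': ('biased', 0.6)}
```

Keeping the agreement share alongside the label makes it easy to filter out sentences on which the five workers were split.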

More information can be found in our GitHub repository. A description of the file format used is given in the codebook attached to the dataset.

Please cite our data set as follows:

@unpublished{Faerber2020Bias, author = {Michael F{\"{a}}rber and Victoria Burkard and Adam Jatowt and Sora Lim}, title = {{A Multidimensional Dataset for Analyzing and Detecting News Bias based on Crowdsourcing}}, year = {2020} }
Previous works comparative table.
plos.figshare.com
xls
Updated Jun 9, 2023
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Previous works comparative table. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0257903.t001
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS ONE
Authors
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Previous works comparative table.
Replication Data for: Exploring Gender Bias in Homicide Sentencing: An...
search.dataone.org
Updated Dec 16, 2023
Zhuchkova, Svetlana; Kazun, Anton (2023). Replication Data for: Exploring Gender Bias in Homicide Sentencing: An Empirical Study of Russian Court Decisions Using Text Mining [Dataset]. http://doi.org/10.7910/DVN/0YEPAA
Unique identifier
https://doi.org/10.7910/DVN/0YEPAA
Dataset updated
Dec 16, 2023
Dataset provided by
Harvard Dataverse
Authors
Zhuchkova, Svetlana; Kazun, Anton
Description
Data and replication code for the paper "Exploring Gender Bias in Homicide Sentencing: An Empirical Study of Russian Court Decisions Using Text Mining"
Adjective tags.
plos.figshare.com
xls
Updated Jun 6, 2023
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Adjective tags. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0257903.t005
Dataset updated
Jun 6, 2023
Dataset provided by
PLOS ONE
Authors
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Adjective tags.
(Un)Fair Process Mining Event Logs
zenodo.org
application/gzip
Updated Jun 22, 2023
Berti, Alessandro; Pohl, Timo (2023). (Un)Fair Process Mining Event Logs [Dataset]. http://doi.org/10.5281/zenodo.8059489
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.8059489
Dataset updated
Jun 22, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Berti, Alessandro; Pohl, Timo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
License: CC-BY-4.0

Event Logs:

We introduce a set of 12 distinct event logs, three for each of the four domains: hiring, healthcare, lending, and renting. These event logs have been carefully curated and simulated, each containing 10,000 cases, thereby providing an extensive resource for researchers focusing on fairness in process mining.

In each of these domains, the three event logs represent varying degrees of discrimination, offering researchers an opportunity to explore the nuances and complexities that arise in diverse real-world scenarios. By presenting each log with a thorough description of the inherent processes and their respective attributes, we aim to provide a robust groundwork for understanding the potential sources of discrimination and addressing fairness in process mining.

We have ensured that all the event logs are provided in the eXtensible Event Stream (XES) standard format. This adherence to a recognized standard not only ensures broad compatibility but also facilitates interoperability across a variety of process mining tools. By choosing this common format, we aim to encourage and simplify the utilization of these logs for researchers across different platforms.
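Because the logs are XES (XML) files distributed gzip-compressed, they can be read with the Python standard library alone. The sketch below is a minimal reader, not a full XES implementation: it flattens each trace's and event's typed attributes into string dictionaries and ignores nested list/container attributes. Which attributes (for example a `gender` key) sit at trace level in these particular logs is an assumption to verify against the files; dedicated process mining libraries such as PM4Py provide complete XES readers.

```python
import gzip
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace such as '{http://www.xes-standard.org/}'."""
    return tag.rsplit("}", 1)[-1]


def _attributes(elem):
    """Collect flat typed attributes (string, int, date, ...) as key -> value strings."""
    return {
        child.get("key"): child.get("value")
        for child in elem
        if _local(child.tag) not in ("trace", "event") and child.get("key") is not None
    }


def traces(root):
    """Yield (case_attributes, [event_attributes, ...]) for every trace in a log."""
    for trace in root.iter():
        if _local(trace.tag) == "trace":
            events = [_attributes(e) for e in trace if _local(e.tag) == "event"]
            yield _attributes(trace), events


def load_xes(path):
    """Parse a .xes or gzip-compressed .xes.gz file and return the root element."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as fh:
        return ET.parse(fh).getroot()


# A tiny inline log standing in for one of the real files
example = """<log xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="case-1"/>
    <string key="gender" value="female"/>
    <event><string key="concept:name" value="Apply"/></event>
    <event><string key="concept:name" value="Reject"/></event>
  </trace>
</log>"""
for case, events in traces(ET.fromstring(example)):
    print(case["gender"], [e["concept:name"] for e in events])  # female ['Apply', 'Reject']
```

For a downloaded file, `traces(load_xes(path))` yields the same pairs; all values remain strings, so numeric attributes need explicit conversion.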

* Hiring

The data describes a multifaceted recruitment process with diverse application pathways ranging from minimal processing to extensive multi-step procedures. The variability of these routes, largely dependent on numerous determinants, yields a spectrum of outcomes from instant rejection to successful job offers.

The logs include attributes such as age, citizenship, German proficiency, gender, religion, and years of education. While these attributes may inform candidate profiles, their misuse could engender discrimination. Variables like age and education may signify experience and skills, citizenship and German language may address job logistics, but these should not unjustly eliminate applicants. Gender and religion, unrelated to job performance, must not sway hiring. Therefore, the use of these attributes must uphold fairness, avoiding any potential bias.

* Hospital

The data depicts a hospital treatment process that commences with registration at an Emergency Room or Family Department and advances through stages of examination, diagnosis, and treatment. Notably, unsuccessful treatments often entail repetitive diagnostic and treatment cycles, underscoring the iterative nature of healthcare provision.

The logs incorporate patient attributes such as age, underlying condition, citizenship, German language proficiency, gender, and private insurance. These attributes, influencing the treatment process, may unveil potential discrimination. Factors like age and condition might affect case complexity and treatment path, while citizenship may highlight healthcare access disparities. German proficiency can impact provider-patient communication, thus affecting care quality. Gender could spotlight potential health disparities, while insurance status might indicate socio-economic influences on care quality or timeliness. Therefore, a comprehensive examination of these attributes vis-a-vis the treatment process could shed light on potential biases or disparities, fostering fairness in healthcare delivery.

* Lending

This data illustrates the steps within a loan application process. From an initial appointment request, the process navigates various stages, including information verification and underwriting, culminating in loan approval or denial. Additional steps may be required, such as co-signer enlistment or collateral assessment. In some cases the appointment is denied outright; this variability reflects applicants' differing credit situations.

The logs' attributes can help identify influences on outcomes and detect discrimination. Personal characteristics ('age', 'citizen', 'German speaking', and 'gender') and socio-economic indicators ('YearsOfEducation' and 'CreditScore') can affect the process. While 'yearsOfEducation' and 'CreditScore' can validly inform creditworthiness, 'age', 'citizen', 'German speaking', and 'gender' should not bias loan decisions. Ensuring that these attributes are used responsibly fosters equitable loan processes.

* Renting

The data represents a rental process. It begins with a prospective tenant applying to view a property. Subsequent steps include an initial screening phase, viewing, decision-making, and a potential extensive screening. The process ends with the acceptance or rejection of the prospective tenant. In some cases, a tenant may apply for viewing but be rejected without the viewing occurring.

The logs contain attributes that can shed light on potential biases in the process. 'Age', 'citizen', 'German speaking', 'gender', 'religious affiliation', and 'yearsOfEducation' might influence the rental process, leading to potential discrimination. While some attributes may provide useful insights into a potential tenant's reliability, misuse could result in discrimination. Thus, fairness must be observed in utilizing these attributes to avoid potential biases and ensure equitable treatment.
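A first screening step common to all four domains is to compare positive-outcome rates across the values of one protected attribute, a simple group-fairness (demographic parity) check. The sketch below assumes cases are given as `(case_attributes, events)` pairs and that a positive outcome can be recognized from the event list; the activity name `"Make Offer"` is hypothetical, and the real logs' activity labels should be substituted. A ratio well below 1 flags a gap worth investigating; it does not by itself establish discrimination, since legitimate factors such as education are not controlled for.

```python
from collections import defaultdict


def outcome_rates(cases, group_key, is_positive):
    """Share of cases with a positive outcome for each value of one case attribute.

    cases: iterable of (case_attributes, events) pairs.
    is_positive: function taking a case's event list and returning True/False.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for case_attrs, events in cases:
        group = case_attrs.get(group_key)
        totals[group] += 1
        if is_positive(events):
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means identical rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


def hired(events):
    # Hypothetical activity name marking a positive hiring outcome
    return any(e.get("concept:name") == "Make Offer" for e in events)


cases = [
    ({"gender": "female"}, [{"concept:name": "Make Offer"}]),
    ({"gender": "female"}, [{"concept:name": "Reject"}]),
    ({"gender": "male"}, [{"concept:name": "Make Offer"}]),
    ({"gender": "male"}, [{"concept:name": "Make Offer"}]),
]
rates = outcome_rates(cases, "gender", hired)
print(rates, disparity_ratio(rates))  # {'female': 0.5, 'male': 1.0} 0.5
```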
Categorization of doctoral theses.
plos.figshare.com
xls
Updated Jun 4, 2023
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Categorization of doctoral theses. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0257903.t003
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Categorization of doctoral theses.
Artificial Intelligence (AI) Verticals Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 6, 2025
Archive Market Research (2025). Artificial Intelligence (AI) Verticals Report [Dataset]. https://www.archivemarketresearch.com/reports/artificial-intelligence-ai-verticals-48602
Available download formats: doc, ppt, pdf
Dataset updated
Mar 6, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Artificial Intelligence (AI) market is experiencing explosive growth, driven by advancements in machine learning, data mining, and automatic driving technologies. While precise market size figures for 2025 aren't provided, considering the rapid expansion of AI across various sectors, a reasonable estimate for the total market size in 2025 is $500 billion, based on reports indicating substantial growth in recent years and projections for future expansion. Assuming a conservative Compound Annual Growth Rate (CAGR) of 25% for the forecast period (2025-2033), the market is projected to reach approximately $3.7 trillion by 2033. This significant expansion is fueled by several key factors. Firstly, the increasing availability and affordability of computing power allow for more complex AI models and applications. Secondly, the burgeoning volume of data generated across various industries provides rich fuel for AI algorithms. Thirdly, businesses across sectors, including healthcare, automotive, and manufacturing, are increasingly adopting AI to improve efficiency, optimize processes, and gain a competitive edge. The segments of Automatic Driving, Machine Learning and Data Mining are expected to be the key drivers of this growth, with applications in healthcare and automotive leading the charge. However, challenges remain. The high cost of AI development and implementation can pose a barrier to entry for smaller businesses. Concerns surrounding data privacy, algorithmic bias, and job displacement due to automation also represent potential restraints on market growth. Nevertheless, the overall trajectory indicates a sustained period of expansion, shaped by continuous innovation and widening adoption across diverse industries and geographical regions. Companies such as Uber, Airbnb, Salesforce, and others are at the forefront of this technological revolution, leveraging AI to enhance their services and operations. 
The regional breakdown shows a significant market presence across North America, Europe, and Asia Pacific, with further expansion anticipated in emerging markets. The market's growth is expected to remain robust as AI continues to permeate various facets of our lives, transforming industries and creating new opportunities.
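The projection above is compound growth, V_end = V_start × (1 + r)^n. Under the description's own assumptions ($500 billion in 2025, 25% a year), the eight annual steps from 2025 to 2033 give about $2.98 trillion; a figure near $3.7 trillion corresponds to nine compounding steps, or to an implied CAGR of roughly 28% over eight. The short check below only restates those assumptions and adds no market data.

```python
def project(value, rate, years):
    """Value after compounding at a constant annual growth rate."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Constant annual rate that turns start into end over the given years."""
    return (end / start) ** (1 / years) - 1


base_2025 = 500e9  # the description's assumed 2025 market size, in USD
print(round(project(base_2025, 0.25, 2033 - 2025) / 1e12, 2))  # 2.98
print(round(project(base_2025, 0.25, 9) / 1e12, 2))            # 3.73
print(round(implied_cagr(base_2025, 3.7e12, 8), 3))            # 0.284
```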
Key indicators.
plos.figshare.com
xls
Updated Jun 9, 2023
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Key indicators. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0257903.t002
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Key indicators.
Data from: The extent and consequences of p-hacking in science
researchdata.edu.au
datadryad.org
Updated Jun 11, 2022
Rob Lanfear; Michael D. Jennions; Megan L. Head; Luke Holman; Andrew T. Kahn (2022). Data from: The extent and consequences of p-hacking in science [Dataset]. http://doi.org/10.5061/DRYAD.79D43
Unique identifier
https://doi.org/10.5061/DRYAD.79D43
Dataset updated
Jun 11, 2022
Dataset provided by
Macquarie University
Authors
Rob Lanfear; Michael D. Jennions; Megan L. Head; Luke Holman; Andrew T. Kahn
Description
A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.
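One way to look for p-hacking in text-mined p-values, assumed here as an illustration (the archive's analysis files contain the authors' exact procedure), is to compare how many p-values fall just below 0.05 with how many fall in the adjacent lower bin. If p-values were flat across the two bins, each value would be equally likely to land in either; real effects make the lower bin more likely, so a one-sided binomial test against 0.5 is conservative, and a significant excess in the upper bin suggests p-hacking. The bin edges and function names are assumptions.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bin_test(p_values, lower=0.04, split=0.045, upper=0.05):
    """Count p-values in (split, upper) vs (lower, split] and test for an upper-bin excess.

    Returns (upper_count, lower_count, one-sided binomial p-value).
    """
    high = sum(1 for p in p_values if split < p < upper)
    low = sum(1 for p in p_values if lower < p <= split)
    n = high + low
    if n == 0:
        return high, low, 1.0
    return high, low, binomial_upper_tail(high, n)


mined = [0.041, 0.046, 0.047, 0.048, 0.049, 0.01, 0.2]
print(bin_test(mined))  # (4, 1, 0.1875)
```

Values outside 0.04 to 0.05 are ignored by design, which is why the test needs many mined p-values to have any power.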

Usage Notes

Data from: The extent and consequences of p-hacking in science
This zip file consists of three parts:
1. Data obtained from text mining and associated analysis files.
2. Data obtained from previously published meta-analyses and associated analysis files.
3. Analysis files used to conduct meta-analyses of the data.
Readme files are contained within this zip file.
FILES_FOR_DRYAD.zip
Data Analytics Market Analysis, Size, and Forecast 2025-2029: North America...
technavio.com
Updated Jan 15, 2025
Technavio (2025). Data Analytics Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, and UK), Middle East and Africa (UAE), APAC (China, India, Japan, and South Korea), South America (Brazil), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/data-analytics-market-industry-analysis
Dataset updated
Jan 15, 2025
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Global
Description

Data Analytics Market Size 2025-2029

The data analytics market size is forecast to increase by USD 288.7 billion, at a CAGR of 14.7% between 2024 and 2029.
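The two figures together pin down an implied market size, assuming the USD 288.7 billion increase accrues over five annual compounding steps from 2024 to 2029 at a constant 14.7%. If B is the 2024 size, then B × ((1.147)^5 - 1) = 288.7, which gives B of roughly USD 293 billion and a 2029 size of roughly USD 582 billion. These values are back-calculated from the stated numbers, not figures quoted by the report.

```python
def implied_base(increase, cagr, years):
    """Starting size B satisfying B * ((1 + cagr) ** years - 1) == increase."""
    return increase / ((1 + cagr) ** years - 1)


base_2024 = implied_base(288.7, 0.147, 2029 - 2024)  # USD billion
size_2029 = base_2024 * (1 + 0.147) ** 5
print(f"2024: {base_2024:.1f}  2029: {size_2029:.1f}")
```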

The market is driven by the extensive use of modern technology in company operations, enabling businesses to extract valuable insights from their data. The prevalence of the Internet and the increased use of linked and integrated technologies have facilitated the collection and analysis of vast amounts of data from various sources. This trend is expected to continue as companies seek to gain a competitive edge by making data-driven decisions. However, the integration of data from different sources poses significant challenges. Ensuring data accuracy, consistency, and security is crucial as companies deal with large volumes of data from various internal and external sources. Additionally, the complexity of data analytics tools and the need for specialized skills can hinder adoption, particularly for smaller organizations with limited resources. Companies must address these challenges by investing in robust data management systems, implementing rigorous data validation processes, and providing training and development opportunities for their employees. By doing so, they can effectively harness the power of data analytics to drive growth and improve operational efficiency.

What will be the Size of the Data Analytics Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
In the dynamic and ever-evolving data analytics market, technologies and practices such as explainable AI, time series analysis, data integration, data lakes, algorithm selection, feature engineering, marketing analytics, computer vision, data visualization, financial modeling, real-time analytics, data mining tools, and KPI dashboards continue to develop and intertwine, shaping the industry's landscape. Their applications span various sectors and functions, from risk management and fraud detection to conversion rate optimization and social media analytics. ETL processes, data warehousing, statistical software, data wrangling, and data storytelling are integral components of the data analytics ecosystem, enabling organizations to extract insights from their data. Cloud computing, deep learning, and data visualization tools further enhance the capabilities of data analytics platforms, allowing for advanced data-driven decision making and real-time analysis. Marketing analytics, clustering algorithms, and customer segmentation are essential for businesses seeking to optimize their marketing strategies and gain a competitive edge. Regression analysis, data visualization tools, and machine learning algorithms are instrumental in uncovering hidden patterns and trends, while predictive modeling and causal inference help organizations anticipate future outcomes and make informed decisions. Data governance, data quality, and bias detection are crucial aspects of the data analytics process, ensuring the accuracy, security, and ethical use of data. Supply chain analytics, healthcare analytics, and financial modeling are just a few examples of the diverse applications of data analytics, demonstrating the industry's far-reaching impact. Data pipelines, data mining, and model monitoring are essential for maintaining the continuous flow of data and ensuring the accuracy and reliability of analytics models.
The integration of various data analytics tools and techniques continues to evolve, as the industry adapts to the ever-changing needs of businesses and consumers alike.

How is this Data Analytics Industry segmented?

The data analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments:

Component: Services, Software, Hardware
Deployment: Cloud, On-premises
Type: Prescriptive Analytics, Predictive Analytics, Customer Analytics, Descriptive Analytics, Others
Application: Supply Chain Management, Enterprise Resource Planning, Database Management, Human Resource Management, Others
Geography: North America (US, Canada), Europe (France, Germany, UK), Middle East and Africa (UAE), APAC (China, India, Japan, South Korea), South America (Brazil), Rest of World (ROW)

By Component Insights

The services segment is estimated to witness significant growth during the forecast period. The market is experiencing significant growth as businesses increasingly rely on advanced technologies to gain insights from their data. Natural language processing is a key component of this trend, enabling more sophisticated analysis of unstructured data. Fraud detection and data security solutions are also in high demand, as companies seek to protect against threats and maintain customer trust. Data analytics platforms, including cloud-based offeri
Noun tags.
plos.figshare.com
xls
Updated Jun 9, 2023
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Noun tags. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0257903.t004
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS ONE
Authors
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Noun tags.
Stanford CS229 - Machine Learning - Andrew Ng
academictorrents.com
bittorrent
Updated Apr 24, 2015
Andrew Ng (2015). Stanford CS229 - Machine Learning - Andrew Ng [Dataset]. https://academictorrents.com/details/da90dedfb78190e5c62af1ad40a2413cb918457f
Available download formats: bittorrent (4211379788)
Dataset updated
Apr 24, 2015
Dataset authored and provided by
Andrew Ng
License
https://academictorrents.com/nolicensespecified
Description
Course Description: This course provides a broad introduction to machine learning and statistical pattern recognition. Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering, dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs; VC theory; large margins); reinforcement learning and adaptive control. The course will also discuss recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing.
Prerequisites: Students are expected to have the following background: Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program. Familiarity with basic probability theory. (CS109 or Stat116 is sufficient but not necessary.) Familiarity with the basic l
Mapping vectors to words.
figshare.com
xls
Updated Jun 9, 2023
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Mapping vectors to words. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0257903.t006
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS ONE
Authors
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mapping vectors to words.
Age of the author, and impact on non-inclusiveness.
plos.figshare.com
xls
Updated Jun 6, 2023
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Age of the author, and impact on non-inclusiveness. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t010
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0257903.t010
Dataset updated
Jun 6, 2023
Dataset provided by
PLOS ONE
Authors
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Age of the author, and impact on non-inclusiveness.
Performance of the algorithm.
plos.figshare.com
xls
Updated Jun 9, 2023
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Performance of the algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t008
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0257903.t008
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS ONE
Authors
Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance of the algorithm.
Primary reporting of studies.
plos.figshare.com
xlsx
Updated Nov 26, 2024
Wolfgang Emanuel Zurrer; Amelia Elaine Cannon; Ewoud Ewing; David Brüschweiler; Julia Bugajska; Bernard Friedrich Hild; Marianna Rosso; Daniel Salo Reich; Benjamin Victor Ineichen (2024). Primary reporting of studies. [Dataset]. http://doi.org/10.1371/journal.pone.0311358.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0311358.s002
Dataset updated
Nov 26, 2024
Dataset provided by
PLOS ONE
Authors
Wolfgang Emanuel Zurrer; Amelia Elaine Cannon; Ewoud Ewing; David Brüschweiler; Julia Bugajska; Bernard Friedrich Hild; Marianna Rosso; Daniel Salo Reich; Benjamin Victor Ineichen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background and methods: Systematic reviews, i.e., research summaries that address focused questions in a structured and reproducible manner, are a cornerstone of evidence-based medicine and research. However, certain steps in systematic reviews, such as data extraction, are labour-intensive, which hampers their feasibility, especially with the rapidly expanding body of biomedical literature. To bridge this gap, we aimed to develop a data mining tool in the R programming environment to automate data extraction from neuroscience in vivo publications. The function was trained on a literature corpus (n = 45 publications) of animal motor neuron disease studies and tested in two validation corpora (motor neuron diseases, n = 31 publications; multiple sclerosis, n = 244 publications).
Results: Our data mining tool, STEED (STructured Extraction of Experimental Data), successfully extracted key experimental parameters such as animal models and species, as well as risk of bias items like randomization or blinding, from in vivo studies. Sensitivity and specificity were over 85% and 80%, respectively, for most items in both validation corpora. Accuracy and F1-score were above 90% and 0.9 for most items in the validation corpora, respectively. Time savings were above 99%.
Conclusions: Our text mining tool, STEED, can extract key experimental parameters and risk of bias items from the neuroscience in vivo literature. This enables the tool’s deployment for probing a field in a research improvement context or replacing one human reader during data extraction, resulting in substantial time savings and contributing towards the automation of systematic reviews.
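The reported sensitivity, specificity, accuracy, and F1-score follow the standard confusion-matrix definitions, computed per extracted item (for example "blinding reported: yes/no") against a human reference. The Python sketch below illustrates those definitions only; it is not STEED's R code, and the counts in the example are hypothetical.

```python
def _ratio(num, den):
    """Division that returns NaN instead of raising when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics for one extracted item against a human reference.

    tp: reported and extracted; fn: reported but missed;
    tn: not reported and not extracted; fp: extracted but not reported.
    """
    sensitivity = _ratio(tp, tp + fn)  # also called recall
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


m = classification_metrics(tp=40, fp=5, tn=50, fn=5)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.889, 'specificity': 0.909, 'precision': 0.889, 'accuracy': 0.9, 'f1': 0.889}
```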
Data from: Improving short-term grade block models: alternative for...
scielo.figshare.com
Updated May 31, 2023
Cristina da Paixão Araújo; João Felipe Coimbra Leite Costa; Vanessa Cerqueira Koppe (2023). Improving short-term grade block models: alternative for correcting soft data [Dataset]. http://doi.org/10.6084/m9.figshare.5772303.v1
Available download formats
jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.5772303.v1
Dataset updated
May 31, 2023
Dataset provided by
SciELO journals
Authors
Cristina da Paixão Araújo; João Felipe Coimbra Leite Costa; Vanessa Cerqueira Koppe
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: Short-term mining planning typically relies on samples obtained from channels or other less-accurate sampling methods. These samples may carry larger sampling errors than diamond drill hole core samples. The aim of this paper is to evaluate the impact of the sampling error on grade estimation and to propose a method for correcting the imprecision and bias in the soft data. In addition, this paper evaluates the benefits of using soft data in mining planning. These concepts are illustrated via a gold mine case study in which two different data types are presented. The study used Au grades collected via diamond drilling (hard data) and channels (soft data). Four methodologies were considered for estimating the Au grade of each block to be mined: ordinary kriging with hard and soft data pooled without considering differences in data quality; ordinary kriging with only hard data; standardized ordinary kriging with pooled hard and soft data; and standardized ordinary cokriging. The results show that even biased samples collected using poor sampling protocols improve the estimates more than a limited number of precise and unbiased samples. A well-designed estimation method corrects the biases embedded in the samples, mitigating their propagation to the block model.


Prolific observer bias in the life sciences: why we need blind data...

Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks

Performance parameters.

A Multidimensional Dataset for Analyzing and Detecting News Bias based on...

Previous works comparative table.

Replication Data for: Exploring Gender Bias in Homicide Sentencing: An...

Adjective tags.

(Un)Fair Process Mining Event Logs

Categorization of doctoral theses.

Artificial Intelligence (AI) Verticals Report

Key indicators.

Data from: The extent and consequences of p-hacking in science

Usage Notes

Data Analytics Market Analysis, Size, and Forecast 2025-2029: North America...


Noun tags.

Stanford CS229 - Machine Learning - Andrew Ng

Mapping vectors to words.

Age of the author, and impact on non-inclusiveness.

Performance of the algorithm.

Primary reporting of studies.

Data from: Improving short-term grade block models: alternative for...
