83 datasets found

Question Answering Dataset
opendatabay.com
.csv
Updated Jun 6, 2025
Cite
Datasimple (2025). Question Answering Dataset [Dataset]. https://www.opendatabay.com/data/dataset/f629f4eb-7708-4285-b55b-6766d9a1f15a
Explore at:
Available download formats: .csv
Dataset updated
Jun 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset is curated to support research and development in natural language processing (NLP), particularly in the area of question answering systems. Focused on the domain of Data Science and Analytics, it contains a diverse collection of question-answer pairs designed to reflect real-world inquiries about key concepts, tools, techniques, and trends within the field.

Each entry includes:

A natural language question related to data science topics such as machine learning, data wrangling, statistical analysis, data visualization, big data technologies, and analytics methods.

A corresponding answer, verified for accuracy and clarity, suitable for use in both retrieval-based and generative QA models.

Optional metadata such as topic category, difficulty level, and source context, where applicable.
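The entry layout described above can be sketched as a small record type. This is only an illustration: the field names (`question`, `answer`, `topic`, `difficulty`, `source`) and the sample row are assumptions, since the listing does not give the actual column names of the CSV file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAEntry:
    """One question-answer pair; every field name here is a hypothetical choice."""
    question: str
    answer: str
    topic: Optional[str] = None        # optional metadata: topic category
    difficulty: Optional[str] = None   # optional metadata: difficulty level
    source: Optional[str] = None       # optional metadata: source context


def from_row(row: dict) -> QAEntry:
    """Build an entry from a CSV-style dict, treating blank metadata as missing."""
    def clean(key: str) -> Optional[str]:
        value = (row.get(key) or "").strip()
        return value or None

    return QAEntry(
        question=row["question"].strip(),
        answer=row["answer"].strip(),
        topic=clean("topic"),
        difficulty=clean("difficulty"),
        source=clean("source"),
    )


# Hypothetical row, not taken from the dataset.
example = from_row({
    "question": "What is overfitting?",
    "answer": "A model fits noise in the training data and generalizes poorly.",
    "topic": "machine learning",
    "difficulty": "",
})
```

Keeping the metadata fields optional mirrors the "where applicable" wording: a row with only a question and an answer still loads.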

Use Cases:

Training and evaluating QA models and chatbots focused on technical domains.

Developing educational tools and intelligent tutoring systems for data science learners.

Benchmarking NLP systems for domain-specific understanding and reasoning.

Target Audience:

AI/ML researchers

Data science educators and students

NLP developers working on domain-specific applications

This dataset aims to bridge the gap between technical knowledge and natural language understanding by providing high-quality QA pairs tailored to one of today’s most in-demand fields.

Original Data Source: Question Answering Dataset
Question Answers Label Dataset
universe.roboflow.com
zip
Updated Nov 30, 2022
Cite
Question Answer Labelling (2022). Question Answers Label Dataset [Dataset]. https://universe.roboflow.com/question-answer-labelling/question-answers-label
Explore at:
Available download formats: zip
Dataset updated
Nov 30, 2022
Dataset authored and provided by
Question Answer Labelling
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Objects Bounding Boxes
Description
Here are a few use cases for this project:

Digital Document Management: This model can be used to effectively organize and manage digital documents. By identifying areas such as headers, addresses, and vendors, it could streamline workflows in companies dealing with large amounts of papers, forms or invoices.

Automated Data Extraction: The model could be used in extracting pertinent information from documents automatically. For example, pulling out questions and answers from educational materials, extracting vendor or address information from invoices, or grabbing column headers from statistical reports.

Augmented Reality (AR) Applications: "Question Answers Label" can be utilized in AR glasses to give real-time information about objects a user sees, especially in the realm of paper documents.

Virtual Assistance: This model may be used to build a virtual assistant capable of reading and understanding physical documents. For instance, reading out a user's mail, helping with learning from textbooks, or assisting in reviewing legal documents.

Accessibility Tools for Visually Impaired: The tool could be utilized to interpret written documents for visually impaired people by identifying and vocalizing text based on their classes (answers, questions, headers, etc.).
ROUNDING, FOCAL POINT ANSWERS AND NONRESPONSE TO SUBJECTIVE PROBABILITY...
journaldata.zbw.eu
jda-test.zbw.eu
pdf, txt
Updated Dec 7, 2022
Cite
Kristin J. Kleinjans; Arthur van Soest (2022). ROUNDING, FOCAL POINT ANSWERS AND NONRESPONSE TO SUBJECTIVE PROBABILITY QUESTIONS (replication data) [Dataset]. http://doi.org/10.15456/jae.2022321.0714426723
Explore at:
Available download formats: txt(2913), pdf(31909)
Unique identifier
https://doi.org/10.15456/jae.2022321.0714426723
Dataset updated
Dec 7, 2022
Dataset provided by
ZBW - Leibniz Informationszentrum Wirtschaft
Authors
Kristin J. Kleinjans; Arthur van Soest
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
We develop a panel data model explaining answers to subjective probabilities about binary events and estimate it using data from the Health and Retirement Study on six such probabilities. The model explicitly accounts for several forms of reporting behavior: rounding, focal point 50% answers and item nonresponse. We find observed and unobserved heterogeneity in the tendencies to report rounded values or a focal answer, explaining persistency in 50% answers over time. Focal 50% answers matter for some of the probabilities. Incorporating reporting behavior does not have a large effect on the estimated distribution of the genuine subjective probabilities.
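The three reporting behaviors named in the description (rounding, focal 50% answers, item nonresponse) can be illustrated with a purely descriptive tally. This is a naive sketch and not the authors' panel data model: the category names and the sample answers are assumptions, and a surface rule like this cannot tell a genuine 50% belief from a focal answer, which is what the model in the description is built to handle.

```python
from collections import Counter
from typing import Optional


def classify_answer(percent: Optional[int]) -> str:
    """Label one reported probability (0-100) by its surface pattern."""
    if percent is None:
        return "nonresponse"
    if not 0 <= percent <= 100:
        raise ValueError("probability answers must lie between 0 and 100")
    if percent == 50:
        return "possible_focal_50"   # cannot be told apart from a genuine 50%
    if percent % 10 == 0:
        return "multiple_of_10"
    if percent % 5 == 0:
        return "multiple_of_5"
    return "other"


# Hypothetical answers from one respondent across survey waves.
answers = [50, 50, None, 80, 25, 37, 100]
pattern_counts = Counter(classify_answer(a) for a in answers)
```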
ScanQA Dataset
paperswithcode.com
Updated Jun 12, 2024
Cite
Daichi Azuma; Taiki Miyanishi; Shuhei Kurita; Motoaki Kawanabe (2024). ScanQA Dataset [Dataset]. https://paperswithcode.com/dataset/scanqa
Explore at:
Dataset updated
Jun 12, 2024
Authors
Daichi Azuma; Taiki Miyanishi; Shuhei Kurita; Motoaki Kawanabe
Description
We collected 41,363 questions and 58,191 answers, including 32,337 unique questions and 16,999 unique answers. Table 2 presents the statistics of the ScanQA dataset. This dataset is an order of magnitude larger than existing embodied question-answering datasets in terms of both question size and variation. For example, the EQA dataset contains 4,246 questions, consisting of 147 unique questions in its training set. The EQA-MP3D dataset contains 767 questions consisting of 174 unique questions in its training set. Considering that our dataset contains not only question–answer pairs but also 3D object localization annotations, we assume that this is the largest dataset to specify the nature of objects in 3D scenes with the question answering form. The distribution of the questions based on their first word. We collected various types of questions through question auto-generation and editing by humans.
Numerical solution of Algebraic equation
paper.erudition.co.in
html
Updated Feb 6, 2022
+ more versions
Cite
Einetic (2022). Numerical solution of Algebraic equation [Dataset]. https://paper.erudition.co.in/makaut/master-of-computer-applications-2-years/2/numerical-and-statistical-analysis
Explore at:
Available download formats: html
Dataset updated
Feb 6, 2022
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter Numerical solution of Algebraic equation, from Numerical and Statistical Analysis, 2nd Semester, Master of Computer Applications (2 Years)
Data Science Interview Questions
kaggle.com
Updated Aug 14, 2022
Cite
Iron486 (2022). Data Science Interview Questions [Dataset]. https://www.kaggle.com/datasets/die9origephit/data-science-interview-questions/discussion?sort=undefined
Explore at:
Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 14, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Iron486
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Content

This is a collection of questions useful for people who want to test their data science knowledge for interviews or for refreshing some specific topics.
Most of the questions are related to data science, data analysis, machine learning, deep learning, probability, statistics and programming. The majority of them include answers too.

Acknowledgements

The questions were fetched from various sources. After being collected, some typos were corrected, and the style and format of the questions were modified, making the PDFs more readable. Here are the sources:
https://www.nicksingh.com/posts/40-probability-statistics-data-science-interview-questions-asked-by-fang-wall-street
https://github.com/kojino/120-Data-Science-Interview-Questions
https://intellipaat.com/blog/interview-question/data-science-interview-questions/
https://www.projectpro.io/article/100-deep-learning-interview-questions-and-answers-for-2021/419
Roots of Equations
paper.erudition.co.in
html
Updated Jun 7, 2025
Cite
Einetic (2025). Roots of Equations [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2020-2021/5/numerical-and-statistical-methods
Explore at:
Available download formats: html
Dataset updated
Jun 7, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter Roots of Equations, from Numerical and Statistical Methods, 5th Semester, Bachelor of Computer Application 2020-2021
Introduction to Statistics & Probability
paper.erudition.co.in
html
Updated Feb 6, 2022
Cite
Einetic (2022). Introduction to Statistics & Probability [Dataset]. https://paper.erudition.co.in/makaut/master-of-computer-applications-2-years/2/numerical-and-statistical-analysis
Explore at:
Available download formats: html
Dataset updated
Feb 6, 2022
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter Introduction to Statistics & Probability, from Numerical and Statistical Analysis, 2nd Semester, Master of Computer Applications (2 Years)
CourseKata Dataset Items (QuestionTypes)
kaggle.com
Updated Apr 21, 2024
Cite
Gagan Karnati (2024). CourseKata Dataset Items (QuestionTypes) [Dataset]. https://www.kaggle.com/datasets/gagankarnati/coursekata-dataset-items-questiontypes
Explore at:
Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 21, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Gagan Karnati
Description
CourseKata is a platform that creates and publishes a series of e-books for introductory statistics and data science classes that utilize demonstrated learning strategies to help students learn statistics and data science. The developers of CourseKata, Jim Stigler (UCLA) and Ji Son (Cal State Los Angeles) and their team, are cognitive psychologists interested in improving statistics learning by examining students' interactions with online interactive textbooks. Traditionally, much of the research in how students learn is done in a 1-hour lab or through small-scale interviews with students. CourseKata offers the opportunity to peek into the actions, responses, and choices of thousands of students as they are engaged in learning the interrelated concepts and skills of statistics and coding in R over many weeks or months in real classes.

items.csv (1335 × 19) Each row contains information about a particular question (although it does not provide the prompt). The item to which a question belongs is included. All items/questions are represented. Use this file to go deeper into particular questions that students encounter in the course.

Questions are grouped into items (item_id). An item can be one of three item_types: code, learnosity, or learnosity-activity (the distinction between learnosity and learnosity-activity is not important). Code items are a single question and ask for R code as a response. (Responses can be seen in responses.csv.) Learnosity-activities and learnosity items are collections of one or more questions that can be of a variety of lrn_types:

● association
● choicematrix
● clozeassociation
● formulaV2
● imageclozeassociation
● mcq
● plaintext
● shorttext
● sortlist

Examples of these question types are provided at the end of this document.

The level of detail made available to you in the responses file depends on the lrn_type. For example, for multiple choice questions (mcq), you can find the options in the responses file in the columns labeled lrn_option_0 through lrn_option_11, and you can see the chosen option in the results variable.
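As a rough sketch of how the multiple-choice columns might be read, the snippet below parses a tiny in-memory stand-in for the responses file. Only the column names `lrn_type`, `item_id`, `lrn_option_0` through `lrn_option_11`, and `results` come from the description; the rows are made up, and whether `results` stores the option text or something else is an assumption.

```python
import csv
import io

# Hypothetical rows; only the column names come from the description above.
SAMPLE = """item_id,lrn_type,lrn_option_0,lrn_option_1,lrn_option_2,results
q1,mcq,mean,median,mode,median
q2,shorttext,,,,free text
"""

OPTION_COLUMNS = [f"lrn_option_{i}" for i in range(12)]  # lrn_option_0 .. lrn_option_11


def mcq_options(row: dict) -> list:
    """Return the non-empty answer options of one response row."""
    return [row[col] for col in OPTION_COLUMNS if row.get(col)]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
mcq_rows = [r for r in rows if r["lrn_type"] == "mcq"]
summary = [(r["item_id"], mcq_options(r), r["results"]) for r in mcq_rows]
```

Replacing `io.StringIO(SAMPLE)` with an open file handle on the real responses file should work the same way, provided the column names match.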

Assessment Types

In general, assessments, such as the items and questions included in CourseKata, can be used for two purposes. Formative assessments are meant to provide feedback to the student (and instructor), or to serve as a learning aid that prompts students to improve memory and deepen their understanding. Summative assessments are meant to provide a summary of a student's understanding, often for use in assigning a grade. For example, most midterms and final exams that you've taken are summative assessments.

The vast majority of items in CourseKata should be treated as formative assessments. The exceptions are the end-of-chapter Review questions, which can be thought of as summative. The mean number of correct answers for end-of-chapter review questions is provided within the checkpoints file. You might see that some pages have the word "Quiz" or "Exam" or "Midterm" in them. Results from these items and responses to them are not provided to us in this data set.
Data from: Reference Mysteries: The Quest for Answers
search.dataone.org
Updated Dec 28, 2023
Cite
Elizabeth Hamilton (2023). Reference Mysteries: The Quest for Answers [Dataset]. http://doi.org/10.5683/SP3/LH36YJ
Explore at:
Unique identifier
https://doi.org/10.5683/SP3/LH36YJ
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Elizabeth Hamilton
Description
The solutions of mysteries can lead to salvation for those on the reference desk dealing with business students or difficult questions.
Analysis of the experience, interests, and expectations of first-year...
figshare.com
portalcientificovalencia.univeuropea.com
Updated Feb 7, 2024
Cite
Víctor Yeste (2024). Analysis of the experience, interests, and expectations of first-year students of the UEV STEAM degrees [Dataset]. http://doi.org/10.6084/m9.figshare.25161746.v1
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.25161746.v1
Dataset updated
Feb 7, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Víctor Yeste
Description
Author: Víctor Yeste. Universitat Politècnica de València. Universidad Europea de Valencia.

The main objective is to analyze, using descriptive statistics, the experience, interests, and expectations regarding programming languages of first-year students of the STEAM degrees of the European University of Valencia.

Google Forms was chosen to evaluate students' views on programming languages and computational thinking through a questionnaire. It is a free tool that has been used in many studies, such as Haddad and Kalaani (2014), to capture the opinion of students beyond course assessment surveys, as it is straightforward, systematic, and easy to implement. It can be used through a web-based application to create online questionnaires with a friendly interface. All answers are collected using a Google Spreadsheet document stored on Google Drive. In addition, it enables the results of the questionnaire to be visualized through a statistical summary of each question and its answers.

The questionnaire consisted of 19 questions, although some depended on a specific answer to a previous question. The form was administered on the first day of class of the subject Fundamentals of Programming or Scientific Computing I (the name of the subject varies by degree), specifically in the classes of 19 and 20 September 2023. It is a subject taught in the first semester of the first year of all STEAM degrees of the European University of Valencia, which include Data Science, Physics, Engineering in Industrial Organization, and a Double Engineering Degree in Engineering in Industrial Organization and Business Administration and Management.

In this subject, computational thinking is developed through the study of theory and a significant practical component of programming in C++, one of today's most influential and essential programming languages (Cyganek, 2022).

The questionnaire was proposed to first-year 2023-2024 students, encouraging them to participate in the first class they had on the subject and through a direct link to the questionnaire on the virtual campus, based on Canvas.

This dataset has contributed to the elaboration of the book chapter:

Yeste, Víctor (2024). ¿Los alumnos de STEAM saben programar al comenzar la universidad? Análisis de su experiencia, intereses y expectativas. In Perspectivas Contemporáneas en Educación: Innovación, Investigación y Transformación, Dykinson S.L.
Ten quick tips for getting the most scientific value out of numerical data
plos.figshare.com
pdf
Updated May 30, 2023
Cite
Lars Ole Schwen; Sabrina Rueschenbaum (2023). Ten quick tips for getting the most scientific value out of numerical data [Dataset]. http://doi.org/10.1371/journal.pcbi.1006141
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pcbi.1006141
Dataset updated
May 30, 2023
Dataset provided by
PLOS Computational Biology
Authors
Lars Ole Schwen; Sabrina Rueschenbaum
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Most studies in the life sciences and other disciplines involve generating and analyzing numerical data of some type as the foundation for scientific findings. Working with numerical data involves multiple challenges. These include reproducible data acquisition, appropriate data storage, computationally correct data analysis, appropriate reporting and presentation of the results, and suitable data interpretation.

Finding and correcting mistakes when analyzing and interpreting data can be frustrating and time-consuming. Presenting or publishing incorrect results is embarrassing but not uncommon. Particular sources of errors are inappropriate use of statistical methods and incorrect interpretation of data by software. To detect mistakes as early as possible, one should frequently check intermediate and final results for plausibility. Clearly documenting how quantities and results were obtained facilitates correcting mistakes. Properly understanding data is indispensable for reaching well-founded conclusions from experimental results. Units are needed to make sense of numbers, and uncertainty should be estimated to know how meaningful results are. Descriptive statistics and significance testing are useful tools for interpreting numerical results if applied correctly. However, blindly trusting in computed numbers can also be misleading, so it is worth thinking about how data should be summarized quantitatively to properly answer the question at hand. Finally, a suitable form of presentation is needed so that the data can properly support the interpretation and findings. By additionally sharing the relevant data, others can access, understand, and ultimately make use of the results.

These quick tips are intended to provide guidelines for correctly interpreting, efficiently analyzing, and presenting numerical data in a useful way.
Assessment of EBM
zenodo.org
bin
Updated May 14, 2025
Cite
Roshan Shinde (2025). Assessment of EBM [Dataset]. http://doi.org/10.5281/zenodo.15402703
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.15402703
Dataset updated
May 14, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Roshan Shinde
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Evidence-based medicine: assessment of knowledge of basic epidemiological and research methods among medical doctors

Submitted to Venera ma'am by Roshan Shinde Group 32

Evidence-based medicine (EBM) is the main source of new knowledge for doctors in this era. The main objectives of this study are as follows:

To evaluate the knowledge of basic research methods and data analysis among medical doctors.

To assess associated factors such as the country of medical school graduation and professional status.

Importance of Research Competence:

1. The study emphasizes that a solid understanding of epidemiology and biostatistics is essential for doctors to critically appraise medical literature and make informed clinical decisions.

2. Previous Findings: Prior studies indicated that many doctors lack proficiency in research methods, with significant gaps in understanding key concepts of evidence-based medicine (EBM).

Materials and Methods

Data collection and study population

The study setting comprised 40 departments employing around 500 doctors.

A random selection of 15 departments was made for participant recruitment.

Data collection

A supervised, self-administered questionnaire was distributed during morning staff meetings.

The questionnaire consisted of 10 multiple-choice questions focused on basic epidemiology and statistics, along with demographic data.

Participants were divided into two groups based on their country of medical school graduation: those from the former Soviet Union (Eastern education) and those from other countries (Western education).

The questionnaire was completed anonymously, and all participants were proficient in Hebrew.

Questionnaire

1. Sections of the Questionnaire:

Personal Details: This section collected demographic information about the doctors, including:

• Country of graduation

• Year of graduation from medical school

• Professional status (whether they are specialists or residents)

• Reading and writing habits related to medical literature.

Knowledge Assessment: This section consisted of 10 multiple-choice questions focused on basic research methods and statistics, divided as follows:

Statistics: 5 questions

Epidemiology: 5 questions

2. Basis for Statistical Questions:

The questions on statistics were derived from a list of commonly used statistical methods identified by Emerson and Colditz in 1983. This list was previously utilized for quality evaluations of articles published in the New England Journal of Medicine and referenced in a similar study by Horton and Switzer. This approach ensures that the questions are relevant and grounded in established research practices.

3. Scoring Methodology:

• Any missing answers to questions on epidemiological and statistical methods were considered incorrect. This scoring method emphasizes the importance of attempting to answer all questions and reflects a strict approach to assessing knowledge.

• The decision to mark unanswered questions as incorrect may encourage participants to engage more thoughtfully with the questionnaire, although it could also discourage some from attempting to answer if they are unsure.
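The scoring rule above can be sketched in a few lines. The answer key, question identifiers, and responses below are hypothetical; only the rule itself (a missing answer counts as incorrect) comes from the description.

```python
from typing import Mapping, Optional

# Hypothetical answer key for a 10-question multiple-choice test.
ANSWER_KEY = {f"q{i}": "a" for i in range(1, 11)}


def score(responses: Mapping[str, Optional[str]]) -> int:
    """Count correct answers; a missing or blank answer scores as incorrect."""
    total = 0
    for question, correct in ANSWER_KEY.items():
        given = responses.get(question)   # None when the question was skipped
        if given is not None and given == correct:
            total += 1
    return total


# Seven correct answers, one wrong answer, one blank, one question absent.
responses = {f"q{i}": "a" for i in range(1, 8)}
responses["q8"] = "b"
responses["q9"] = None
total_score = score(responses)
```

Under this rule a respondent who skips a question and one who answers it wrongly receive the same score, which is the strictness the description refers to.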

To ensure validity of the questionnaire, the 10 questions assessing knowledge were given to 15 members of the Epidemiology Department, Ben‐Gurion University. All of them correctly answered all the questions.

Results:

Response Rate: Out of 260 eligible doctors, 219 completed the questionnaire (84.2% response rate).

Statistical methods

1. Comparison of Categorical Variables:

Chi-Squared Test (χ²): This test was used to examine differences between categorical variables. It assesses whether the observed frequencies in each category differ from what would be expected under the null hypothesis.

Fisher's Exact Test: This test was employed when sample sizes were small or when the assumptions of the chi-squared test were not met. It is particularly useful for 2×2 contingency tables.

2. Comparison of Ordinal Variables:

Mann-Whitney U Test: This non-parametric test was used to compare ordinal variables with multiple values, such as the scores obtained from the questionnaire. It assesses whether the distributions of two independent samples differ.

3. Paired Comparisons:

Wilcoxon's Signed Rank Test: This non-parametric test was used for paired comparisons of scores. It evaluates whether the median of the differences between paired observations is significantly different from zero.

4. Correlation Analysis:

Spearman's Rank Correlation Coefficient: This test was used to estimate the correlation between continuous variables. It assesses how well the relationship between two variables can be described using a monotonic function.

5. Multivariable Analysis:

Linear Regression: This method was used to explain the final score based on multiple variables. The analysis adjusted for all variables that were found to be related in the univariable analysis with a p-value of less than 0.1. This approach helps to identify the independent effects of each variable on the outcome.

6. Significance Level:

A p-value below 0.05 was considered statistically significant, meaning that results at least as extreme as those observed would occur less than 5% of the time if the null hypothesis were true.

7. Data Presentation:

Normally distributed variables were expressed as mean (standard deviation, SD), while non-normally distributed variables were presented as median and interquartile range (IQR). This distinction is important for accurately representing the data's distribution.
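To make the two categorical tests listed above concrete, here is a minimal standard-library sketch for a single 2×2 table. The counts are hypothetical and are not taken from the study; the function names are illustrative, and the chi-squared statistic is computed without a continuity correction, which may differ from what the study's software applied.

```python
from math import comb, erfc, sqrt


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    return sum(
        (obs - r * col / n) ** 2 / (r * col / n)
        for obs, r, col in zip(observed, row_totals, col_totals)
    )


def chi_squared_p_1df(statistic: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with one degree of freedom."""
    return erfc(sqrt(statistic / 2))


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value: sum of table probabilities <= the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_observed * (1 + 1e-9))


# Hypothetical table: rows are two groups, columns are correct / incorrect answers.
chi2 = chi_squared_2x2(3, 1, 1, 3)
p_chi2 = chi_squared_p_1df(chi2)
p_fisher = fisher_exact_2x2(3, 1, 1, 3)
```

With counts this small the expected cell frequencies are low, which is the situation in which the description says Fisher's exact test was preferred over the chi-squared test.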

Table 2 depicts doctors' professional characteristics according to the country of medical school graduation. Of 219 participants, 84 (38.4%) graduated from the former Soviet republics. The remaining 135 doctors were distributed by the country of graduation as follows: Israel, 100 (45.7%); West and Central Europe, 22 (10.0%: Italy, 8; Germany, 3; Czech Republic, 3; Hungary, 3; Netherlands, 1; Romania, 4); South America, 10 (4.6%: Argentina, 5; Cuba, 3; Uruguay, 1; Brazil, 1); and North America, 3 (1.4%).
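The counts and percentages in this paragraph, together with the response rate reported earlier in the description, can be cross-checked with a few lines of arithmetic. All numbers below are copied from the text; only the variable names are illustrative.

```python
# Counts copied from the description.
eligible, total = 260, 219
former_soviet = 84
regions = {
    "Israel": 100,
    "West and Central Europe": 22,
    "South America": 10,
    "North America": 3,
}
europe = {"Italy": 8, "Germany": 3, "Czech Republic": 3, "Hungary": 3,
          "Netherlands": 1, "Romania": 4}
south_america = {"Argentina": 5, "Cuba": 3, "Uruguay": 1, "Brazil": 1}

others = total - former_soviet                      # doctors not from the former Soviet republics
response_rate = round(100 * total / eligible, 1)    # percent of eligible doctors who responded
percentages = {name: round(100 * n / total, 1) for name, n in regions.items()}

# Consistency checks: subgroup counts must add up to their parent totals.
assert sum(regions.values()) == others
assert sum(europe.values()) == regions["West and Central Europe"]
assert sum(south_america.values()) == regions["South America"]
```

The checks pass, which supports reading the per-country counts as breakdowns of the two regional totals.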

Time Elapsed Since Graduation:

• Doctors from Israel and other countries had a shorter time since graduation compared to those from the former Soviet Union:

• Foreign Graduates: 8 years (interquartile range (IQR) 4-19)

• Former Soviet Union Graduates: 10 years (IQR 6-19)

• The difference was statistically significant (p = 0.02), indicating that foreign graduates tended to have graduated more recently.

Professional Status:

There were fewer specialists among foreign graduates compared to those who graduated from Israel:

Foreign Graduates: 32.8% were specialists

Israeli Graduates: 48.0% were specialists

This difference was also statistically significant (p = 0.02).

Choice of Residency:

There were notable differences in the choice of residency between the two groups:

Domestic Graduates: 29.3% chose pediatrics or obstetrics and gynecology

Conclusion

The analysis of doctors' professional characteristics based on their country of medical school graduation reveals important insights into the diversity of medical training backgrounds and their implications for specialization and residency choices. These findings underscore the need for ongoing evaluation of medical education and training systems to ensure that all graduates, regardless of their background, are adequately prepared to meet the healthcare needs of the population.

Table 3 describes the reading and publishing habits of the participants. A total of 96% of the participants reported reading at least one article per week, whereas 35.2% usually read at least three articles. Specialists read significantly more articles per week: 52.3% of them read at least three articles, compared with only 23.8% of the residents; p<0.001. Most of the doctors, 63.6%, participated in the writing of ⩽5 articles. Similar to the reading pattern, only 21.1% of the residents wrote ⩾6 articles compared with 44.0% of the specialists; p<0.001. The Spearman correlation value between reading and writing variables was 0.35; p<0.001.
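For readers unfamiliar with the Spearman coefficient reported here, the sketch below computes it as the Pearson correlation of average ranks, which handles tied values. The two data vectors are hypothetical and do not come from the study, so the resulting value is unrelated to the reported 0.35.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list:
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1     # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: articles read per week and articles written, per doctor.
articles_read = [1, 2, 2, 3, 5, 8]
articles_written = [0, 1, 3, 2, 6, 10]
rho = spearman(articles_read, articles_written)
```

Because only ranks enter the calculation, any monotonic transformation of either variable leaves the coefficient unchanged.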

Conclusion

The analysis of reading and publishing habits among the study participants reveals important insights into the professional engagement of doctors with medical literature. The differences between specialists and residents, along with the positive correlation between reading and writing, underscore the need for targeted educational initiatives to enhance research literacy and foster a culture of inquiry within the medical community. Encouraging both reading and writing can contribute to the overall quality of medical practice and the advancement of evidence-based medicine.

Figure 1

The figure describes the average of correct answers to 10 questions in understanding different aspects of basic research methods. Two populations of doctors are compared: those who graduated in the former Soviet Union (Eastern type of education) and those who graduated in Israel, USA, Western and Central Europe,
NATCOOP dataset
heidata.uni-heidelberg.de
csv, docx, pdf, tsv +1
Updated Jan 27, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Florian Diekert; Robbert-Jan Schaap; Tillmann Eymess (2022). NATCOOP dataset [Dataset]. http://doi.org/10.11588/DATA/GV8NBL
Explore at:
Available download formats: docx(90179), pdf(432619), csv(3441765), docx(499022), tsv(86553), pdf(473493), pdf(856157), pdf(467245), docx(101203), pdf(351653), pdf(576588), pdf(200225), pdf(124038), type/x-r-syntax(14339), pdf(345323), pdf(69467), docx(43108), pdf(268168), docx(493800), docx(25110), docx(43036), pdf(270379), pdf(77960), pdf(464499), pdf(392748), docx(42158), pdf(374488), docx(498354), pdf(282466), pdf(482954), pdf(302513), pdf(513748), pdf(126342), docx(33772), tsv(2313475), pdf(441389), pdf(92836), pdf(392718)
Unique identifier
https://doi.org/10.11588/DATA/GV8NBL
Dataset updated
Jan 27, 2022
Dataset provided by
heiDATA
Authors
Florian Diekert; Robbert-Jan Schaap; Tillmann Eymess
License
https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/GV8NBL
Time period covered
Jan 1, 2017 - Jan 1, 2021
Dataset funded by
European Commission
Description
The NATCOOP project set out to study how nature shapes the preferences and incentives of economic agents and how this in turn affects common-pool resource management. Imagine a group of fishermen targeting a species that requires a lot of teamwork to harvest. Do these fishers become more social over time compared to fishers that work in a more solitary manner? If so, does this have implications for how the fishery should be managed? To study this, the NATCOOP team travelled to Chile and Tanzania and collected data using surveys and economic experiments. These two very different countries have a large population of small-scale fishermen, and both host several distinct types of fisheries. Over the course of five field trips, the project team surveyed more than 2500 fishermen, with each field trip contributing to the main research question by measuring fishermen’s preferences for cooperation and risk. Additionally, each field trip aimed to answer another smaller research question that was focused on either risk taking or cooperation behavior in the fisheries. The data from both surveys and experiments are now publicly available and can be freely studied by other researchers, resource managers, or interested citizens. Overall, the NATCOOP dataset contains participants’ responses to a plethora of survey questions and their actions during incentivized economic experiments. It is available in both the .dta and .csv format, and its use is recommended with statistical software such as R or Stata. For those unaccustomed to statistical analysis, we included a video tutorial on how to use the data set in the open-source program R.
Usage of generative AI in the U.S. 2023
statista.com
Updated Sep 10, 2024
Cite
Statista (2024). Usage of generative AI in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/1413836/use-of-generative-ai-us/
Explore at:
Dataset updated
Sep 10, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
United States
Description
The main uses of generative AI were seeking answers to questions the user could not answer and general brainstorming. Over half of the respondents used generative AI for these purposes in 2023. Coding and writing lyrics were the least common use cases, with barely 18 percent of users using generative AI for such tasks.
Sample Space
paper.erudition.co.in
html
Updated Jun 22, 2025
Cite
Einetic (2025). Sample Space [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2020-2021/5/numerical-and-statistical-methods
Explore at:
Available download formats: html
Dataset updated
Jun 22, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter Sample Space of Numerical and Statistical Methods, 5th Semester, Bachelor of Computer Application 2020-2021
Data from: Reference Mysteries
search.dataone.org
Updated Dec 28, 2023
Cite
Elizabeth Hamilton (2023). Reference Mysteries [Dataset]. http://doi.org/10.5683/SP3/2VLBGJ
Explore at:
Unique identifier
https://doi.org/10.5683/SP3/2VLBGJ
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Elizabeth Hamilton
Description
The requests we receive at the Reference Desk keep surprising us. We'll take a look at some of the best examples from the year on data questions and data solutions.
Data from - Statistical power and the detection of global change responses:...
dataone.org
smithsonian.dataone.org
Updated Dec 2, 2024
Cite
S. Joseph Wright (2024). Data from - Statistical power and the detection of global change responses: The case of leaf production in old-growth forests [Dataset]. https://dataone.org/datasets/urn%3Auuid%3A8bbcd334-059b-45b1-9b83-94b52abbd6f8
Explore at:
Dataset updated
Dec 2, 2024
Dataset provided by
Smithsonian Research Data Repository
Authors
S. Joseph Wright
Description
Forests sequester a substantial portion of anthropogenic carbon emissions. Many open questions concern how. We address two of these questions (Wright and Calderón 2025). Has leaf and fine litter production changed? And what is the contribution of old-growth forests? We address these questions with long-term records (≥10 years) of total, reproductive, and especially foliar fine litter production from 32 old-growth forests. We expect increases in forest productivity associated with rising atmospheric carbon dioxide concentrations and, in cold climates, with rising temperatures. We evaluate the statistical power of our analysis using simulations of known temporal trends parameterized with sample sizes (number of years) and levels of interannual variation observed for each record. Statistical power is inadequate to detect biologically plausible trends for records lasting less than 20 years. Just four old-growth forests have records of fine litter production lasting longer than 20 years, and these four provide no evidence for increases. Three of the four forests are in central Panama, also have long-term records of wood production, and both components of aboveground production are unchanged over 21 to 38 years. The possibility that recent increases in forest productivity are limited for old-growth forests deserves more attention. Modest interannual variation characterizes fine litter production, and more variable phenomena will require even longer records to evaluate global change responses with sufficient statistical power. The data files and R scripts in this data package recreate the analyses of Wright and Calderón (2025).

References

Wright, S. J. and O. Calderón. 2025. Statistical power and the detection of global change responses: The case of leaf production in old-growth forests. Ecology (accepted 28 October 2024; manuscript ECY23-1254.R1)
Data from: Japanese FAQ dataset for e-learning system
zenodo.org
csv, html, tsv
Updated Jan 24, 2020
Cite
Yasunobu Sumikawa; Masaaki Fujiyoshi; Hisashi Hatakeyama; Masahiro Nagai (2020). Japanese FAQ dataset for e-learning system [Dataset]. http://doi.org/10.5281/zenodo.2783642
Explore at:
Available download formats: csv, tsv, html
Unique identifier
https://doi.org/10.5281/zenodo.2783642
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Yasunobu Sumikawa; Masaaki Fujiyoshi; Hisashi Hatakeyama; Masahiro Nagai
Description
This dataset includes FAQ data and their categories for training a chatbot specialized for the e-learning system used at Tokyo Metropolitan University. We report the accuracy of the chatbot in the following papers.

Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "Supporting Creation of FAQ Dataset for E-learning Chatbot", Intelligent Decision Technologies, Smart Innovation, IDT'19, Springer, 2019, to appear.

Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "An FAQ Dataset for E-learning System Used on a Japanese University", Data in Brief, Elsevier, in press.

This dataset is based on real Q&A data about how to use the e-learning system, asked by students and teachers who use it in practical classes. The Q&A data were collected from April 2015 to July 2018.

We attach an English version of the dataset, translated from the Japanese dataset, to make its contents easier to understand. Note that we did not perform any evaluation on the English version; there are no results on how accurately chatbots respond to questions.

File contents:

FAQ data (*.csv)

Answer2Category.csv: Categories of answers.

Answer2Tag.csv: Titles of answers.

Answers.csv: IDs for answers and texts of answers.

Categories.csv: Names of categories for answers.

Questions.csv: Texts of questions and their corresponding answer IDs.

Answers_english.csv: IDs for answers and texts of answers written in English.

Categories_english.csv: Names of categories for answers and their corresponding English names.

Questions_english.csv: Texts of questions and their corresponding answer IDs written in English.

Statistics (*.tsv)
Results of statistical analyses for the dataset. We used the Calinski-Harabasz method, mutual information, the Jaccard index, TF-IDF+KL divergence, and TF-IDF+JS divergence to measure the quality of the dataset. In the analyses, we regard each answer as a cluster of questions. We also perform the same analyses for categories by regarding them as clusters of answers.

Grants: JSPS KAKENHI Grant Number 18H01057
Subjective wellbeing, 'Happy Yesterday', percentage of responses in range...
data.wu.ac.at
opendatacommunities.org
html, sparql
Updated Aug 20, 2018
Cite
Ministry of Housing, Communities and Local Government (2018). Subjective wellbeing, 'Happy Yesterday', percentage of responses in range 0-6 [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/OTgxM2M4Y2ItZTk3Zi00OThkLWE4YTItZTJjZTJjMDRkYjgw
Explore at:
Available download formats: sparql, html
Dataset updated
Aug 20, 2018
Dataset provided by
Ministry of Housing, Communities and Local Government
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Percentage of responses in range 0-6 out of 10 (corresponding to 'low wellbeing') for 'Happy Yesterday' in the First ONS Annual Experimental Subjective Wellbeing survey.

The Office for National Statistics has included the four subjective well-being questions below on the Annual Population Survey (APS), the largest of their household surveys.

Overall, how satisfied are you with your life nowadays?

Overall, to what extent do you feel the things you do in your life are worthwhile?

Overall, how happy did you feel yesterday?

Overall, how anxious did you feel yesterday?

This dataset presents results from the third of these questions, "Overall, how happy did you feel yesterday?" Respondents answer these questions on an 11 point scale from 0 to 10 where 0 is ‘not at all’ and 10 is ‘completely’. The well-being questions were asked of adults aged 16 and older.

Well-being estimates for each unitary authority or county are derived using data from those respondents who live in that place. Responses are weighted to the estimated population of adults (aged 16 and older) as at end of September 2011.

The data cabinet also makes available the proportion of people in each county and unitary authority that answer with ‘low wellbeing’ values. For the ‘happy yesterday’ question answers in the range 0-6 are taken to be low wellbeing.

This dataset contains the percentage of responses in the range 0-6. It also contains the standard error, the sample size and lower and upper confidence limits at the 95% level.
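
As a rough illustration of how a percentage, its standard error, and 95% confidence limits relate to one another, the sketch below applies the common normal-approximation rule (estimate ± 1.96 × standard error). This is an assumption made for illustration: the description does not say how the ONS derives the published limits, and the weighted survey design may mean the published values differ. The function name `confidence_limits` and the input figures are hypothetical and are not taken from the dataset.

```python
def confidence_limits(pct, std_err, z=1.96):
    """Return approximate (lower, upper) 95% limits for a percentage.

    Assumes a normal approximation; z = 1.96 corresponds to the 95% level.
    Limits are clipped to the valid 0-100 percentage range.
    """
    lower = max(0.0, pct - z * std_err)
    upper = min(100.0, pct + z * std_err)
    return lower, upper


# Hypothetical figures: 30% of responses in the range 0-6, standard error of 2 points.
lower, upper = confidence_limits(30.0, 2.0)
print(f"{lower:.2f} to {upper:.2f}")
```

Comparing limits computed this way with the published lower and upper limits is a quick consistency check when reading the file; a mismatch would suggest the publisher used a different method.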

The ONS survey covers the whole of the UK, but this dataset only includes results for counties and unitary authorities in England, for consistency with other statistics available at this website.

At this stage the estimates are considered ‘experimental statistics’, published at an early stage to involve users in their development and to allow feedback. Feedback can be provided to the ONS via this email address.

The APS is a continuous household survey administered by the Office for National Statistics. It covers the UK, with the chief aim of providing between-census estimates of key social and labour market variables at a local area level. Apart from employment and unemployment, the topics covered in the survey include housing, ethnicity, religion, health and education. When a household is surveyed all adults (aged 16+) are asked the four subjective well-being questions.

The 12-month Subjective Well-being APS dataset is a subset of the general APS, as the well-being questions are only asked of persons aged 16 and above who gave a personal interview; proxy answers are not accepted. This reduces the size of the achieved sample to approximately 120,000 adult respondents in England.

The original data is available from the ONS website.

Detailed information on the APS and the Subjective Wellbeing dataset is available here.

As well as collecting data on well-being, the Office for National Statistics has published widely on the topic of wellbeing. Papers and further information can be found here.
