53 datasets found
  1. Data from: Japanese FAQ dataset for e-learning system

    • zenodo.org
    csv, html, tsv
    Updated Jan 24, 2020
    Cite
    Yasunobu Sumikawa; Masaaki Fujiyoshi; Hisashi Hatakeyama; Masahiro Nagai; Yasunobu Sumikawa; Masaaki Fujiyoshi; Hisashi Hatakeyama; Masahiro Nagai (2020). Japanese FAQ dataset for e-learning system [Dataset]. http://doi.org/10.5281/zenodo.2783642
    Explore at:
    Available download formats: csv, tsv, html
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yasunobu Sumikawa; Masaaki Fujiyoshi; Hisashi Hatakeyama; Masahiro Nagai; Yasunobu Sumikawa; Masaaki Fujiyoshi; Hisashi Hatakeyama; Masahiro Nagai
    Description

    This dataset includes FAQ data and their categories for training a chatbot specialized for the e-learning system used at Tokyo Metropolitan University. We report the chatbot's accuracy in the following papers.

    Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "Supporting Creation of FAQ Dataset for E-learning Chatbot", Intelligent Decision Technologies, Smart Innovation, IDT'19, Springer, 2019, to appear.

    Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "An FAQ Dataset for E-learning System Used on a Japanese University", Data in Brief, Elsevier, in press.

    This dataset is based on real Q&A data about how to use the e-learning system, asked by students and teachers who use it in their practical classes. The Q&A data were collected from April 2015 to July 2018.

    We attach an English version of the dataset, translated from the Japanese data, to make its contents easier to understand. Note that we did not perform any evaluations on the English version; there are no results on how accurately chatbots respond to its questions.

    File contents:

    • FAQ data (*.csv)
      1. Answer2Category.csv: Categories of answers.
      2. Answer2Tag.csv: Titles of answers.
      3. Answers.csv: IDs for answers and texts of answers.
      4. Categories.csv: Names of categories for answers.
      5. Questions.csv: Texts of questions and their corresponding answer IDs.
      6. Answers_english.csv: IDs for answers and texts of answers written in English.
      7. Categories_english.csv: Names of categories for answers and their corresponding English names.
      8. Questions_english.csv: Texts of questions and their corresponding answer IDs written in English.

    • Statistics (*.tsv)

      Results of statistical analyses of the dataset. We used the Calinski-Harabasz method, mutual information, the Jaccard index, TF-IDF+KL divergence, and TF-IDF+JS divergence to measure the quality of the dataset. In these analyses, we regard each answer as a cluster of questions; we also perform the same analyses for categories by regarding them as clusters of answers.

    Grants: JSPS KAKENHI Grant Number 18H01057
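
    As an illustration of how the FAQ files fit together, the sketch below joins Questions.csv to Answers.csv and attaches category names. It is a minimal example assuming Python with pandas; the column names are assumptions for illustration, since the listing does not spell out the CSV headers.

      # Minimal sketch: join questions to answers and categories.
      # The column names ("answer_id", "category_id", etc.) are assumptions;
      # check the actual CSV headers after downloading the dataset.
      import pandas as pd

      questions = pd.read_csv("Questions.csv")          # question text + answer ID
      answers = pd.read_csv("Answers.csv")              # answer ID + answer text
      answer2cat = pd.read_csv("Answer2Category.csv")   # answer ID -> category ID
      categories = pd.read_csv("Categories.csv")        # category ID + category name

      faq = (questions
             .merge(answers, on="answer_id", how="left")
             .merge(answer2cat, on="answer_id", how="left")
             .merge(categories, on="category_id", how="left"))

      # Each row is now (question text, answer text, category name), i.e. one
      # labelled training example for an FAQ-retrieval chatbot.
      print(faq.head())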

  2. International Students

    • kaggle.com
    zip
    Updated Mar 1, 2024
    Cite
    FATEMA ISLAM MEEM (2024). International Students [Dataset]. https://www.kaggle.com/datasets/fatemaislammeem/international-students
    Explore at:
    Available download formats: zip (6732 bytes)
    Dataset updated
    Mar 1, 2024
    Authors
    FATEMA ISLAM MEEM
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Title: Survey of International Students

    Creator: Fatema Islam Meem, Imran Hussain Mahdy

    Subject: International Students; Academic and Social Integration; University of Idaho

    Description: This dataset contains responses from a survey conducted to understand the factors affecting the academic success and social integration of international students at the University of Idaho.

    Contributor: University of Idaho

    Date: February 10, 2024

    Type: Dataset

    Format: CSV

    Source: Google Form

    Language: English

    Coverage: University of Idaho, [2014-2024]

    Sources: The primary data source was a Google Form survey designed to capture international students' perspectives on their integration into the academic and social fabric of the university. Questions were developed to explore academic challenges, social integration, support systems, and overall satisfaction with their university experience.

    Collection Methodology: The survey was distributed with the assistance of the International Programs Office (IPO) at the University of Idaho to ensure a broad reach among the target demographic. Efforts were made to design the survey questions to be clear, concise, and sensitive to the cultural diversity of the respondents. The collection process faced challenges, particularly in achieving a high response rate. Despite these obstacles, approximately 48 responses were obtained, providing valuable insights into the experiences of international students. The survey data were anonymized to protect respondents' privacy and maintain data integrity.

  3. Data from: University of Washington - Beyond High School (UW-BHS)

    • icpsr.umich.edu
    • search.datacite.org
    ascii, delimited, r +3
    Updated Feb 15, 2016
    Cite
    Hirschman, Charles; Almgren, Gunnar (2016). University of Washington - Beyond High School (UW-BHS) [Dataset]. http://doi.org/10.3886/ICPSR33321.v5
    Explore at:
    Available download formats: delimited, r, ascii, spss, stata, sas
    Dataset updated
    Feb 15, 2016
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    Hirschman, Charles; Almgren, Gunnar
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/33321/terms

    Time period covered
    2000 - 2010
    Area covered
    United States, Washington
    Description

    The University of Washington - Beyond High School (UW-BHS) project surveyed students in Washington State to examine factors impacting educational attainment and the transition to adulthood among high school seniors. The project began in 1999 in an effort to assess the impact of I-200 (the referendum that ended Affirmative Action) on minority enrollment in higher education in Washington. The research objectives of the project were: (1) to describe and explain differences in the transition from high school to college by race and ethnicity, socioeconomic origins, and other characteristics, (2) to evaluate the impact of the Washington State Achievers Program, and (3) to explore the implications of multiple race and ethnic identities. Following a successful pilot survey in the spring of 2000, the project eventually included baseline and one-year follow-up surveys (conducted in 2002, 2003, 2004, and 2005) of almost 10,000 high school seniors in five cohorts across several Washington school districts. The high school senior surveys included questions that explored students' educational aspirations and future career plans, as well as questions on family background, home life, perceptions of school and home environments, self-esteem, and participation in school related and non-school related activities. To supplement the 2000, 2002, and 2003 student surveys, parents of high school seniors were also queried to determine their expectations and aspirations for their child's education, as well as their own educational backgrounds and fields of employment. Parents were also asked to report any financial measures undertaken to prepare for their child's continued education, and whether the household received any form of financial assistance. In 2010, a ten-year follow-up with the 2000 senior cohort was conducted to assess educational, career, and familial outcomes. The ten year follow-up surveys collected information on educational attainment, early employment experiences, family and partnership, civic engagement, and health status. The baseline, parent, and follow-up surveys also collected detailed demographic information, including age, sex, ethnicity, language, religion, education level, employment, income, marital status, and parental status.

  4. Nexdata | Chinese Test Question Texts from Elementary School to University...

    • datarade.ai
    Updated Nov 7, 2025
    + more versions
    Cite
    Nexdata (2025). Nexdata | Chinese Test Question Texts from Elementary School to University Parsing And Processing Data | 130 Million [Dataset]. https://datarade.ai/data-products/nexdata-chinese-test-question-texts-from-elementary-school-nexdata
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    China
    Description

    130 million Chinese question text records from primary school to college: 17.63 million K12 questions (including 14.36 million with parsed solutions) and 117 million college and vocational questions. K12 test questions include fields such as quality, type, grade_band, difficulty, grade, course, answer, and solution; university and vocational test questions include fields such as title and answer. For K12 test questions, the educational stages are primary school, junior high school, and senior high school, with subjects including Chinese, mathematics, English, history, geography, politics, biology, physics, chemistry, and science; for university and vocational test questions, the fields include public security, civil service, medicine, foreign languages, academic qualifications, engineering, education, law, economics, vocational, computer science, professional qualifications, and finance.

    Data content

    K-12 test question data + university and vocational test question data

    Data size

    Total of 17.63 million K12 test questions (including 14.36 million with explanations); total of 117 million university and vocational test questions

    Data fields

    K12 test questions include fields such as quality, type, grade_band, difficulty, grade, course, answer, and solution; university and vocational test questions include fields such as title and answer

    Major categories

    For K12 test questions, the educational stages are primary school, junior high school, and senior high school, with subjects including Chinese, mathematics, English, history, geography, politics, biology, physics, chemistry, and science; for university and vocational test questions, the fields include public security, civil service, medicine, foreign languages, academic qualifications, engineering, education, law, economics, vocational, computer science, professional qualifications, and finance.

    Question type categories

    Multiple-choice, Single-choice, True or false, Fill-in-the-blank, etc.

    Storage format

    Jsonl

    Language

    Chinese

    Data processing

    All fields were analyzed, and content was also cleaned

  5. University SET data, with faculty and courses characteristics

    • openicpsr.org
    Updated Sep 12, 2021
    + more versions
    Cite
    Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
    Explore at:
    Dataset updated
    Sep 12, 2021
    Authors
    Under blind review in refereed journal
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled in by students across all fields and levels of study offered by the university. In the period analysed, the university operated entirely online amid the Covid-19 pandemic. While the expected learning outcomes formally were not changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by the econometric models and discussed in the paper.

    The average SET scores were matched with the characteristics of the teacher (degree, seniority, gender, and SET scores in the past six semesters); the course characteristics (time of day, day of the week, course type, course breadth, class duration, and class size); the attributes of the SET survey responses (the percentage of students providing SET feedback); and the grades of the course (mean, standard deviation, and percentage failed). Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section.

    The unit of observation, or a single row in the data set, is identified by three parameters: teacher unique id (j), course unique id (k), and the question number in the SET questionnaire (n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}). This means that for each pair (j, k) we have nine rows, one for each SET survey question, or sometimes fewer when students did not answer one of the SET questions at all. For example, the dependent variable SET_score_avg(j, k, n) for the triplet (j = John Smith, k = Calculus, n = 2) is calculated as the average of all Likert-scale answers to question no. 2 in the SET survey distributed to all students who took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows.

    The full list of variables, or columns in the data set, included in the analysis is presented in the attached file section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, the variable takes the same value for all n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}.

    Two attachments: a Word file with the variable descriptions, and an Rdata file with the data set (for the R language).

    Appendix 1. The SET questionnaire used for this paper: evaluation survey of the teaching staff of [university name]. Please complete the following evaluation form, which aims to assess the lecturer's performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don't agree; 1 - I strongly don't agree. The nine questions, each rated on this scale, are:

      1. I learnt a lot during the course.
      2. I think that the knowledge acquired during the course is very useful.
      3. The professor used activities to make the class more engaging.
      4. If it was possible, I would enroll for the course conducted by this lecturer again.
      5. The classes started on time.
      6. The lecturer always used time efficiently.
      7. The lecturer delivered the class content in an understandable and efficient way.
      8. The lecturer was available when we had doubts.
      9. The lecturer treated all students equally regardless of their race, background and ethnicity.
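
    To make the unit of observation concrete, the sketch below reproduces the SET_score_avg(j, k, n) construction from individual Likert answers. It is a minimal example assuming Python with pandas (the published data are supplied as an Rdata file for R instead), and the toy input table and its column names are hypothetical.

      # Minimal sketch of SET_score_avg(j, k, n): the average Likert answer per
      # (teacher id, course id, question number) triplet. The toy input table and
      # its column names are hypothetical; the real data ship as an Rdata file.
      import pandas as pd

      raw = pd.DataFrame({
          "teacher_id":    ["John Smith"] * 4,
          "course_id":     ["Calculus"] * 4,
          "question_no":   [2, 2, 2, 2],
          "likert_answer": [5, 4, 4, 3],   # four students answering question no. 2
      })

      set_score_avg = (raw
                       .groupby(["teacher_id", "course_id", "question_no"])["likert_answer"]
                       .mean()
                       .rename("SET_score_avg")
                       .reset_index())

      print(set_score_avg)  # one row per (j, k, n) triplet, as in the 8,015-row data set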

  6. Q&A for Admission of Higher Education Institution

    • kaggle.com
    zip
    Updated Oct 14, 2024
    + more versions
    Cite
    Jocelyn Dumlao (2024). Q&A for Admission of Higher Education Institution [Dataset]. https://www.kaggle.com/datasets/jocelyndumlao/q-and-a-for-admission-of-higher-education-institution
    Explore at:
    Available download formats: zip (33330 bytes)
    Dataset updated
    Oct 14, 2024
    Authors
    Jocelyn Dumlao
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Question Answering for Admission of Higher Education Institution

    Description

    The data collection process began with web scraping of a selected higher education institution's website, gathering any data related to the admission topic of higher education institutions, during the period from July to September 2023. This resulted in a raw dataset centered primarily on admission-related content. Subsequently, careful data cleaning and organization procedures were applied to refine the dataset. The primary data, in its raw form before annotation into a question-and-answer format, was predominantly in the Indonesian language. A comprehensive annotation process was then conducted to enrich the dataset with specific admission-related information, transforming it into secondary data. Both primary and secondary data predominantly remained in Indonesian. To enhance data quality, we added filters to remove or exclude: 1) data not in the Indonesian language, 2) data unrelated to the admission topic, and 3) redundant entries. This curation has culminated in a finalized dataset, carefully prepared and now readily available for research and analysis in the domain of higher education admission.

    Categories:

    Computer Science, Education, Marketing, Natural Language Processing

    Acknowledgements & Source

    Emny Yossy, Derwin Suhartono, Agung Trisetyarso, Widodo Budiharto

    Data Source: Mendeley Data

  7. 10.44M English Test QA Dataset – All Grades & Subjects

    • nexdata.ai
    Updated Aug 30, 2024
    Cite
    Nexdata (2024). 10.44M English Test QA Dataset – All Grades & Subjects [Dataset]. https://www.nexdata.ai/datasets/llm/1572
    Explore at:
    Dataset updated
    Aug 30, 2024
    Dataset authored and provided by
    Nexdata
    Variables measured
    Format, Content, Language, Data Size, Data Fields, Data processing, Subject categories, Question type categories
    Description

    This dataset contains 10.44 million English-language test questions parsed and structured for large-scale educational AI and NLP applications. Each question record includes the title, answer, explanation (parse), subject, grade level, and question type. Covering a full range of academic stages from primary and middle school to high school and university, the dataset spans core subjects such as English, mathematics, biology, and accounting. The content follows the Anglo-American education system and supports tasks such as question answering, subject knowledge enhancement, educational chatbot training, and intelligent tutoring systems. All data are formatted for efficient machine learning use and comply with data protection regulations including GDPR, CCPA, and PIPL.

  8. Nexdata | Science Subjects Questions Text Parsing And Processing Data | 32...

    • data.nexdata.ai
    • datarade.ai
    Updated Nov 21, 2025
    Cite
    Nexdata (2025). Nexdata | Science Subjects Questions Text Parsing And Processing Data | 32 million [Dataset]. https://data.nexdata.ai/products/nexdata-science-subjects-questions-text-parsing-and-process-nexdata
    Explore at:
    Dataset updated
    Nov 21, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    China
    Description

    32 million - Science Subjects Questions Text Parsing And Processing Data, including mathematics, physics, chemistry and biology in primary, middle and high school and university.

  9. Nexdata | Chinese Multi-disciplinary Questions Text Parsing And Processing...

    • datarade.ai
    Updated Nov 7, 2025
    + more versions
    Cite
    Nexdata (2025). Nexdata | Chinese Multi-disciplinary Questions Text Parsing And Processing Data | 6.9 Million [Dataset]. https://datarade.ai/data-products/nexdata-chinese-multi-disciplinary-questions-text-parsing-a-nexdata-1cbc
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    China
    Description

    6.9 million Chinese multi-disciplinary question text records for parsing and processing, covering multiple disciplines across primary school, middle school, high school and university. Each question contains a title, answer, parse, type, subject, and grade. The dataset can be used for large model subject knowledge enhancement tasks.

    Content: multi-disciplinary questions;

    Data Size: about 6.9 million;

    Data Fields: title, answer, parse, subject, grade, question type;

    Subject categories: subjects for primary school, middle school, high school and university;

    Format: Jsonl;

    Language: Chinese;

    Data processing: subject, questions, parse and answers were analyzed; formula conversion and table format conversion were done; and content was also cleaned
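
    A minimal reading sketch for this kind of JSONL file is shown below, assuming Python; the file name and the exact JSON key spellings are assumptions based on the field list above.

      # Minimal sketch: stream a JSONL file of questions and keep one subject.
      # The file name and JSON keys ("title", "answer", "parse", "subject",
      # "grade", "type") are assumptions based on the field list above.
      import json

      def load_questions(path, subject=None):
          """Yield question records, optionally filtered by subject."""
          with open(path, encoding="utf-8") as f:
              for line in f:
                  record = json.loads(line)
                  if subject is None or record.get("subject") == subject:
                      yield record

      if __name__ == "__main__":
          for i, q in enumerate(load_questions("questions.jsonl", subject="数学")):
              print(q["title"], "->", q["answer"])
              if i >= 4:   # show only the first five matches
                  break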

  10. Nexdata | Chinese Multi-disciplinary Questions Text Parsing And Processing...

    • data.nexdata.ai
    Updated Nov 26, 2025
    Cite
    Nexdata (2025). Nexdata | Chinese Multi-disciplinary Questions Text Parsing And Processing Data | 6.9 Million| Large Language Model(LLM) Data [Dataset]. https://data.nexdata.ai/products/nexdata-chinese-multi-disciplinary-questions-text-parsing-a-nexdata-1cbc
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    China
    Description

    6.9 million - Chinese Multi-disciplinary Questions Text Parsing And Processing Data, including multiple disciplines in primary school, middle school, high school and university.

  11. Integrated Postsecondary Education Data System (IPEDS): Fall Enrollment,...

    • icpsr.umich.edu
    ascii, sas
    Updated Jul 1, 1999
    + more versions
    Cite
    United States Department of Education. National Center for Education Statistics (1999). Integrated Postsecondary Education Data System (IPEDS): Fall Enrollment, 1986 [Dataset]. http://doi.org/10.3886/ICPSR02221.v1
    Explore at:
    Available download formats: ascii, sas
    Dataset updated
    Jul 1, 1999
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    United States Department of Education. National Center for Education Statistics
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/2221/terms

    Area covered
    United States, Marshall Islands, American Samoa, Virgin Islands of the United States, Guam
    Description

    The Fall Enrollment survey is conducted annually by the National Center for Education Statistics (NCES) as part of the Integrated Postsecondary Education Data System (IPEDS). The survey collects data that describe the status of student participation in various types of postsecondary institutions. The data are collected by sex for six racial/ethnic categories as defined by the Office for Civil Rights (OCR). There are two parts included in this survey. Part A, Enrollment Summary by Racial/Ethnic Status, provides enrollment data by race/ethnicity and sex and by level and year of study of the student. Part C, Clarifying Questions, supplies information on students enrolled in remedial courses, extension divisions, and branches of schools, as well as numbers of transfer students from in-state, out of state, and other countries.

  12. SAT Questions and Answers for LLM 🏛️

    • kaggle.com
    zip
    Updated Oct 16, 2023
    + more versions
    Cite
    Unique Data (2023). SAT Questions and Answers for LLM 🏛️ [Dataset]. https://www.kaggle.com/trainingdatapro/sat-history-questions-and-answers
    Explore at:
    Available download formats: zip (168943 bytes)
    Dataset updated
    Oct 16, 2023
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    SAT History Questions and Answers 🏛️ - Text Classification Dataset

    This dataset contains a collection of questions and answers for the SAT Subject Tests in World History and US History. Each question is accompanied by its answer options and the correct response.

    The dataset includes questions from various topics, time periods, and regions on both World History and US History.

    💴 For commercial usage: to discuss your requirements, learn about the price, and buy the dataset, leave a request on our website.

    OTHER DATASETS FOR THE TEXT ANALYSIS:

    Content

    For each question, we extracted the following fields (a loading sketch follows the list):

      • id: number of the question
      • subject: SAT subject (World History or US History)
      • prompt: text of the question
      • A: answer A
      • B: answer B
      • C: answer C
      • D: answer D
      • E: answer E
      • answer: letter of the correct answer to the question
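
    A minimal loading sketch, assuming Python with pandas; the file name is an assumption, while the column names follow the field list above.

      # Minimal sketch: load the SAT question CSV and format one row as a prompt.
      # The file name "sat_questions.csv" is an assumption; the columns follow
      # the field list above (id, subject, prompt, A-E, answer).
      import pandas as pd

      df = pd.read_csv("sat_questions.csv")

      def to_prompt(row):
          options = "\n".join(f"{letter}. {row[letter]}" for letter in ["A", "B", "C", "D", "E"])
          return f"[{row['subject']}] {row['prompt']}\n{options}\nCorrect answer: {row['answer']}"

      print(to_prompt(df.iloc[0]))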

    🧩 This is just an example of the data. Leave a request here to learn more

    🚀 You can learn more about our high-quality unique datasets here

    keywords: answer questions, sat, gpa, university, school, exam, college, web scraping, parsing, online database, text dataset, sentiment analysis, llm dataset, language modeling, large language models, text classification, text mining dataset, natural language texts, nlp, nlp open-source dataset, text data, machine learning

  13. Webis-Follow-Up-Questions-24

    • webis.de
    • zenodo.org
    10623105
    Updated 2024
    Cite
    Johannes Kiesel; Marcel Gohsen; Nailia Mirzakhmedova; Matthias Hagen; Benno Stein (2024). Webis-Follow-Up-Questions-24 [Dataset]. http://doi.org/10.5281/zenodo.10623105
    Explore at:
    10623105Available download formats
    Dataset updated
    2024
    Dataset provided by
    The Web Technology & Information Systems Network
    Friedrich Schiller University Jena
    GESIS - Leibniz Institute for the Social Sciences
    Bauhaus-Universität Weimar
    Authors
    Johannes Kiesel; Marcel Gohsen; Nailia Mirzakhmedova; Matthias Hagen; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The Webis-Follow-Up-Questions-24 dataset comprises 18,980 simulated continuations of conversational search sessions and 3,883 human judgments of these continuations.

  14. Data from: Answering intriguing and complex questions to people curious and...

    • scielo.figshare.com
    jpeg
    Updated Jun 5, 2023
    Cite
    Fernando Lang da Silveira (2023). Answering intriguing and complex questions to people curious and interested in learning physics: the site “Ask CREF” [Dataset]. http://doi.org/10.6084/m9.figshare.14326666.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Fernando Lang da Silveira
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The article celebrates ten years of the question-and-answer website developed at the Reference Center for Teaching Physics (CREF) of the Federal University of Rio Grande do Sul. These are questions and doubts, most of them intriguing and some of a complex nature, whose answers are hardly found in the physics literature. The description, evolution, scope and impact of the site within the Portuguese-speaking community of teachers and students are discussed. Also presented are the ten most accessed questions on the site and, as examples, six complete posts.

  15. August 2024 data-update for "Updated science-wide author databases of...

    • elsevier.digitalcommonsdata.com
    Updated Sep 16, 2024
    + more versions
    Cite
    John P.A. Ioannidis (2024). August 2024 data-update for "Updated science-wide author databases of standardized citation indicators" [Dataset]. http://doi.org/10.17632/btchxktzyw.7
    Explore at:
    Dataset updated
    Sep 16, 2024
    Authors
    John P.A. Ioannidis
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0) (https://creativecommons.org/licenses/by-nc/3.0/)
    License information was derived automatically

    Description

    Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Separate data are shown for career-long and for single recent year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database) as well as citations to/from retracted papers have been added in the most recent iteration. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers.

    Career-long data are updated to end-of-2023 and single recent year data pertain to citations received during calendar year 2023. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2024 snapshot from Scopus, updated to the end of citation year 2023. This work uses Scopus data; calculations were performed using all Scopus author profiles as of August 1, 2024. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work.

    PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/), so that the correct data can be used in any future annual updates of the citation indicator databases.

    The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see the attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden Manifesto: https://www.nature.com/articles/520429a
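
    As a rough illustration of the stated selection rule (top 100,000 by c-score, or a sub-field percentile rank of 2% or above), the sketch below assumes Python with pandas; the file name and column names are hypothetical placeholders, so the actual spreadsheet headers should be checked before use.

      # Minimal sketch of the selection rule described above. The file name and
      # the column names ("c_score", "subfield_rank_pct") are hypothetical;
      # check the real spreadsheet headers in the deposited data.
      import pandas as pd

      authors = pd.read_excel("career_long_2023.xlsx")   # hypothetical file name

      top_by_cscore = authors.nlargest(100_000, "c_score")
      top_by_subfield = authors[authors["subfield_rank_pct"] <= 2.0]  # top 2% of sub-field

      selected = pd.concat([top_by_cscore, top_by_subfield]).drop_duplicates()
      print(f"{len(selected)} scientists meet at least one inclusion criterion")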

  16. Nexdata | English Test Questions Text Parsing And Processing Data | 10.44...

    • datarade.ai
    • data.nexdata.ai
    Updated Nov 7, 2025
    + more versions
    Cite
    Nexdata (2025). Nexdata | English Test Questions Text Parsing And Processing Data | 10.44 million| Large Language Model(LLM) Data [Dataset]. https://datarade.ai/data-products/nexdata-english-test-questions-text-parsing-and-processing-nexdata
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    United States of America
    Description

    10.44 million English test question text records for parsing and processing. Each question contains title, answer, parse, subject, grade, and question type; the educational stages cover primary, middle, high school, and university; subjects cover mathematics, biology, accounting, etc. The data are question texts under the Anglo-American education system, which can be used to enhance the subject knowledge of large models.

    Content

    English questions text data under Anglo-American system;

    Data Size

    About 10.44 million;

    Data Fields

    contains title, answer, parse, subject, grade, question type;

    Subject categories

    Subjects across primary, middle, high school, and university;

    Question type categories

    Multiple Choice, Single Choice, True/False, Fill in the Blanks, etc.;

    Format

    jsonl;

    Language

    English.

    Data processing

    Subject, questions, parse and answers were analyzed, formula conversion and table format conversion were done, and content was also cleaned

  17. Central Police University Previous Exam Questions

    • data.gov.tw
    api, csv
    Cite
    Central Police University, Central Police University Previous Exam Questions [Dataset]. https://data.gov.tw/en/datasets/18799
    Explore at:
    Available download formats: api, csv
    Dataset authored and provided by
    Central Police University
    License

    https://data.gov.tw/license

    Description

    Past entrance examination questions from the Central Police University's admissions exams.

  18. What Makes a University Student Life "Ideal" ?

    • kaggle.com
    zip
    Updated Jan 7, 2019
    Cite
    Shivam Bansal (2019). What Makes a University Student Life "Ideal" ? [Dataset]. https://www.kaggle.com/shivamb/ideal-student-life-survey
    Explore at:
    Available download formats: zip (169882 bytes)
    Dataset updated
    Jan 7, 2019
    Authors
    Shivam Bansal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset consists of student responses to a survey about the ideal student life and the key factors that are important in student life.

    About the dataset files

    • "survey_responses.csv" : this file contains independent variables about students and their responses to the main questions.
    • "survey_questions_meta.csv" : this file contains meta questions or sub-questions related to the main questions asked in the survey.

    Inspiration

    The dataset can be used to answer key questions such as:

    1. What are the most important attributes of an ideal student life?
    2. What factors lead to stress, more participation, more interaction, and more satisfaction?
    3. Does more stress lead to less satisfaction?
    4. What factors lead to increased stress among students?

  19. Data from: The thousand-question Spanish general knowledge database

    • figshare.com
    xlsx
    Updated Oct 2, 2020
    Cite
    Jon Andoni Duñabeitia (2020). The thousand-question Spanish general knowledge database [Dataset]. http://doi.org/10.6084/m9.figshare.13041803.v2
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Oct 2, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jon Andoni Duñabeitia
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset of the norms associated with the 1,364 general knowledge questions answered by 385 Spanish university students.

  20. Data from: Fostering cultures of open qualitative research: Dataset 1 –...

    • orda.shef.ac.uk
    docx
    Updated Oct 8, 2025
    + more versions
    Cite
    Matthew Hanchard; Itzel San Roman Pineda (2025). Fostering cultures of open qualitative research: Dataset 1 – Survey Responses [Dataset]. http://doi.org/10.15131/shef.data.23567250.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    The University of Sheffield
    Authors
    Matthew Hanchard; Itzel San Roman Pineda
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute.

    The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:

    · Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
    · Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
    · Fostering cultures of open qualitative research: Dataset 3 – Coding Book

    The project was funded with £13,913.85 Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

    The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.

    ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license.

    This dataset comprises one spreadsheet, in .xlsx format, with N=91 anonymised survey responses. It includes all responses to the project survey, which used Google Forms between 06-Feb-2023 and 30-May-2023. The spreadsheet can be opened with Microsoft Excel, Google Sheets, or open-source equivalents.

    The survey responses include a random sample of researchers worldwide undertaking qualitative, mixed-methods, or multi-modal research.

    Recruitment of respondents was initially purposive, aiming to gather responses from qualitative researchers at research-intensive (targeted Russell Group) universities. This involved speculative emails and a call for participants on the University of Sheffield 'Qualitative Open Research Network' mailing list. As a result, the responses also include a snowball sample of scholars from elsewhere.

    The spreadsheet has two tabs/sheets: one labelled ‘SurveyResponses’ contains the anonymised and tidied set of survey responses; the other, labelled ‘VariableMapping’, sets out each field/column in the ‘SurveyResponses’ tab/sheet against the original survey questions and responses it relates to.

    The survey responses tab/sheet includes a field/column labelled ‘RespondentID’ (using randomly generated 16-digit alphanumeric keys) which can be used to connect survey responses to interview participants in the accompanying ‘Fostering cultures of open qualitative research: Dataset 2 – Interview transcripts’ files.
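
    A minimal sketch of that linkage, assuming Python with pandas; the file names and the interview table's columns are assumptions, since only the spreadsheet tabs and the RespondentID field are named in this description.

      # Minimal sketch: read the survey sheet and link it to interview records
      # via the 16-character RespondentID key. The file names and the interview
      # table layout are assumptions; only "RespondentID" and the sheet name
      # "SurveyResponses" are given in the description.
      import pandas as pd

      survey = pd.read_excel("Dataset1_SurveyResponses.xlsx",
                             sheet_name="SurveyResponses")
      interviews = pd.read_csv("Dataset2_InterviewParticipants.csv")  # hypothetical

      linked = survey.merge(interviews, on="RespondentID", how="inner")
      print(f"{len(linked)} survey respondents also appear in the interview data")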

    A set of survey questions gathering eligibility criteria details and consent is not listed within this dataset; these questions appear below. All respondents included in the dataset gave a 'Yes' response to all of the questions below, with the exception of one question, marked with an asterisk (*):

    · I am aged 18 or over.
    · I have read the information and consent statement above.
    · I understand how to ask questions and/or raise a query or concern about the survey.
    · I agree to take part in the research and for my responses to be part of an open access dataset. These will be anonymised unless I specifically ask to be named.
    · I understand that my participation does not create a legally binding agreement or employment relationship with the University of Sheffield.
    · I understand that I can withdraw from the research at any time.
    · I assign the copyright I hold in materials generated as part of this project to The University of Sheffield.
    · * I am happy to be contacted after the survey to take part in an interview.

    The project was undertaken by two staff:

    Co-investigator and Postdoctoral Research Assistant: Dr. Itzel San Roman Pineda (ORCiD ID: 0000-0002-3785-8057; i.sanromanpineda@sheffield.ac.uk)

    Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard, Research Associate, iHuman Institute, Social Research Institutes, Faculty of Social Science (ORCiD ID: 0000-0003-2460-8638; m.s.hanchard@sheffield.ac.uk)
