20 datasets found
  1. OCR large data set

    • kaggle.com
    zip
    Updated Feb 15, 2023
    Cite
    James Mann (2023). OCR large data set [Dataset]. https://www.kaggle.com/datasets/jame5mann/ocr-large-data-set
    Explore at:
    Available download formats: zip (264412 bytes)
    Dataset updated
    Feb 15, 2023
    Authors
    James Mann
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is the large data set as featured in the OCR H240 exam series.

    Questions about this dataset will be featured in the statistics paper.

    The LDS is a .xlsx file containing five tables: four of data and one of information. The data is drawn from the UK censuses of 2001 and 2011. It is designed for you to compare and analyse changes in the demographic and behavioural features of the populace. It covers the age structure of each local authority and the method of travel within each local authority.

  2. lds-edexcel

    • kaggle.com
    zip
    Updated Apr 21, 2020
    Cite
    Tom Button (2020). lds-edexcel [Dataset]. https://www.kaggle.com/tombutton/edexcelldsheathrow
    Explore at:
    Available download formats: zip (7562 bytes)
    Dataset updated
    Apr 21, 2020
    Authors
    Tom Button
    Description

    Dataset

    This dataset was created by Tom Button

  3. Airoboros LLMs Math Dataset

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). Airoboros LLMs Math Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/airoboros-llms-math-dataset
    Explore at:
    Available download formats: zip (36964941 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Airoboros LLMs Math Dataset

    Mastering Complex Mathematical Operations in Machine Learning

    By Huggingface Hub [source]

    About this dataset

    The Airoboros-3.1 dataset is the perfect tool to help machine learning models excel in the difficult realm of complicated mathematical operations. This data collection features thousands of conversations between machines and humans, formatted in ShareGPT to maximize optimization in an OS ecosystem. The dataset’s focus on advanced subjects like factorials, trigonometry, and larger numerical values will help drive machine learning models to the next level - facilitating critical acquisition of sophisticated mathematical skills that are essential for ML success. As AI technology advances at such a rapid pace, training neural networks to correspondingly move forward can be a daunting and complicated challenge - but with Airoboros-3.1’s powerful datasets designed around difficult mathematical operations it just became one step closer to achievable!

    How to use the dataset

    To get started, download the dataset from Kaggle and use the train.csv file. This file contains over two thousand examples of conversations between ML models and humans, formatted in the ShareGPT style for use with open-source (OS) fine-tuning tools, with the aim of helping models handle mathematical operations more easily. The file includes two columns, category and conversations, both of which are stored as strings.

    Once you have downloaded the train file you can begin setting up your own ML training environment by using any of your preferred frameworks or methods. Your model should focus on predicting what kind of mathematical operations will likely be involved in future conversations by referring back to previous dialogues within this dataset for reference (category column). You can also create your own test sets from this data, adding new conversation topics either by modifying existing rows or creating new ones entirely with conversation topics related to mathematics. Finally, compare your model’s results against other established models or algorithms that are already published online!
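
    As a minimal sketch of the first step described above, the snippet below reads train.csv with Python's standard csv module and tallies the category column. The column names category and conversations come from the dataset description; the file path and the helper names load_rows and category_counts are assumptions made for illustration. The internal layout of each conversations string is not specified here, so it is left unparsed.

```python
import csv
from collections import Counter
from typing import Iterable, TextIO


def load_rows(handle: TextIO) -> list[dict[str, str]]:
    """Read CSV rows into dictionaries keyed by the header line."""
    return list(csv.DictReader(handle))


def category_counts(rows: Iterable[dict[str, str]]) -> Counter:
    """Count how many conversations fall under each category label."""
    return Counter(row["category"] for row in rows)


if __name__ == "__main__":
    # "train.csv" is the file name given in the description; the path is assumed.
    with open("train.csv", newline="", encoding="utf-8") as handle:
        rows = load_rows(handle)
    # Print categories from most to least frequent.
    for category, count in category_counts(rows).most_common():
        print(f"{category}: {count}")
```

    Looking at the category distribution first is one reasonable way to decide how to split the data into training and test sets before any model is trained.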

    Happy training!

    Research Ideas

    • It can be used to build custom neural networks or machine learning algorithms that are specifically designed for complex mathematical operations.
    • This data set can be used to teach and debug more general-purpose machine learning models to recognize large numbers and intricate calculations within natural language processing (NLP).
    • The Airoboros-3.1 dataset can also be used as a supervised learning task: models could learn from the conversations provided in the dataset how to respond correctly when presented with complex mathematical operations.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name   | Description                                                                  |
    |:--------------|:-----------------------------------------------------------------------------|
    | category      | The type of mathematical operation being discussed. (String)                 |
    | conversations | The conversations between the machine learning model and the human. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  4. A level and other 16 to 18 results - English and Maths - below level 3...

    • explore-education-statistics.service.gov.uk
    Updated Nov 13, 2025
    + more versions
    Cite
    Department for Education (2025). A level and other 16 to 18 results - English and Maths - below level 3 entries by student characteristics [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/53822e87-8c30-4a8d-8d8b-358b2a2083ad
    Explore at:
    Dataset updated
    Nov 13, 2025
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Entries and passes for below level 3 English and maths, by qualification type and student characteristics. Includes entries for students triggered for inclusion in performance tables, after discounting of exams at the major qualification level.

  5. Maths-Grade-School

    • huggingface.co
    Updated Jul 27, 2024
    + more versions
    Cite
    Sathish Kumar R (2024). Maths-Grade-School [Dataset]. https://huggingface.co/datasets/pt-sk/Maths-Grade-School
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2024
    Authors
    Sathish Kumar R
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Maths-Grade-School: I am releasing a large grade-school-level mathematics dataset. This extensive dataset, comprising nearly one million instructions in JSON format, covers a diverse array of topics fundamental to building a strong mathematical foundation. The dataset is in instruction format so that model developers, researchers and others can use it easily. The following fields and subfields are covered: Calculus, Probability, Algebra, Linear Algebra, Trigonometry, Differential Equations… See the full description on the dataset page: https://huggingface.co/datasets/pt-sk/Maths-Grade-School.

  6. Data from: Exploring Human-Like Mathematical Reasoning: Perspectives on...

    • curate.nd.edu
    pdf
    Updated Dec 3, 2024
    Cite
    Zhenwen Liang (2024). Exploring Human-Like Mathematical Reasoning: Perspectives on Generalizability and Efficiency [Dataset]. http://doi.org/10.7274/27895872.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Zhenwen Liang
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    Mathematical reasoning, a fundamental aspect of human cognition, poses significant challenges for artificial intelligence (AI) systems. Despite recent advancements in natural language processing (NLP) and large language models (LLMs), AI's ability to replicate human-like reasoning, generalization, and efficiency remains an ongoing research challenge. In this dissertation, we address key limitations in math word problem (MWP) solving, focusing on the accuracy, generalization ability and efficiency of AI-based mathematical reasoners by applying human-like reasoning methods and principles.

    This dissertation introduces several innovative approaches in mathematical reasoning. First, a numeracy-driven framework is proposed to enhance math word problem (MWP) solvers by integrating numerical reasoning into model training, surpassing human-level performance on benchmark datasets. Second, a novel multi-solution framework captures the diversity of valid solutions to math problems, improving the generalization capabilities of AI models. Third, a customized knowledge distillation technique, termed Customized Exercise for Math Learning (CEMAL), is developed to create tailored exercises for smaller models, significantly improving their efficiency and accuracy in solving MWPs. Additionally, a multi-view fine-tuning paradigm (MinT) is introduced to enable smaller models to handle diverse annotation styles from different datasets, improving their adaptability and generalization. To further advance mathematical reasoning, a benchmark, MathChat, is introduced to evaluate large language models (LLMs) in multi-turn reasoning and instruction-following tasks, demonstrating significant performance improvements. Finally, new inference-time verifiers, Math-Rev and Code-Rev, are developed to enhance reasoning verification, combining language-based and code-based solutions for improved accuracy in both math and code reasoning tasks.

    In summary, this dissertation provides a comprehensive exploration of these challenges and contributes novel solutions that push the boundaries of AI-driven mathematical reasoning. Potential future research directions are also discussed to further extend the impact of this dissertation.

  7. OlymMATH

    • huggingface.co
    Updated Mar 28, 2025
    Cite
    RUC-AIBOX (2025). OlymMATH [Dataset]. https://huggingface.co/datasets/RUC-AIBOX/OlymMATH
    Explore at:
    Dataset updated
    Mar 28, 2025
    Dataset authored and provided by
    RUC-AIBOX
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models

    This is the official huggingface repository for Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models by Haoxiang Sun, Yingqian Min, Zhipeng Chen, Wayne Xin Zhao, Zheng Liu, Zhongyuan Wang, Lei Fang, and Ji-Rong Wen. We have also released the OlymMATH-eval dataset on HuggingFace 🤗, together with a data visualization tool OlymMATH-demo… See the full description on the dataset page: https://huggingface.co/datasets/RUC-AIBOX/OlymMATH.

  8. Graduates at doctoral level, in science, math., computing, engineering,...

    • ec.europa.eu
    Updated Oct 10, 2025
    + more versions
    Cite
    Eurostat (2025). Graduates at doctoral level, in science, math., computing, engineering, manufacturing, construction, by sex - per 1000 of population aged 25-34 [Dataset]. http://doi.org/10.2908/EDUC_UOE_GRAD07
    Explore at:
    Available download formats: application/vnd.sdmx.genericdata+xml;version=2.1, application/vnd.sdmx.data+csv;version=2.0.0, json, tsv, application/vnd.sdmx.data+xml;version=3.0.0, application/vnd.sdmx.data+csv;version=1.0.0
    Dataset updated
    Oct 10, 2025
    Dataset authored and provided by
    Eurostat (https://ec.europa.eu/eurostat)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    2013 - 2023
    Area covered
    Albania, Portugal, Slovenia, Czechia, Luxembourg, Türkiye, Finland, Italy, Croatia, Montenegro
    Description

    This domain covers statistics and indicators on key aspects of the education systems across Europe. The data show entrants and enrolments in education levels, education personnel and the cost and type of resources dedicated to education.

    For a general technical description of the UOE Data Collection see UNESCO OECD Eurostat (UOE) joint data collection – methodology - Statistics Explained (europa.eu).

    The standards on international statistics on education and training systems are set by the three international organisations jointly administering the annual UOE data collection:

    • The United Nations Educational, Scientific, and Cultural Organisation Institute for Statistics (UNESCO-UIS),
    • The Organisation for Economic Co-operation and Development (OECD) and,
    • The Statistical Office of the European Union (EUROSTAT).

    The following topics are covered:

    • Pupils and students – Enrolments and Entrants,
    • Learning mobility,
    • Education personnel,
    • Education finance,
    • Graduates,
    • Language learning.

    Data on enrolments in education are disseminated in absolute numbers, with breakdowns available for the following dimensions:

    • ISCED level of education,
    • Sex,
    • Age or age group,
    • NUTS1 and NUTS2 regions,
    • Type of educational institution (public or private) – referred to as the ‘sector’ in Eurobase,
    • Intensity of participation (full-time, part-time, full-time equivalent) – referred to as ‘working time’ in Eurobase,
    • Programme orientation (general/academic or vocational/professional),
    • Type of vocational programme (school-based only or combined school and work-based),
    • Level of attainment that can be achieved upon programme completion (e.g. insufficient for level completion or partial level completion, sufficient for partial level completion without direct access to tertiary education),
    • Field of education (ISCED-F13).

    Additionally, the following types of indicators on enrolments are calculated (all indicators using population data use Eurostat’s population database (demo_pjan)):

    • Participation rates by age or by age groups as % of corresponding age population.
    • Participation rates by age as % of total population.
    • Pupils from age 0, 3, 4 and 5 to the starting age of compulsory education at primary level, as % of the population of the corresponding age. In some countries, the start of primary education is not compulsory and in some countries compulsory education starts at pre-primary level. This indicator calculates the participation rates of pupils up until (but not including) the starting age of formal education that is both compulsory and at the primary level. This age varies from 5 years to 7 years across countries and the national starting ages for compulsory primary education used in the calculation of this indicator are listed in the file Ages_educ_indicators which is available to download in the Annexes section of this page.
    • Pupils under the age of 3 as % of corresponding age population. This indicator does not include 3 year olds (includes ages 0, 1 and 2).
    • Out-of-school rates at different ages. This indicator is calculated as 100 – (students of a particular age who are enrolled in education at any ISCED level / Total population of that age *100).
    • Out-of-school rates in population of lower secondary school age and in population of upper secondary school age. This indicator is calculated as 100 – (students who are of the official age range for ISCED X who are enrolled in education at any ISCED level / Total population in the official age range for ISCED X *100). The official age range for each ISCED level varies across countries, and national age ranges for lower and upper secondary used in the calculation of this indicator are listed in the file Ages_educ_indicators which is available to download in the Annexes section of this page.
    • Students in education of post-compulsory school age - as % of the total population of post-compulsory school age. The final age at which formal education is considered as compulsory in national education systems in the calculation of this indicator are listed in the file Ages_educ_indicators.
    • Students participation at the end of compulsory education - as % of the corresponding age population. Indicator is calculated for age (X-1), (X), (X+1), (X+2) where X = the final age at which formal education is compulsory in national education systems. The final age at which formal education is considered as compulsory in national education systems in the calculation of this indicator are listed in the file Ages_educ_indicators.
    • Students in education aged 30 and over - per 1000 of corresponding age population
    • Expected school years of pupils and students at different levels of education
    • Distribution of pupils and students enrolled in general and vocational programmes by education level and NUTS2 regions
    • Distribution of students in different fields of education
    • Ratio of the proportion of the population who are tertiary students in NUTS1 regions to the proportion of the population who are tertiary students in NUTS2 regions
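
    The participation and out-of-school formulas quoted in the list above can be sketched in a few lines of Python. This is only an illustration of the stated arithmetic, 100 – (enrolled / population * 100); the function names and the sample figures are hypothetical and do not come from Eurostat.

```python
def participation_rate(enrolled: float, population: float) -> float:
    """Enrolled students as a percentage of the corresponding age population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * enrolled / population


def out_of_school_rate(enrolled: float, population: float) -> float:
    """100 minus the participation rate, following the formula in the text."""
    return 100.0 - participation_rate(enrolled, population)


if __name__ == "__main__":
    # Hypothetical figures: 950 enrolled students in a population of 1000.
    print(participation_rate(950, 1000))   # 95.0
    print(out_of_school_rate(950, 1000))   # 5.0
```

    For the lower and upper secondary variants, the same function would apply once enrolment and population are restricted to the official national age range for the ISCED level, which the text says is listed in the file Ages_educ_indicators.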

    Data on entrants in education are disseminated in absolute numbers, with breakdowns available for the following dimensions:

    • ISCED level of education,
    • Programme orientation (general/academic or vocational/professional),
    • Sex,
    • Age or age group,
    • Field of education (ISCED-F13).

    Additionally the following indicator on entrants is calculated:

    • Distribution of new entrants in different fields of education.

    Data on learning mobility is available for degree mobile students, degree mobile graduates and credit mobile graduates. Degree mobility means that students/graduates are/were enrolled as regular students in any semester/term of a programme taught in the country of destination with the intention of graduating from it in the country of destination. Credit mobility is defined as temporary tertiary education or/and study-related traineeship abroad within the framework of enrolment in a tertiary education programme at a "home institution" (usually) for the purpose of gaining academic credit (i.e. credit that will be recognised in that home institution). Further definitions are in Section 2.8 of the UOE manual.

    Degree mobile students are referred to as just ‘mobile students’ in UOE learning mobility tables. Data is disseminated for degree mobile students and degree mobile graduates in absolute numbers with breakdowns available for the following dimensions:

    • ISCED level of education,
    • Sex,
    • Field of education (ISCED-F13),
    • Country of origin (defined as the country of education prior to entering tertiary although there may be national deviations. These are listed in the Helpsheet of the latest footnotes report available to download in the Annexes section of this page) – referred to as ‘Geopolitical entity (partner)’ in Eurobase.

    Additionally the following types of indicators on degree mobile students and degree mobile graduates are calculated (all indicators using population data use Eurostat’s population database (demo_pjan)):

    • Share of all students/graduates who are mobile students/degree mobile graduates from abroad,
    • Distribution of mobile students/degree mobile graduates from abroad in different fields of education.

    For credit mobile graduates, data are disseminated in absolute numbers, with breakdowns available for the following dimensions:

    • ISCED level of education,
    • Sex,
    • Type of mobility scheme (e.g. Credit mobility under EU programmes i.e. ERASMUS, Credit mobility in other international/national programmes),
    • Type of mobility (study period only or study period combined with work placement),
    • Country of destination – referred to as ‘Geopolitical entity (partner)’ in Eurobase.

    Data on personnel in education are available for classroom teachers/academic staff, teacher aides and school-management personnel. Teachers are employed in a professional capacity to guide and direct the learning experiences of students, irrespective of their training, qualifications or delivery mechanism. Teacher aides support teachers in providing instruction to students. Academic staff are personnel employed at the tertiary level of education whose primary assignment is instruction and/or research. School management personnel covers professional personnel who are responsible for school management/administration (ISCED 0-4) or whose primary or major responsibility is the management of the institution, or a recognised department or subdivision of the institution (tertiary levels). Full definitions of these statistical units are in Section 3.5 of the UOE manual.

    Data are disseminated on teachers and academic staff in absolute numbers, with breakdowns available for the following dimensions:

    • ISCED

  9. Lean-Workbook

    • huggingface.co
    Updated Oct 11, 2024
    Cite
    Intern Large Models (2024). Lean-Workbook [Dataset]. http://doi.org/10.57967/hf/2399
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 11, 2024
    Dataset authored and provided by
    Intern Large Models
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Lean Workbook

    This dataset is about contest-level math problems formalized in Lean 4. Our dataset contains 57231 problems in the split of Lean Workbook and 82893 problems in the split of Lean Workbook Plus. We provide the natural language statement, answer, formal statement, and formal proof (if available) for each problem. These data can support autoformalization model training and searching for proofs. We open-source our code and our data. Our test environment is based on Lean… See the full description on the dataset page: https://huggingface.co/datasets/internlm/Lean-Workbook.

  10. School information and student demographics

    • data.ontario.ca
    • datasets.ai
    • +1more
    xlsx
    Updated Oct 23, 2025
    Cite
    Education (2025). School information and student demographics [Dataset]. https://data.ontario.ca/dataset/school-information-and-student-demographics
    Explore at:
    Available download formats: xlsx(1510697), xlsx(1529849), xlsx(1565910), xlsx(1550796), xlsx(1566878), xlsx(1565304), xlsx(1562805), xlsx(1459001), xlsx(1462006), xlsx(1460629), xlsx(1547704), xlsx(1567330), xlsx(1580734), xlsx(1462064)
    Dataset updated
    Oct 23, 2025
    Dataset authored and provided by
    Education
    License

    https://www.ontario.ca/page/open-government-licence-ontario

    Time period covered
    Oct 23, 2025
    Area covered
    Ontario
    Description

    Data includes: board and school information, grade 3 and 6 EQAO student achievements for reading, writing and mathematics, and grade 9 mathematics EQAO and OSSLT. Data excludes private schools, Education and Community Partnership Programs (ECPP), summer, night and continuing education schools.

    How Are We Protecting Privacy?

    Results for OnSIS and Statistics Canada variables are suppressed based on school population size to better protect student privacy. In order to achieve this additional level of protection, the Ministry has used a methodology that randomly rounds a percentage either up or down depending on school enrolment. In order to protect privacy, the ministry does not publicly report on data when there are fewer than 10 individuals represented.

    • Percentages depicted as 0 may not always be 0 values, as in certain situations the values have been randomly rounded down or there are no reported results at a school for the respective indicator.
    • Percentages depicted as 100 are not always 100; in certain situations the values have been randomly rounded up.

    The school enrolment totals have been rounded to the nearest 5 in order to better protect and maintain student privacy.
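
    Two of the privacy rules stated above are simple enough to sketch: rounding a total to the nearest 5, and withholding a figure when fewer than 10 individuals are represented. The code below is an illustration under those two stated rules only. The function names are hypothetical, applying both rules to the same count is an assumption, and the Ministry's random rounding of percentages is not reproduced because its exact rules are not given here.

```python
from typing import Optional

SUPPRESSION_THRESHOLD = 10  # figures for fewer than 10 individuals are not reported
ROUNDING_BASE = 5           # enrolment totals are rounded to the nearest 5


def round_to_nearest(value: int, base: int = ROUNDING_BASE) -> int:
    """Round a non-negative integer to the nearest multiple of base."""
    return (value + base // 2) // base * base


def publishable_count(count: int) -> Optional[int]:
    """Return the rounded count, or None when the count must be suppressed."""
    if count < SUPPRESSION_THRESHOLD:
        return None
    return round_to_nearest(count)


if __name__ == "__main__":
    # Hypothetical enrolment figures, not taken from the Ontario data.
    for raw in (7, 12, 13, 248):
        print(raw, "->", publishable_count(raw))
```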

    The information in the School Information Finder is the most current available to the Ministry of Education at this time, as reported by schools, school boards, EQAO and Statistics Canada. The information is updated as frequently as possible.

    This information is also available on the Ministry of Education's School Information Finder website by individual school.

    Descriptions for some of the data types can be found in our glossary.

    School/school board and school authority contact information are updated and maintained by school boards and may not be the most current version. For the most recent information please visit: https://data.ontario.ca/dataset/ontario-public-school-contact-information.

  11. Student Performance Data Set

    • kaggle.com
    zip
    Updated Mar 27, 2020
    + more versions
    Cite
    Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
    Explore at:
    Available download formats: zip (12353 bytes)
    Dataset updated
    Mar 27, 2020
    Authors
    Data-Science Sean
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If this data set is useful, an upvote is appreciated. This data approaches student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social and school-related features, and the data was collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).
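
    The note about G3 correlating strongly with G1 and G2 can be checked with a Pearson correlation coefficient. The sketch below implements the coefficient with the standard library only; the grade lists are made-up toy values for illustration, not rows from the dataset, and the function name pearson is an assumption.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy period grades (hypothetical), standing in for columns G2 and G3.
    g2 = [8, 11, 13, 15, 10, 17]
    g3 = [9, 11, 14, 15, 9, 18]
    print(f"corr(G2, G3) = {pearson(g2, g3):.3f}")
```

    A coefficient close to 1 between G2 and G3 would be consistent with the note above, and is the reason a model that drops G1 and G2 from its inputs faces a harder but more useful prediction task.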

  12. Vocational and other qualifications quarterly: April to June 2017

    • gov.uk
    Updated Sep 14, 2017
    Cite
    Ofqual (2017). Vocational and other qualifications quarterly: April to June 2017 [Dataset]. https://www.gov.uk/government/statistics/vocational-and-other-qualifications-quarterly-april-to-june-2017
    Explore at:
    Dataset updated
    Sep 14, 2017
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Ofqual
    Description

    Main trends for quarter 2 2017

    1. Just over 1.5 million certificates were awarded in 2017 Q2, a decrease of 5% on the same quarter of 2016.
    2. The decline is mostly due to a decrease in the number of certificates in key skills, QCF, other general qualifications and functional skills.
    3. Functional skills qualifications continue to replace key skills qualifications leading to a reduction in the latter.
    4. The general decline in overall certification numbers may be caused by a tightening in the availability of funding; this is notable at entry level, level 1, level 2 and level 3 qualifications.
    5. The largest increase in number of certificates (123%) seen in vocationally-related qualifications is likely due to the removal of the QCF.
    6. The sector subject areas with notable increase in number of certificates were health, public services and care, arts, media and publishing and construction, planning and the built environment.
    7. The sector subject areas with notable decrease in number of certificates were preparation for life and work, retail and commercial enterprise and business, administration and law.
    8. The qualification with the highest number of certificates this quarter was BCS Level 2 ECDL Certificate in IT Application Skills, followed by QA Level 2 Award in Emergency First Aid at Work (QCF) and Pearson Edexcel Functional Skills qualification in Mathematics at Level 1.
    9. The drop in certificates in Wales this quarter compared to the same quarter previous year was largely due to reduction in number of certificates in key skills and other general qualifications.
    10. The drop in certificates in Northern Ireland this quarter compared to the same quarter previous year was largely due to reduction in number of certificates in QCF and other general qualifications.
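
    The year-on-year comparisons in the list above are percentage changes. As a rough sketch of the arithmetic, the code below computes a percentage change and recovers the earlier figure implied by a current value and a stated decrease. The function names are hypothetical, and because the release says "just over 1.5 million", using exactly 1,500,000 only gives an approximate lower estimate.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a decrease)."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


def implied_previous(current: float, percent_decrease: float) -> float:
    """Earlier value implied by a current value and a percentage decrease."""
    if not 0 <= percent_decrease < 100:
        raise ValueError("percent_decrease must be in [0, 100)")
    return current / (1.0 - percent_decrease / 100.0)


if __name__ == "__main__":
    # About 1.5 million certificates in 2017 Q2, a 5% decrease on 2016 Q2.
    previous = implied_previous(1_500_000, 5)
    print(f"Implied 2016 Q2 total: about {previous:,.0f}")  # roughly 1.58 million
```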

    Datasets

    The datasets used to produce this release are available for England, Wales and Northern Ireland.

    User feedback

    We welcome your feedback on our publications. Should you have any comments on this statistical release and how to improve it to meet your needs, please contact us at statistics@ofqual.gov.uk.

  13. Vocational and other qualifications quarterly: July to September 2017

    • gov.uk
    • tnaqa.mirrorweb.com
    Updated Dec 7, 2017
    Cite
    Ofqual (2017). Vocational and other qualifications quarterly: July to September 2017 [Dataset]. https://www.gov.uk/government/statistics/vocational-and-other-qualifications-quarterly-july-to-september-2017
    Explore at:
    Dataset updated
    Dec 7, 2017
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Ofqual
    Description

    Trends

    Nearly 2.8 million certificates were awarded between July and September 2017, a decrease of 15.5% on the same period of 2016. The decline is mostly due to a decrease in the number of certificates in the qualifications and credit framework (QCF) and other general qualifications. There are also large decreases in the number of certificates in functional skills, key skills, free-standing maths and entry level. The general decline in overall certification numbers may be caused by a tightening in the availability of funding. This is notable at entry level, level 1, level 2 and level 1/2 qualifications.

    Functional skills qualifications continue to replace key skills qualifications leading to a reduction in the number of certificates in the latter.

    The reduction in the ‘other general qualifications’ may be an effect of the introduction of the English Baccalaureate and other school performance indicators. For example, the calculation of Progress 8 and Attainment 8 measures can only include a maximum of 3 non-English Baccalaureate qualifications.

    The largest increase in the number of certificates (59.1%) was seen in vocationally-related qualifications. This is likely caused by awarding organisations re-assigning the qualification type of QCF qualifications to vocationally-related qualifications. Following the closure of the QCF unit bank and introduction of the regulated qualifications framework (RQF), Ofqual decided that inclusion of the term ‘QCF’ in qualification titles after 31 December 2017 would be an indicator of non-compliance with Ofqual’s titling rules. As well as amending qualification titles, awarding organisations are therefore likely to be re-assigning the qualification type. A concession to the inclusion of the term ‘QCF’ has been given to applied general qualifications that have similar titles but differing assessment (pre-existing and newly introduced with 40% assessment) allowing differentiation between them.

    The sector subject area with a notable increase in the number of certificates was construction, planning and the built environment.

    The sector subject areas with notable decreases in the number of certificates were languages, literature and culture; preparation for life and work; information and communication technology; and science and mathematics.

    The qualification with the highest number of certificates this quarter was ‘BCS Level 2 ECDL Certificate in IT Application Skills’, followed by ‘Pearson BTEC Level 1/Level 2 First Award in Sport’ and ‘WJEC Foundation/National Skills Challenge Certificate (Welsh Baccalaureate)’.

    Datasets

    The datasets used to produce this release are available for England, Wales and Northern Ireland.

    Glossary

    Definitions for some of the specific terms used in our statistical bulletins are explained in the ‘Glossary for Ofqual’s statistics’.

    User feedback

    We welcome your feedback on our publications. Should you have any comments on this statistical release and how to improve it to meet your needs please contact us at statistics@ofqual.gov.uk.

  14. GSM8K - Grade School Math 8K dataset for LLM

    • kaggle.com
    zip
    Updated May 21, 2024
    Cite
    Johnson chong (2024). GSM8K - Grade School Math 8K dataset for LLM [Dataset]. https://www.kaggle.com/datasets/johnsonhk88/gsm8k-grade-school-math-8k-dataset-for-llm
    Available download formats: zip (5156809 bytes)
    Dataset updated
    May 21, 2024
    Authors
    Johnson chong
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Summary

    GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

    These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the final answer. A bright middle school student should be able to solve every problem: from the paper, "Problems require no concepts beyond the level of early Algebra, and the vast majority of problems can be solved without explicitly defining a variable." Solutions are provided in natural language, as opposed to pure math expressions. From the paper: "We believe this is the most generally useful data format, and we expect it to shed light on the properties of large language models’ internal monologues."

  15. Vocational and other qualifications quarterly: April to June 2018

    • gov.uk
    • tnaqa.mirrorweb.com
    Updated Sep 20, 2018
    Cite
    Ofqual (2018). Vocational and other qualifications quarterly: April to June 2018 [Dataset]. https://www.gov.uk/government/statistics/vocational-and-other-qualifications-quarterly-april-to-june-2018
    Dataset updated
    Sep 20, 2018
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Ofqual
    Description

    Main trends for quarter 2 2018

    1. 1.5 million certificates were awarded in 2018 Q2, an increase of 3.9% on the same quarter of 2017.
    2. The rise is mostly due to an increase in the number of certificates in Vocationally-Related Qualifications. This increase has been offset by the large decrease in the number of certificates in QCF Qualifications.
    3. The general declines in the number of certificates over 12 month periods may be caused by a tightening in the availability of funding. This is notable for Level 2 qualifications. Some of this decline has been offset by the large increase in the number of certificates in Level 3 qualifications. This change could be driven by changes in performance tables as Applied General qualifications (Level 3) grow in popularity.
    4. The decline in the number of certificates in Functional Skills is likely due to the changes in funding rules by the Education and Skills Funding Agency and revised guidance from DfE that post 16 students who have a grade D/grade 3 in English or maths must now be entered for GCSE resits rather than Functional Skills. In addition, colleges are also incentivised to enter students with grade E/grade 2 for GCSE English and maths as they gain more credit for distance travelled by improving a GCSE grade than for Functional Skills attainment.
    5. Large increases in the number of certificates were seen in Occupational Qualifications (164%) and Vocationally-Related Qualifications (121%). This is likely caused by awarding organisations re-assigning the qualification type of QCF qualifications to Occupational Qualifications or Vocationally-Related Qualifications. Following the closure of the QCF unit bank and introduction of the Regulated Qualifications Framework (RQF), Ofqual decided that inclusion of the term ‘QCF’ in qualification titles after 31 December 2017 would be an indicator of non-compliance with Ofqual’s titling rules. As well as amending qualification titles, awarding organisations are therefore likely to be re-assigning the qualification type. A concession to the inclusion of the term ‘QCF’ has been given to Applied General qualifications that have similar titles but differing assessment (pre-existing and newly introduced with 40% assessment) allowing differentiation between them.
    6. The sector subject area with a notable increase in the number of certificates was Leisure, Travel and Tourism.
    7. The sector subject areas with notable decreases in the number of certificates were Information and Communication Technology; Languages, Literature and Culture; and Retail and Commercial Enterprise.
    8. The decline in the number of certificates in Information and Communication Technology is due to the sharp drop in the number of certificates in BCS Level 2 ECDL Certificate in IT Application Skills compared to Q2 in 2017, most likely due to its removal from performance tables. The decrease in certificates for this subject area is mitigated by an increase in certificates to several qualifications offered by The Learning Machine which have been included in performance tables.
    9. The qualification with the highest number of certificates this quarter was QA Level 3 Award in Emergency First Aid at Work (RQF), followed by Pearson BTEC Level 1/Level 2 First Award in Sport and TCL Entry Level Certificate in ESOL International - Speaking and Listening (Entry 3).
    10. A number of qualifications have high numbers of certificates this quarter, including new level 3 qualifications, which were not available to certificate in the same quarter in the previous year. Many of these are first aid qualifications. This is due to the First Aid Awarding Organisations Forum (FAAOF) review of the Emergency First Aid at Work qualification, under which these qualifications should be re-levelled from Level 2 to Level 3 in England.

    Geographical coverage

    The data cover regulated qualifications in England.

    Datasets

    The datasets used to produce this release are available separately.

    Statistics collection

    All our published vocational and other qualifications publications are available at a single collection page.

    User feedback

    We welcome your feedback on our publications. Should you have any comments on this statistical release and how to improve it to meet your needs please contact us at statistics@ofqual.gov.uk.

  16. Southern and Eastern Africa Consortium for Monitoring Educational Quality...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Southern and Eastern Africa Consortium for Monitoring Educational Quality (2019). Southern and Eastern Africa Consortium for Monitoring Educational Quality 2000 - South Africa [Dataset]. https://datacatalog.ihsn.org/catalog/4716
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Southern and Eastern Africa Consortium for Monitoring Educational Quality
    Time period covered
    2000
    Area covered
    South Africa
    Description

    Abstract

    In 1991 the International Institute for Educational Planning (IIEP) and a number of Ministries of Education in Southern and Eastern Africa began to work together in order to address training and research needs in Education. The focus for this work was on establishing long-term strategies for building the capacity of educational planners to monitor and evaluate the quality of their basic education systems. The first two educational policy research projects undertaken by SACMEQ (widely known as "SACMEQ I" and "SACMEQ II") were designed to provide detailed information that could be used to guide planning decisions aimed at improving the quality of education in primary school systems.

    During 1995-1998 seven Ministries of Education participated in the SACMEQ I Project. The SACMEQ II Project commenced in 1998 and the surveys of schools, involving 14 Ministries of Education, took place between 2000 and 2004. The survey was undertaken in schools in Botswana, Kenya, Lesotho, Malawi, Mauritius, Mozambique, Namibia, Seychelles, South Africa, Swaziland, Tanzania, Uganda, Zambia and Zanzibar.

    Moving from the SACMEQ I Project (covering around 1100 schools and 20,000 pupils) to the SACMEQ II Project (covering around 2500 schools and 45,000 pupils) resulted in a major increase in the scale and complexity of SACMEQ's research and training programmes.

    SACMEQ's mission is to: a) Expand opportunities for educational planners to gain the technical skills required to monitor and evaluate the quality of their education systems; and b) Generate information that can be used by decision-makers to plan and improve the quality of education.

    Geographic coverage

    National coverage

    Analysis unit

    • Pupils
    • Teachers
    • Schools

    Universe

    The target population for SACMEQ's Initial Project was defined as "all pupils at the Grade 6 level in 1995 who were attending registered government or non-government schools". Grade 6 was chosen because it was the grade level where the basics of reading literacy were expected to have been acquired.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample designs used in the SACMEQ II Project were selected so as to meet the standards set down by the International Association for the Evaluation of Educational Achievement. These standards required that sample estimates of important pupil population parameters should have sampling accuracy that was at least equivalent to a simple random sample of 400 pupils (thereby guaranteeing 95 percent confidence limits for sample means of plus or minus one tenth of a pupil standard deviation unit).
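
    As a rough check on the figure quoted above, the half-width of a 95 percent confidence interval for a mean under simple random sampling can be expressed in pupil standard deviation units. The sketch below assumes the usual normal multiplier of 1.96 and ignores any finite population correction; neither detail is stated in the description, and the function name is illustrative.

```python
import math


def ci_half_width_in_sd_units(n: int, z: float = 1.96) -> float:
    """Half-width of a confidence interval for a sample mean under simple
    random sampling, in units of the population standard deviation.

    z = 1.96 is the normal-approximation multiplier for 95 percent
    confidence (an assumption, not taken from the source).
    """
    return z / math.sqrt(n)


half_width = ci_half_width_in_sd_units(400)
print(round(half_width, 3))  # 0.098
```

    With 400 pupils the half-width is about 0.098 standard deviations, consistent with the "one tenth" limit quoted above.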

    Some Constraints on Sample Design

    Sample designs in the field of education are usually prepared amid a network of competing constraints. These designs need to adhere to established survey sampling theory and, at the same time, give due recognition to the financial, administrative, and socio-political settings in which they are to be applied. The "best" sample design for a particular project is one that provides levels of sampling accuracy that are acceptable in terms of the main aims of the project, while simultaneously limiting cost, logistic, and procedural demands to manageable levels. The major constraints that were established prior to the preparation of the sample designs for the SACMEQ II Project have been listed below.

    Target Population: The target population definitions should focus on Grade 6 pupils attending registered mainstream government or non-government schools. In addition, the defined target population should be constructed by excluding no more than 5 percent of pupils from the desired target population.

    Bias Control: The sampling should conform to the accepted rules of scientific probability sampling. That is, the members of the defined target population should have a known and non-zero probability of selection into the sample so that any potential for bias in sample estimates due to variations from "epsem sampling" (equal probability of selection method) may be addressed through the use of appropriate sampling weights (Kish, 1965).
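
    One common way to apply such sampling weights is to weight each sampled pupil by the inverse of the selection probability. The sketch below illustrates that idea only; the scores and probabilities are hypothetical values, not SACMEQ data, and the function names are illustrative.

```python
def sampling_weights(selection_probs):
    """Inverse-probability weights; each probability must be known and non-zero."""
    if any(p <= 0 or p > 1 for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    return [1.0 / p for p in selection_probs]


def weighted_mean(values, weights):
    """Weighted estimate of the population mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical test scores and selection probabilities (not real data).
scores = [480.0, 520.0, 500.0]
probs = [0.10, 0.05, 0.10]

weights = sampling_weights(probs)
estimate = weighted_mean(scores, weights)
unweighted = sum(scores) / len(scores)
```

    In this toy case the pupil with the lowest selection probability counts twice as much as the others, so the weighted estimate (505) differs from the unweighted mean (500).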

    Sampling Errors: The sample estimates for the main criterion variables should conform to the sampling accuracy requirements set down by the International Association for the Evaluation of Educational Achievement (Ross, 1991). That is, the standard error of sampling for the pupil tests should be of a magnitude that is equal to, or smaller than, what would be achieved by employing a simple random sample of 400 pupils (Ross, 1985).

    Response Rates: Each SACMEQ country should aim to achieve an overall response rate for pupils of 80 percent. This figure was based on the wish to achieve or exceed a response rate of 90 percent for schools and a response rate of 90 percent for pupils within schools.
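
    The 80 percent target can be read as the product of the two stage-level rates. The sketch below assumes that the overall pupil response rate is the school response rate multiplied by the within-school pupil response rate; the function name is illustrative.

```python
def overall_response_rate(school_rate: float, pupil_rate: float) -> float:
    """Overall pupil response rate, assuming it is the product of the
    school-level rate and the within-school pupil rate."""
    return school_rate * pupil_rate


overall = overall_response_rate(0.90, 0.90)
meets_target = overall >= 0.80
```

    Under that assumption, 90 percent at each stage gives about 81 percent overall, which clears the 80 percent target.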

    Administrative and Financial Costs: The number of schools selected in each country should recognize limitations in the administrative and financial resources available for data collection.

    Other Constraints: The number of pupils selected to participate in the data collection in each selected school should be set at a level that will maximize validity of the within-school data collection for the pupil reading and mathematics tests.

    Note: Detailed descriptions of the sample design, sample selection, and sample evaluation procedures have been presented in the "South Africa Working Report".

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The data collection for SACMEQ’s Initial Project took place in October 1995 and involved the administration of questionnaires to pupils, teachers, and school heads. The pupil questionnaire contained questions about the pupils’ home backgrounds and their school life; the teacher questionnaire asked about classrooms, teaching practices, working conditions, and teacher housing; and the school head questionnaire collected information about teachers, enrolments, buildings, facilities, and management. A reading literacy test was also given to the pupils. The test was based on items that were selected after a trial-testing programme had been completed.

    Cleaning operations

    Data Checking and Data Entry

    Data preparation commenced soon after the main data collection was completed. The NRCs had to organize the safe return of all materials to the Ministry of Education where the data collection instruments could be checked, entered into computers, and then "cleaned" to remove errors prior to data analysis. The data-checking involved the "hand editing" of data collection instruments by a team of trained staff. They were required to check that: (i) all questionnaires, tests, and forms had arrived back from the sample schools, (ii) the identification numbers on all instruments were complete and accurate, and (iii) certain logical linkages between questions made sense (for example, the two questions to school heads concerning "Do you have a school library?" and "How many books do you have in your school library?").
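
    The logical linkage between the two library questions can be expressed as a simple rule. The sketch below is only an illustration of that kind of check; the actual hand-editing rules are not given in the description, and the function and parameter names are hypothetical.

```python
def library_answers_consistent(has_library: bool, book_count: int) -> bool:
    """Check the linkage between the two school head questions.

    Assumed rule: a negative count is never valid, and a head who reports
    no school library should not report any library books.
    """
    if book_count < 0:
        return False
    return has_library or book_count == 0


checks = [
    library_answers_consistent(True, 350),   # library with books
    library_answers_consistent(False, 0),    # no library, no books
    library_answers_consistent(False, 120),  # no library but books reported
]
```

    Only the third case would be flagged for follow-up under this assumed rule.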

    The next step was the entry of data into computers using the WINDEM software. A team of 5-10 staff normally undertook this work. In some cases the data were "double entered" in order to monitor accuracy.

    The numbers of keystrokes required to enter one copy of each data collection instrument were as follows: pupil questionnaire: 150; pupil reading test: 85; pupil mathematics test: 65; teacher questionnaire: 587; teacher reading test: 51; teacher mathematics test: 43; school head questionnaire: 319; school form: 58; and pupil name form: 51.

    This information can be re-expressed to give the total number of keystrokes for the whole body of data for one country by multiplying the above figures by the number of instruments in the final data collection. In the case of South Africa the total number of keystrokes was as follows: pupil questionnaire: 472 450; pupil reading test: 269 855; pupil mathematics test: 205 595; teacher questionnaire: 198 406; school head questionnaire: 62 361; school form: 9 802; and pupil name form: 161 313. That is, a total of 907 332 keystrokes were required to enter all of the data for South Africa.

    An experienced keyboard operator can work at a rate of 25 keystrokes per minute (working from multi-paged questionnaires and stopping occasionally to clarify individual questionnaire entries with the supervisor). Assuming that this kind of work rate could be sustained for, say, around a maximum of six hours per day, then the whole data entry operation for South Africa was estimated to amount to around 101 person days of data entry work.
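
    The person-day estimate follows directly from the stated rate and working day. A minimal sketch, using the total of 907 332 keystrokes quoted above and an illustrative function name:

```python
def person_days(total_keystrokes: int,
                keystrokes_per_minute: int = 25,
                hours_per_day: int = 6) -> float:
    """Person days of data entry at the stated keying rate and working day."""
    keystrokes_per_day = keystrokes_per_minute * 60 * hours_per_day
    return total_keystrokes / keystrokes_per_day


days = person_days(907_332)
print(round(days))  # 101
```

    At 25 keystrokes per minute for six hours, one operator enters 9,000 keystrokes per day, which gives roughly 101 person days.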

    Data Cleaning

    The NRCs received written instructions and follow-up support from IIEP staff in the basic steps of data cleaning using the WINDEM software. This permitted the NRCs to (i) identify major errors in the sequence of identification numbers, (ii) cross-check identification numbers across files (for example, to ensure that all pupils were linked with their own reading and mathematics teachers), (iii) ensure that all schools listed on the original sampling frame also had valid data collection instruments and vice-versa, (iv) check for "wild codes" that occurred when some variables had values that fell outside pre-specified reasonable limits, and (v) validate that variables used as linkage devices in later file merges were available and accurate.
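
    Two of these checks, the "wild code" check and the cross-file identification check, can be sketched in a few lines. This is not the WINDEM procedure, whose details are not described here; the variable names, limits, and records below are hypothetical.

```python
def find_wild_codes(records, limits):
    """Return (record index, variable, value) for every value that is
    missing or falls outside the pre-specified limits."""
    wild = []
    for i, record in enumerate(records):
        for variable, (low, high) in limits.items():
            value = record.get(variable)
            if value is None or not (low <= value <= high):
                wild.append((i, variable, value))
    return wild


def unmatched_ids(referenced_ids, available_ids):
    """IDs referenced in one file that are absent from the linked file."""
    return sorted(set(referenced_ids) - set(available_ids))


# Hypothetical pupil records and limits (not SACMEQ data).
pupils = [
    {"age_months": 150, "teacher_id": 11},
    {"age_months": 999, "teacher_id": 12},
    {"age_months": 140, "teacher_id": 13},
]
limits = {"age_months": (96, 240)}
teacher_file_ids = [11, 12]

wild = find_wild_codes(pupils, limits)
missing_teachers = unmatched_ids([p["teacher_id"] for p in pupils], teacher_file_ids)
```

    Here the second pupil's age is flagged as a wild code, and teacher 13 is reported as referenced by a pupil but missing from the teacher file.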

    A second phase of data preparation directed efforts towards the identification and correction of "wild codes" (which refer to data values that fall outside credible limits), and "inconsistencies" (which refer to different responses to the same, or related, questions). There were also some errors in the identification codes for teachers

  17. Trends in International Mathematics and Science Study 2007 - International

    • datafirst.uct.ac.za
    Updated May 25, 2020
    Cite
    TIMSS & PIRLS International Study Center (2020). Trends in International Mathematics and Science Study 2007 - International [Dataset]. http://www.datafirst.uct.ac.za/Dataportal/index.php/catalog/173
    Dataset updated
    May 25, 2020
    Dataset authored and provided by
    TIMSS & PIRLS International Study Center
    Time period covered
    2007
    Area covered
    International
    Description

    Abstract

    TIMSS 2007 is the fourth in a cycle of internationally comparative assessments dedicated to improving teaching and learning in mathematics and science for students around the world. Carried out every four years at the fourth and eighth grades, TIMSS provides data about trends in mathematics and science achievement over time.

    To inform educational policy in the participating countries, this world-wide assessment and research project also routinely collects extensive background information that addresses concerns about the quantity, quality, and content of instruction.

    Geographic coverage

    The survey had international coverage

    Analysis unit

    Individuals and institutions

    Universe

    TIMSS 2007 chose to study achievement in two target populations, the fourth and eighth grades in most countries. Participating countries were free to select either population or both. The formal definitions of the TIMSS target populations make use of UNESCO's International Standard Classification of Education (ISCED) (UNESCO Institute for Statistics, 1999) in identifying the appropriate target grades:

    • Fourth grade population. This includes all students enrolled in the grade that represents 4 years of formal schooling, counting from the first year of ISCED Level 1, provided that the mean age at the time of testing is at least 9.5 years. For most countries, the target grade should be the fourth grade or its national equivalent.

    • Eighth grade population. This includes all students enrolled in the grade that represents 8 years of formal schooling, counting from the first year of ISCED Level 1, provided that the mean age at the time of testing is at least 13.5 years. For most countries, the target grade should be the eighth grade or its national equivalent.

    Kind of data

    Sample survey data

    Sampling procedure

    A systematic, two-stage probability proportional-to-size (PPS) sampling technique was used, where schools are first sampled and then classes within sampled (and participating) schools. Because of its large population size, it was necessary to include a preliminary sampling stage in the Russian Federation, where regions were sampled first and then schools. Singapore also had a third sampling stage, where students were sampled within classes.
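
    The first stage of such a design can be illustrated with a generic systematic PPS selection: units are laid end to end by size, and a random start plus a fixed interval picks the sample. The sketch below is a textbook version under that assumption, not the TIMSS implementation (which also involves details such as stratification not described here); the function name and school sizes are hypothetical.

```python
import random


def pps_systematic_sample(sizes, n, seed=None):
    """Select n unit indices with probability proportional to size,
    using a random start and a fixed sampling interval.

    A unit whose size exceeds the interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    rng = random.Random(seed)
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0.0
    next_target = 0
    for unit, size in enumerate(sizes):
        cumulative += size
        # Select this unit for every target that falls inside its size range.
        while next_target < n and targets[next_target] < cumulative:
            selected.append(unit)
            next_target += 1
    return selected


# Hypothetical school enrolments (not TIMSS data).
school_sizes = [120, 45, 300, 80, 210, 60, 150, 95]
sampled_schools = pps_systematic_sample(school_sizes, n=3, seed=2007)
```

    A second stage would then sample classes within each selected school; that step is omitted from this sketch.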

    Sampling deviation

    Participants could exclude schools from the sampling frame if they were in geographically remote regions, were extremely small, offered curriculum or structure different from the mainstream, or provided instruction only to students in the “within-school” exclusion categories. The general TIMSS rules for defining within-school exclusions can be found in the technical documents.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The study used the following questionnaires: Fourth Grade Student Questionnaire, Fourth Grade Teacher Questionnaire, Fourth Grade School Questionnaire, Eighth Grade Student Questionnaire, Eighth Grade Mathematics Teacher Questionnaire, Eighth Grade Science Teacher Questionnaire, and Eighth Grade School Questionnaire. Information on the variables obtained or derived from questions in the survey is available in the TIMSS 2007 user guide for the international database: Data Supplement 3: Variables derived from the Student, Teacher, and School Questionnaire data.

    Response rate

    Weighted and unweighted response rates were computed for each participating country by grade, at the school level, and at the student level. Overall response rates (combined school and student response rates) also were computed.

  18. Southern and Eastern Africa Consortium for Monitoring Educational Quality...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Southern and Eastern Africa Consortium for Monitoring Educational Quality (2019). Southern and Eastern Africa Consortium for Monitoring Educational Quality 2000 - Namibia [Dataset]. https://datacatalog.ihsn.org/catalog/4713
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Southern and Eastern Africa Consortium for Monitoring Educational Quality
    Time period covered
    2000
    Area covered
    Namibia
    Description

    Abstract

    In 1991 the International Institute for Educational Planning (IIEP) and a number of Ministries of Education in Southern and Eastern Africa began to work together in order to address training and research needs in Education. The focus for this work was on establishing long-term strategies for building the capacity of educational planners to monitor and evaluate the quality of their basic education systems. The first two educational policy research projects undertaken by SACMEQ (widely known as "SACMEQ I" and "SACMEQ II") were designed to provide detailed information that could be used to guide planning decisions aimed at improving the quality of education in primary school systems.

    During 1995-1998 seven Ministries of Education participated in the SACMEQ I Project. The SACMEQ II Project commenced in 1998 and the surveys of schools, involving 14 Ministries of Education, took place between 2000 and 2004. The survey was undertaken in schools in Botswana, Kenya, Lesotho, Malawi, Mauritius, Mozambique, Namibia, Seychelles, South Africa, Swaziland, Tanzania, Uganda, Zambia and Zanzibar.

    Moving from the SACMEQ I Project (covering around 1100 schools and 20,000 pupils) to the SACMEQ II Project (covering around 2500 schools and 45,000 pupils) resulted in a major increase in the scale and complexity of SACMEQ's research and training programmes.

    SACMEQ's mission is to: a) Expand opportunities for educational planners to gain the technical skills required to monitor and evaluate the quality of their education systems; and b) Generate information that can be used by decision-makers to plan and improve the quality of education.

    Geographic coverage

    National coverage

    Analysis unit

    • Pupils
    • Teachers
    • Schools

    Universe

    The target population for SACMEQ's Initial Project was defined as "all pupils at the Grade 6 level in 1995 who were attending registered government or non-government schools". Grade 6 was chosen because it was the grade level where the basics of reading literacy were expected to have been acquired.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Sampling

    The "best" sample design for a particular project is one that provides levels of sampling accuracy that are acceptable in terms of the main aims of the project, while simultaneously limiting cost, logistic, and procedural demands to manageable levels. The major constraints that were established prior to the preparation of the sample designs for the SACMEQ II Project have been listed below.

    Target Population: The target population definitions should focus on Grade 6 pupils attending registered mainstream government or non-government schools. In addition, the defined target population should be constructed by excluding no more than 5 percent of pupils from the desired target population.

    Bias Control: The sampling should conform to the accepted rules of scientific probability sampling. That is, the members of the defined target population should have a known and non-zero probability of selection into the sample so that any potential for bias in sample estimates due to variations from "epsem sampling" (equal probability of selection method) could be addressed through the use of appropriate sampling weights.

    Sampling Errors: The sample estimates for the main criterion variables should conform to the sampling accuracy requirement that the standard error of sampling for the pupil tests should be of a magnitude that is equal to, or smaller than, what would be achieved by employing a simple random sample of 400 pupils.

    Response Rates: Each SACMEQ country should aim to achieve an overall response rate for pupils of 80 percent. This figure was based on the wish to achieve or exceed a response rate of 90 percent for schools and a response rate of 90 percent for pupils within schools.

    Administrative and Financial Costs: The number of schools selected in each country should recognise limitations in the administrative and financial resources available for data collection.

    Other Constraints: The number of learners selected to participate in the data collection in each selected school should be set at a level that will maximise validity of the within-school data collection for the learner reading and mathematics tests.

    For Namibia, the desired target population was all learners enrolled in Grade 6 in the ninth month of the school year (i.e. in September 2000). The net enrolment ratio for the age group 7-13 years old who were enrolled in Grades 1 to 7 in Namibia in 2000 was 91.3 percent. However, in Namibia it was decided to exclude certain learners. These were learners in schools having fewer than 15 Grade 6 learners in them, learners in 'inaccessible' schools, and learners in special schools. In all, 884 learners from 82 schools were excluded, but this only amounted to 1.8 percent of all learners. In Namibia there were 849 primary schools having 48,567 learners. After excluding the 1.8 percent of learners, the defined population from which a sample had to be drawn consisted of 47,683 learners from 767 schools.
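
    The exclusion figures in this paragraph are internally consistent, as a short check shows. The variable names below are illustrative; the numbers are those quoted above.

```python
# Figures quoted in the description for Namibia.
total_schools, total_learners = 849, 48_567
excluded_schools, excluded_learners = 82, 884

defined_schools = total_schools - excluded_schools
defined_learners = total_learners - excluded_learners
excluded_share = 100 * excluded_learners / total_learners

print(defined_schools, defined_learners, round(excluded_share, 1))  # 767 47683 1.8
```

    The excluded share of 1.8 percent is well inside the 5 percent ceiling set for the defined target population.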

    The number of schools required in the sample is in part a function of the intra-class correlation (rho), which indicates the proportion of total variation (in achievement, in this case) that lies among schools. The following is the formula often used for estimating the value of rho in situations where two-stage cluster sampling is employed using (approximately) equal sized clusters.

    $$\hat{\rho} = \frac{b\,s_a^2 - s^2}{(b - 1)\,s^2}$$

    where $s_a^2$ is the variance of cluster means, $s^2$ is the variance of the element values, and $b$ is the cluster size. In SACMEQ I the rho had been 0.60 in Namibia. That is, 60 percent of the variation was among schools and only 40 percent within schools. Therefore, in the case of Namibia a rho of 0.60 was used. This meant drawing a sample of 248 schools.
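
    The rho estimate above, and one way a school count could follow from it, can be sketched as follows. The first function implements the formula as given (cluster size, variance of cluster means, variance of element values). The second rests on two assumptions that do not appear in this description: the common design-effect approximation 1 + (b - 1) × rho, and a cluster size of 20 learners per school. The example variances are hypothetical.

```python
import math


def estimate_rho(b, var_cluster_means, var_elements):
    """Estimated intra-class correlation for two-stage cluster sampling
    with (approximately) equal sized clusters, as in the formula above."""
    return (b * var_cluster_means - var_elements) / ((b - 1) * var_elements)


def schools_needed(rho, cluster_size, effective_n=400):
    """Schools required for the equivalent of a simple random sample of
    effective_n learners.

    ASSUMPTION: design effect = 1 + (cluster_size - 1) * rho.
    """
    design_effect = 1 + (cluster_size - 1) * rho
    learners_needed = effective_n * design_effect
    # Round first to suppress floating-point noise before taking the ceiling.
    return math.ceil(round(learners_needed / cluster_size, 6))


# Hypothetical variances chosen so that rho comes out at 0.60.
rho = estimate_rho(b=20, var_cluster_means=62.0, var_elements=100.0)

# ASSUMPTION: 20 learners sampled per school (not stated in the description).
print(schools_needed(0.60, cluster_size=20))  # 248
```

    Under these assumptions the sketch returns 248 schools, the same figure quoted above, although the description does not say how that figure was derived.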

    The major aim of the sampling was to have the equivalent of a simple random sample of 400 learners. In Namibia, the equivalent sample size achieved was 767 for reading achievement and 810 for mathematics. Hence the sample was a very good one for Namibia. For SACMEQ I it had been 335, which was below the required 400. This was because SACMEQ I was the first sample survey in Namibia and at that time it was assumed that the rho was 0.30. It was not. In SACMEQ II the rhos were 0.60 for reading and 0.53 for mathematics. Thus, in 2000 the variation among schools was slightly lower than in 1995.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The data collection for SACMEQ’s Initial Project took place in October 1995 and involved the administration of questionnaires to pupils, teachers, and school heads. The pupil questionnaire contained questions about the pupils’ home backgrounds and their school life; the teacher questionnaire asked about classrooms, teaching practices, working conditions, and teacher housing; and the school head questionnaire collected information about teachers, enrolments, buildings, facilities, and management. A reading literacy test was also given to the pupils. The test was based on items that were selected after a trial-testing programme had been completed.

    Cleaning operations

    Data entry and data cleaning: A team of five persons from the University of Namibia Multi-Disciplinary Research Centre computer lab was appointed and trained in the use of WINDEM, a special data entry package to be used in SACMEQ. The numbers of keystrokes required to enter one copy of each data collection instrument were as follows: learner questionnaire: 150; learner reading test: 85; learner mathematics test: 65; teacher questionnaire: 587; teacher reading test: 51; teacher mathematics test: 43; school head questionnaire: 319; school form: 58; and learner name form: 51.

    In the case of Namibia the total number of keystrokes was as follows: learner questionnaire: 762,600; learner reading test: 429,080; learner mathematics test: 328,250; teacher questionnaire: 358,657; teacher reading test: 15,504; teacher mathematics test: 14,061; school head questionnaire: 86,130; school form: 39,150; and learner name form: 259,284. That is, a total of 2,292,716 keystrokes were required to enter all of the data for Namibia.

    An experienced keyboard operator can work at a rate of 25 keystrokes per minute (working from multi-paged questionnaires and stopping occasionally to clarify individual questionnaire entries with the supervisor). Assuming that this work rate could be sustained for a maximum of about six hours per day, the whole data entry operation for Namibia was estimated at around 255 person-days. This implied an estimated 10 weeks of work for the 5-person data entry team that operated in Namibia. However, the work was completed in 7 weeks because the data entry staff worked extra hours.
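
    The workload arithmetic above can be checked with a short Python sketch. The keystroke totals, the rate of 25 keystrokes per minute, the six-hour day, and the team of five all come from the text. The five-day working week is an assumption, and the function names are hypothetical.

```python
# Total keystrokes per instrument, as quoted for Namibia in the text.
KEYSTROKES = {
    "learner questionnaire": 762_600,
    "learner reading test": 429_080,
    "learner mathematics test": 328_250,
    "teacher questionnaire": 358_657,
    "teacher reading test": 15_504,
    "teacher mathematics test": 14_061,
    "school head questionnaire": 86_130,
    "school form": 39_150,
    "learner name form": 259_284,
}


def person_days(total_keystrokes, keystrokes_per_minute=25, hours_per_day=6):
    """Days of work for one operator at the stated rate."""
    keystrokes_per_day = keystrokes_per_minute * 60 * hours_per_day
    return total_keystrokes / keystrokes_per_day


def team_weeks(days, team_size=5, working_days_per_week=5):
    """Calendar weeks for the team; the 5-day week is an assumption."""
    return days / team_size / working_days_per_week


total = sum(KEYSTROKES.values())
days = person_days(total)
print(total, round(days), round(team_weeks(days)))  # 2292716 255 10
```

    One operator enters 25 x 60 x 6 = 9,000 keystrokes per day, so 2,292,716 keystrokes take about 255 person-days, or roughly 10 weeks for five people.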

    At the end of this procedure the data files were sent by email to the 'Monitoring Educational Quality' unit at the IIEP in Paris. Consistency checks were run on many variables as well as on the identification codes used, and the IIEP team raised many queries. The first data files were sent to Paris in May 2001 and, after nine rounds of exchanges, the files were finally declared clean on 25 January 2002.

    Response rate

    Response rates for pupils and schools respectively were 91.8 percent and 100 percent. The reason for the shortfall in learner numbers was absenteeism by some learners in some of the schools on the day of data collection. However, sampling weights were used to correct for disproportionality among strata in the calculation

  19. WirelessMATHBench-XL

    • huggingface.co
    Updated Oct 9, 2025
    Cite
    XinLi (2025). WirelessMATHBench-XL [Dataset]. https://huggingface.co/datasets/XINLI1997/WirelessMATHBench-XL
    Dataset updated
    Oct 9, 2025
    Authors
    XinLi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    WirelessMATHBench-XL

    Dataset Summary

    WirelessMATHBench-XL is a collection of 4,027 graduate-level wireless communications math problems that pair long technical context passages with precise quantitative questions. Each item is derived from recent arXiv preprints in signal processing, networking, and edge intelligence, and is formatted to elicit deliberate step-by-step reasoning from large language models. Problems are tagged as multiple choice or fill-in-the-blank (with… See the full description on the dataset page: https://huggingface.co/datasets/XINLI1997/WirelessMATHBench-XL.

  20. A Comparison of Four Methods for the Analysis of N-of-1 Trials

    • figshare.com
    doc
    Updated Jun 2, 2023
    Cite
    Xinlin Chen; Pingyan Chen (2023). A Comparison of Four Methods for the Analysis of N-of-1 Trials [Dataset]. http://doi.org/10.1371/journal.pone.0087752
    Available download formats: doc
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xinlin Chen; Pingyan Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: To provide practical guidance for the analysis of N-of-1 trials by comparing four commonly used models. Methods: The four models (paired t-test, mixed effects model of difference, mixed effects model, and meta-analysis of summary data) were compared using a simulation study. The assumed 3-cycle and 4-cycle N-of-1 trials were set with sample sizes of 1, 3, 5, 10, 20 and 30 respectively, under the assumption of normally distributed data. The data were generated from a variance-covariance matrix under the assumption of (i) a compound symmetry structure or a first-order autoregressive structure, and (ii) no carryover effect or a 20% carryover effect. Type I error, power, bias (mean error), and mean square error (MSE) of effect differences between two groups were used to evaluate the performance of the four models. Results: The results from the 3-cycle and 4-cycle N-of-1 trials were comparable with respect to type I error, power, bias and MSE. The paired t-test yielded type I error near the nominal level, higher power, comparable bias and small MSE, whether or not there was a carryover effect. Compared with the paired t-test, the mixed effects model produced a similar type I error and smaller bias, but lower power and larger MSE. The mixed effects model of difference and the meta-analysis of summary data yielded type I error far from the nominal level, low power, and large bias and MSE, irrespective of the presence or absence of a carryover effect. Conclusion: We recommend the paired t-test for normally distributed data from N-of-1 trials because of its optimal statistical performance. In the presence of carryover effects, the mixed effects model could be used as an alternative.
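
    As a rough illustration of the recommended method, the paired t statistic can be computed directly from per-cycle outcomes. This sketch assumes each cycle's treatment-minus-control difference is treated as one paired observation. The function name and the outcome values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(treatment, control):
    """Return the paired t statistic and its degrees of freedom."""
    if len(treatment) != len(control) or len(treatment) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # One difference per cycle: treatment outcome minus control outcome.
    diffs = [t - c for t, c in zip(treatment, control)]
    # Standard error of the mean difference (sample standard deviation).
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical outcomes for one patient over three cycles.
t_stat, df = paired_t_statistic([5.0, 7.0, 9.0], [4.0, 5.0, 6.0])
print(round(t_stat, 3), df)
```

    The statistic is then compared against a t distribution with the returned degrees of freedom to obtain a p-value.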

OCR large data set

The LDS used in the OCR A level maths exam (statistics)

156 scholarly articles cite this dataset (View in Google Scholar)