73 datasets found
  1. Hyperparameters for each classification model.

    • plos.figshare.com
    xls
    Updated Sep 26, 2024
    Cite
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda (2024). Hyperparameters for each classification model. [Dataset]. http://doi.org/10.1371/journal.pone.0310707.t007
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier model showed an 11% improvement using the Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of the Bidirectional Encoder Representations from Transformers (BETO) also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset. 
    Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.

  2. Social media Youth dataset

    • kaggle.com
    zip
    Updated Jul 16, 2021
    Cite
    Srijan Sharma (2021). Social media Youth dataset [Dataset]. https://www.kaggle.com/datasets/fitsri/social-media-youth-dataset
    Explore at:
    Available download formats: zip (11210 bytes)
    Dataset updated
    Jul 16, 2021
    Authors
    Srijan Sharma
    Description

    Dataset

    This dataset was created by Srijan Sharma

    Contents

  3. Data_Sheet_1_“R” U ready?: a case study using R to analyze changes in gene...

    • frontiersin.figshare.com
    docx
    Updated Mar 22, 2024
    Cite
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_1_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.

  4. School Attendance by Student Group and District, 2022-2023

    • catalog.data.gov
    • data.ct.gov
    Updated Sep 15, 2023
    Cite
    data.ct.gov (2023). School Attendance by Student Group and District, 2022-2023 [Dataset]. https://catalog.data.gov/dataset/school-attendance-by-student-group-and-district-2022-2023
    Explore at:
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    data.ct.gov
    Description

    This dataset includes the attendance rate for public school students PK-12 by student group and by district during the 2022-2023 school year.

    Student groups include:

    • Students experiencing homelessness
    • Students with disabilities
    • Students who qualify for free/reduced lunch
    • English learners
    • All high needs students
    • Non-high needs students
    • Students by race/ethnicity (Hispanic/Latino of any race, Black or African American, White, All other races)

    Attendance rates are provided for each student group by district and for the state. Students who are considered high needs include students who are English language learners, who receive special education, or who qualify for free and reduced lunch.

    When no attendance data is displayed in a cell, data have been suppressed to safeguard student confidentiality, or to ensure that statistics based on a very small sample size are not interpreted as equally representative as those based on a sufficiently larger sample size. For more information on CSDE data suppression policies, please visit http://edsight.ct.gov/relatedreports/BDCRE%20Data%20Suppression%20Rules.pdf.

  5. Reading for Success - Small Scale Experimentation Morocco

    • datasets.ai
    • s.cnmilf.com
    • +1more
    21
    Updated Aug 8, 2024
    Cite
    US Agency for International Development (2024). Reading for Success - Small Scale Experimentation Morocco [Dataset]. https://datasets.ai/datasets/reading-for-success-small-scale-experimentation-morocco
    Explore at:
    Available download formats: 21
    Dataset updated
    Aug 8, 2024
    Dataset authored and provided by
    US Agency for International Development
    Area covered
    Morocco
    Description

    The Reading for Success – Small-Scale Experimentation (RFS-SSE) activity, a component of a broader USAID initiative, was designed to reflect ongoing collaborations between USAID/Morocco and the Ministry of Education and Vocational Training (MOEVT) to improve reading instruction in Morocco. Conceived as a learning activity, RFS-SSE developed an evidence base of effective approaches that improve reading skills in targeted primary schools. RFS-SSE began when the MOEVT was developing a 15-year education reform called Vision 2030 as well as a set of medium-term activities for the period 2015-2020. Reform efforts addressed a key weakness in the Moroccan educational system: poor reading skills at the primary level. The RFS-SSE intervention helped to inform the revisions to the existing curriculum and the design of the reformed curriculum by providing data and evidence to support the envisioned changes.

    To assess the impact of the RFS-SSE reading program, RFS-SSE selected a longitudinal evaluation design which included reading assessments of two cohorts of students. Cohort 1 was assessed at four different times - the middle of Grade 1 and throughout Grade 2: Baseline (January 2016), Midline 1 (May 2016), Midline 2 (September 2016), and Endline (May 2017). Cohort 2 was assessed twice – the beginning and end of Grade 1: Midline 2 (September 2016) and Endline (May 2017). A stratified cluster random sampling method was used to assure that (1) an equal number of boys and girls in urban and rural schools would be assessed and (2) that the results of the study would be generalizable to the entire population of intervention schools in each of the eight delegations selected for intervention. Schools were first stratified by geographic location and urban/rural environment. Within schools, students were stratified by gender. All students in Grades 1 and 2 were assessed with the same EGRA instrument.
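    The two-stage design described above can be sketched in code. This is only an illustrative sketch, not the evaluators' actual procedure: the field names (`delegation`, `setting`, `gender`), the helper names, and all sample sizes are hypothetical. Schools are stratified by delegation and urban/rural setting, then the same number of girls and boys is drawn within each selected school.

```python
import random
from collections import defaultdict


def sample_schools(schools, per_stratum, rng):
    """Stage 1: stratify schools by (delegation, setting), then draw clusters."""
    strata = defaultdict(list)
    for school in schools:
        strata[(school["delegation"], school["setting"])].append(school)
    chosen = []
    for key in sorted(strata):
        members = strata[key]
        chosen.extend(rng.sample(members, min(per_stratum, len(members))))
    return chosen


def sample_students(school, per_gender, rng):
    """Stage 2: within one school, draw the same number of girls and boys."""
    picked = []
    for gender in ("F", "M"):
        pool = [s for s in school["students"] if s["gender"] == gender]
        picked.extend(rng.sample(pool, min(per_gender, len(pool))))
    return picked


# Hypothetical sampling frame: 2 delegations x 2 settings x 3 schools,
# each school with 10 girls and 10 boys (not data from the study).
rng = random.Random(0)
schools = [
    {
        "name": f"school-{d}-{setting}-{i}",
        "delegation": d,
        "setting": setting,
        "students": [
            {"id": f"{d}-{setting}-{i}-{g}{k}", "gender": g}
            for g in ("F", "M")
            for k in range(10)
        ],
    }
    for d in ("A", "B")
    for setting in ("urban", "rural")
    for i in range(3)
]

selected = sample_schools(schools, per_stratum=2, rng=rng)
assessed = [s for school in selected for s in sample_students(school, 5, rng)]
```

    Drawing a fixed number per stratum keeps urban and rural schools, and girls and boys, equally represented in the assessed group regardless of their share in the population.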

  6. School Attendance by District, 2020-2021

    • data.ct.gov
    • datasets.ai
    • +1more
    application/rdfxml +5
    Updated Aug 6, 2021
    Cite
    State Department of Education (2021). School Attendance by District, 2020-2021 [Dataset]. https://data.ct.gov/w/a4ya-h6eq/wqz6-rhce?cur=2nshOTxWLWS
    Explore at:
    Available download formats: json, application/rssxml, csv, tsv, xml, application/rdfxml
    Dataset updated
    Aug 6, 2021
    Dataset authored and provided by
    State Department of Education
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    This dataset includes the attendance rate for public school students PK-12 by district during the 2020-2021 school year.

    Attendance rates are provided for each district for the overall student population and for the high needs student population. Students who are considered high needs include students who are English language learners, who receive special education, or who qualify for free and reduced lunch.

    When no attendance data is displayed in a cell, data have been suppressed to safeguard student confidentiality, or to ensure that statistics based on a very small sample size are not interpreted as equally representative as those based on a sufficiently larger sample size. For more information on CSDE data suppression policies, please visit http://edsight.ct.gov/relatedreports/BDCRE%20Data%20Suppression%20Rules.pdf.

  7. AR-ASAG-Dataset

    • data.mendeley.com
    Updated Jul 1, 2020
    Cite
    Leila Ouahrani (2020). AR-ASAG-Dataset [Dataset]. http://doi.org/10.17632/dj95jh332j.1
    Explore at:
    Dataset updated
    Jul 1, 2020
    Authors
    Leila Ouahrani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ARabic Dataset for Automatic Short Answer Grading Evaluation V1. ISLRN 529-005-230-448-6. The dataset consists of reported evaluations of answers submitted for three different exams given to three classes of students. The exams were conducted under natural evaluation conditions. Each exam consists of 16 short-answer questions (48 questions in total). For each question, a model answer is proposed. Students submitted answers to these questions.
    The number of answers obtained differs from one question to another. The dataset includes a total of 2133 pairs (model answer, student answer). The dataset encompasses 5 types of questions: • "عرف ": Define? • "إشرح": Explain? • "ما النتائج المترتبة على": What consequences? • "علل": Justify? • "ما الفرق": What is the difference?

    The AR-ASAG dataset is available in different versions: TXT, XML, XML-MOODLE and Database (.DB).
    The .DB format allows making the necessary exports according to specific analysis needs.
    The XML-MOODLE format is used on Moodle e-learning platforms. For each pair, two grades (Mark1 and Mark2) are associated with a manual Average Gold Score. Both manual grades are available in the dataset. Inter-annotator agreement: Pearson correlation r=0.8384; root mean square error RMSE=0.8381. The dataset can also be used for essay scoring, as the students' answers can reach 4-5 sentences. The name of each file is representative of its content. We use the term "Mark" to mean "Grade". For privacy reasons, no student identifiers are used in this dataset.
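
    The two agreement figures quoted above, Pearson correlation and RMSE, can be computed from the two grade columns as in the sketch below. The marks are made-up values, not taken from AR-ASAG. Treating the Average Gold Score as the plain mean of Mark1 and Mark2 is an assumption, since the description does not state how it is derived.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical marks from two annotators for five answers.
mark1 = [1.0, 2.0, 3.0, 4.0, 5.0]
mark2 = [1.5, 2.0, 2.5, 4.5, 5.0]

# Assumption: the gold score is the mean of the two manual marks.
gold = [(a + b) / 2 for a, b in zip(mark1, mark2)]

agreement_r = pearson_r(mark1, mark2)
agreement_rmse = rmse(mark1, mark2)
```

    A correlation close to 1 together with a small RMSE indicates that the two annotators rank answers similarly and also assign similar absolute marks.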

  8. KU Students Wrap Up Course in Aerial Mapping Using Drones - Datasets -...

    • ckan.americaview.org
    Updated Sep 16, 2021
    Cite
    ckan.americaview.org (2021). KU Students Wrap Up Course in Aerial Mapping Using Drones - Datasets - AmericaView - CKAN [Dataset]. https://ckan.americaview.org/dataset/ku-students-wrap-up-course-in-aerial-mapping-using-drones
    Explore at:
    Dataset updated
    Sep 16, 2021
    Dataset provided by
    CKAN: https://ckan.org/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LAWRENCE — Fall break is barely behind us, but a group of University of Kansas students has just finished an innovative eight-week course in using drones to develop aerial maps. Over the past two months, they’ve visited sites in KU's West District and at the Baker Wetlands, taking still images and videos over those areas. “The drone mapping course has been excellent in providing a hands-on experience with the drones,” said Siddharth Shankar, graduate student from Lucknow, India. “The course has focused not just on drones and how to fly them but also has made us aware of the FAA rules and regulations about drone flying and safety precautions. “My research has been in glaciology, with the study of icebergs in Greenland. The drone mapping course has provided new insights into incorporating it with my research in the near future.” The course, offered annually during the fall semester, is designed to teach students about the rapidly growing technology of small unmanned aerial systems, referred to as drones, and its wide-ranging applications — which include search-and-rescue, real estate and environmental monitoring. Students in the course come from a variety of disciplines including geography & atmospheric science, geology, ecology & evolutionary biology and civil engineering. Enthusiasm for the course has been very high, and it has filled rapidly each time it has been offered.

  9. School Attendance by Town, 2020-2021

    • data.ct.gov
    • datasets.ai
    • +2more
    application/rdfxml +5
    Updated Aug 6, 2021
    Cite
    State Department of Education (2021). School Attendance by Town, 2020-2021 [Dataset]. https://data.ct.gov/Education/School-Attendance-by-Town-2020-2021/vgt5-kedq
    Explore at:
    Available download formats: json, xml, application/rssxml, csv, application/rdfxml, tsv
    Dataset updated
    Aug 6, 2021
    Dataset provided by
    United States Department of Education: http://ed.gov/
    Authors
    State Department of Education
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    This dataset includes the attendance rate for public school students PK-12 by town during the 2020-2021 school year.

    Attendance rates are provided for each town for the overall student population and for the high needs student population. Students who are considered high needs include students who are English language learners, who receive special education, or who qualify for free and reduced lunch.

    When no attendance data is displayed in a cell, data have been suppressed to safeguard student confidentiality, or to ensure that statistics based on a very small sample size are not interpreted as equally representative as those based on a sufficiently larger sample size. For more information on CSDE data suppression policies, please visit http://edsight.ct.gov/relatedreports/BDCRE%20Data%20Suppression%20Rules.pdf.

  10. Book subjects where books includes Little kids first book of oceans

    • workwithdata.com
    Updated Jul 20, 2024
    Cite
    Work With Data (2024). Book subjects where books includes Little kids first book of oceans [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=includes&fval0=Little+kids+first+book+of+oceans&j=1&j0=books
    Explore at:
    Dataset updated
    Jul 20, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects, filtered to those whose books include Little kids first book of oceans. It features 10 columns, including authors, average publication date, book publishers, book subject, and books. The preview is ordered by number of books (descending).

  11. Replication dataset and calculations for PIIE WP 20-11, The Short- and...

    • piie.com
    Updated Jul 23, 2020
    Cite
    Sherman Robinson; Marcus Noland; Egor Gornostay; Soyoung Han (2020). Replication dataset and calculations for PIIE WP 20-11, The Short- and Long-Term Costs to the United States of the Trump Administration’s Attempt to Deport Foreign Students, by Sherman Robinson, Marcus Noland, Egor Gornostay, and Soyoung Han. (2020). [Dataset]. https://www.piie.com/publications/working-papers/short-and-long-term-costs-united-states-trump-administrations-attempt
    Explore at:
    Dataset updated
    Jul 23, 2020
    Dataset provided by
    Peterson Institute for International Economics: http://www.piie.com/
    Authors
    Sherman Robinson; Marcus Noland; Egor Gornostay; Soyoung Han
    Area covered
    United States
    Description

    This data package includes the underlying data and files to replicate the calculations, charts, and tables presented in The Short- and Long-Term Costs to the United States of the Trump Administration’s Attempt to Deport Foreign Students, PIIE Working Paper 20-11. If you use the data, please cite as: Robinson, Sherman, Marcus Noland, Egor Gornostay, and Soyoung Han. (2020). The Short- and Long-Term Costs to the United States of the Trump Administration’s Attempt to Deport Foreign Students. PIIE Working Paper 20-11. Peterson Institute for International Economics.

  12. Book subjects where books includes Doctoral students in humanities : a...

    • workwithdata.com
    Cite
    Work With Data, Book subjects where books includes Doctoral students in humanities : a small-scale panel study of information needs & uses 1976-79 [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=includes&fval0=Doctoral+students+in+humanities+:+a+small-scale+panel+study+of+information+needs+%26+uses+1976-79&j=1&j0=books
    Explore at:
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects, filtered to those whose books include Doctoral students in humanities : a small-scale panel study of information needs & uses 1976-79. It features 10 columns, including authors, average publication date, book publishers, book subject, and books. The preview is ordered by number of books (descending).

  13. Master dataset: NSW government school locations and student enrolment...

    • data.nsw.gov.au
    • researchdata.edu.au
    • +1more
    csv, json
    Updated Mar 27, 2025
    Cite
    NSW Department of Education (2025). Master dataset: NSW government school locations and student enrolment numbers [Dataset]. https://data.nsw.gov.au/data/dataset/nsw-education-nsw-public-schools-master-dataset
    Explore at:
    Available download formats: json (3530080), csv (1284694), csv (6537)
    Dataset updated
    Mar 27, 2025
    Dataset provided by
    NSW Department of Education: https://education.nsw.gov.au/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Government of New South Wales, New South Wales
    Description

    The master dataset contains comprehensive information for all government schools in NSW. Data items include school locations, latitude and longitude coordinates, school type, student enrolment numbers, electorate information, contact details and more.

    This dataset is publicly available through the Data NSW website, and is used to support the School Finder tool.

    Data Notes:

    • Data relating to healthy canteens is no longer up to date, as the Department no longer updates it; this data can be sourced through NSW Health.

    • Student enrolment numbers are based on the census of government school students undertaken on the first Friday of August; LBOTE numbers are based on data collected in March.

    • School information, such as addresses and contact details, is updated regularly as required and is the most current source of information.

    • Data is suppressed (indicated by "np") for Indigenous and LBOTE percentages where student numbers are equal to or less than five.

    • NSSC out of scope schools will not have an enrolment figure.

    • NSSC and LBOTE figures are updated annually in December.

    • ICSEA values are updated every February with the previous year's ICSEA values. Small schools, SSPs and Senior Secondary schools do not have their ICSEA values published by ACARA.

    • Family Occupation and Educational Index (FOEI) is a school-level index of educational disadvantage. Data is extracted in May and values are updated annually in December.

    • Following the introduction of part-time study in secondary schools in 1993, student enrolments are generally reported in full-time equivalent units (FTE). The FTE for students studying less than 10 units, the minimum workload, is determined by the formula: 0.1 x the number of units studied and represented as a proportion of the full-time enrolment of 1.0 FTE.
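
    The FTE rule in the last note can be written as a small function. This is a sketch under stated assumptions, not the Department's implementation: the function name is hypothetical, and counting any student at or above 10 units as exactly 1.0 FTE is an interpretation of "full-time enrolment of 1.0 FTE".

```python
FULL_TIME_UNITS = 10  # minimum full-time workload stated in the data notes


def fte(units_studied: float) -> float:
    """Full-time equivalent for one student.

    Part-time students (fewer than 10 units) count as 0.1 per unit studied;
    dividing by 10 is the same rule and avoids float rounding noise.
    Assumption: students at or above 10 units count as 1.0 FTE.
    """
    if units_studied < 0:
        raise ValueError("units_studied must be non-negative")
    if units_studied >= FULL_TIME_UNITS:
        return 1.0
    return units_studied / FULL_TIME_UNITS


# Hypothetical enrolments: two full-time students and two part-time students.
units_per_student = [12, 10, 6, 4]
school_fte = sum(fte(u) for u in units_per_student)
```

    Under this rule a student taking 6 units contributes 0.6 FTE, so a school's reported enrolment can be a non-integer figure.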

    Data Source:

    • Education Statistics and Measurement. Centre for Education Statistics and Evaluation.

  14. Socioeconomic dataset collected from open access sources for analysing...

    • b2find.dkrz.de
    Updated Sep 11, 2024
    Cite
    (2024). Socioeconomic dataset collected from open access sources for analysing demand prediction of weekend markets in the city of Hamburg - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/c2c75824-11e5-55d7-bede-c6ae2939093a
    Explore at:
    Dataset updated
    Sep 11, 2024
    Area covered
    Hamburg
    Description

    Socioeconomic dataset for analysing demand prediction of weekend markets in the city of Hamburg, Germany.

    In this DDLitLab-funded Data Literacy student project, our goal was to predict weekend markets in the city of Hamburg using open-source data and OpenStreetMap in conjunction with machine learning algorithms. A brief article about the initial grant and our approach is available here: https://www.cliccs.uni-hamburg.de/about-cliccs/news/2023-news/2023-08-24-ddlitlab-event.html

    GitLab repository: https://gitlab.rrz.uni-hamburg.de/exploring-avenues-for-the-deployment-of-machine-learning-algorithms-for-sustainable-small-agricultural-business-information-using-openstreetmap/main-project-v-3

    This repository is intended to make our code and visualisations openly available to University of Hamburg students for further research. It is not to be used without citation under any circumstances, and the University/authors reserve the right to withdraw consent at any time. Please do not forget to cite our work in the event of fair use.

    Organisation of the repository:

    • Codes: contains the code for the different methods deployed for data preparation, variable selection, visualisations showing the spatial characteristics of our variables, calculation of indices such as correlation coefficients, and machine learning methods in increasing order of complexity. The city district (Stadtteil) is the unit of analysis.

    • Data (uploaded datasets): the open-source data for the project was obtained from OpenStreetMap (https://wiki.openstreetmap.org/wiki/Use_OpenStreetMap) and Statistik Nord (https://www.statistik-nord.de/). Each variable contains values for all Stadtteile (city districts) of Hamburg. The filenames are self-explanatory. The Hamburg shapefile was obtained from Geofabrik (https://www.geofabrik.de/de/data/shapefiles.html). In addition to the original data uploaded in this section, the final data used with the algorithms is provided in final_data.csv.

    • Results: contains results from the code in the first section. It includes the final 10 variables selected for the study, the results from the VIF analysis, the correlation matrix, and some model output statistics.

    • Visualisations: dedicated to visualisations of the variables used for the study and the results from deploying the various methods.

    In case of any questions, please do not hesitate to contact us at our official student IDs: first.lastname@studium.uni-hamburg.de. We are also available on LinkedIn for professional networking in case of other queries. Data curators / DDLitLab data literacy project team

  15. ASAP-AES Dataset

    • paperswithcode.com
    Cite
    ASAP-AES Dataset [Dataset]. https://paperswithcode.com/dataset/asap
    Explore at:
    Description

    There are eight essay sets. Each of the sets of essays was generated from a single prompt. Selected essays range from an average length of 150 to 550 words per response. Some of the essays are dependent upon source information and others are not. All responses were written by students ranging in grade levels from Grade 7 to Grade 10. All essays were hand graded and were double-scored. Each of the eight data sets has its own unique characteristics. The variability is intended to test the limits of your scoring engine's capabilities.

  16. Students with a disability by educational setting - Dataset - data.sa.gov.au...

    • data.sa.gov.au
    Cite
    Students with a disability by educational setting - Dataset - data.sa.gov.au [Dataset]. https://data.sa.gov.au/data/dataset/students-with-a-disability-by-educational-setting
    Explore at:
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Australia
    Description

    Full-time equivalent enrolments of students with a verified disability by education setting, collected from 2012 as part of the annual enrolment data collection in Term 3. Students with a disability are those students who are verified by a Department for Education psychologist or speech pathologist as eligible for the Department for Education Disability Support Program. Education settings include Mainstream Classes, Special Classes, Special Units and Special Schools. Special Classes are located within some junior primary, primary and secondary schools. They provide a setting for learners with a disability who need extensive curriculum support, for a short or long-term placement. Special Units are located within some primary and secondary schools. They provide long-term educational options in a mainstream school for learners with very significant or multiple disabilities. Special Units and Special Schools both cater for a similar range of learner needs. The difference is that Special Units provide an option within a mainstream school, while Special Schools provide the option in a separate setting.

  17. Census 2001: Small Area Microdata for Imputation Analysis (SAM) - Dataset -...

    • b2find.dkrz.de
    Updated May 9, 2023
    Cite
    (2023). Census 2001: Small Area Microdata for Imputation Analysis (SAM) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/e5e0fe34-1064-5789-96ae-da7c6da27741
    Explore at:
    Dataset updated
    May 9, 2023
    Description

    Abstract copyright UK Data Service and data collection copyright owner.

    The UK censuses took place on 29th April 2001. They were run by the Northern Ireland Statistics & Research Agency (NISRA), General Register Office for Scotland (GROS), and the Office for National Statistics (ONS) for both England and Wales. The UK comprises the countries of England, Wales, Scotland and Northern Ireland.

    Statistics from the UK censuses help paint a picture of the nation and how we live. They provide a detailed snapshot of the population and its characteristics, and underpin funding allocation to provide public services. The Census 2001: Small Area Microdata for Imputation Analysis (SAM) is a 5% sample of individuals for all countries of the UK, with 2.96 million cases. Local Authority is the lowest level of geography for England and Wales, Council Areas for Scotland and Parliamentary Constituencies for Northern Ireland. The Scilly Isles have been merged with Penwith and the City of London with Westminster. Orkney and Shetland are merged into one area. All other areas are identified. The median sample size for an authority is 5,600 records and nearly eighty authorities have more than 10,000 records. The amount of individual detail in the SAM is less than in the 2001 Individual Licenced Sample of Anonymised Records (I-SAR) (see under SNs 7205 and 7206) because of the greater geographical detail in the SAM.

    Caveat - Students: As with the Individual SAR, the SAM includes those enumerated in a communal establishment and also full-time students who were enumerated at an address that was not their usual term-time residence. For the latter there is only individual-level information on age, sex, marital status and full-time student status. It is recommended that these students are not included in any analyses as they do not form part of the usual residents population base.

    This dataset contains 155 variables, including 67 imputation flag variables. The standard version, containing 88 SAM variables, is available under SN 7207.

  18. g

    High school students on short courses by gender, age, personal income, unit...

    • gimi9.com
    Cite
    High school students on short courses by gender, age, personal income, unit and time indication | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_dst-fohoj03b
    Description

    StatBank dataset: FOHOJ03B
    Title: High school students on short courses by gender, age, personal income, unit and time indication
    Period type: years
    Period format (time in data): yyyy
    The oldest period: 2016
    The most recent period: 2023

  19. g

    Schooling data from the University of Paris 13

    • gimi9.com
    • data.europa.eu
    Cite
    Schooling data from the University of Paris 13 [Dataset]. https://gimi9.com/dataset/eu_58e34f7dc751df5d2777388c
    Description

    This dataset is updated annually. The description below relates to the first year of online release; updates have since taken place in 2018 (data 2008-2017) and 2019 (data 2009-2018). Paris 13 University recorded data on student registration in its information system (Apogee software) for each academic year between 2006(-2007) and 2015(-2016). These data relate to the diplomas prepared, the stages taken to complete them, the enrolment scheme (initial training or apprenticeship), the relevant components (UFR, IUT, etc.), and the origin of students (type of baccalaureate, academy of origin, nationality). Each entry concerns the main enrolment of a student at the university for a year. The attributes of this data are as follows.
    • CODE_INDIVIDU: masked data
    • ANNEE_INSCRIPTION: year of registration (2006 for 2006-2007, etc.)
    • LIB_DIPLOME: diploma name
    • NIVEAU_DANS_LE_DIPLOME: 1, 2,... for master 1, licence 2, etc.
    • NIVEAU_APRES_BAC: 1, 2,... for Bac+1, Bac+2,...
    • LIBELLE_DISCIPLINE_DIPLOME: discipline to which the diploma is attached
    • CODE_SISE_DIPLOME: Student Tracking Information System code
    • CODE_ETAPE: internal code of a stage (year, course) of a diploma
    • LIBELLE_COURT_ETAPE: short name of the stage
    • LIBELLE_LONG_ETAPE: more intelligible name of the stage
    • LIBELLE_COURT_COMPOSANTE: name of the component (UFR, IUT, etc.)
    • CODE_COMPOSANT: numeric code of the component (unused)
    • REGROUPEMENT_BAC: type of Bac (L, ES, S, techno STMG, techno ST2S,...)
    • LIBELLE_ACADEMIE_BAC: academy of the Bac (Creteil, Versailles, foreign,...)
    • CONTINENT: deduced from nationality, which is masked
    • LIBELLE_REGIME: initial training, continuing, pro, apprenticeship
    Paris 13 University publishes part of this dataset through several resources, while respecting the anonymity of its students. 
Starting from 213,289 entries that correspond to all enrolments of the 106,088 individuals who studied at Paris 13 University during the ten academic years between 2006(-2007) and 2015(-2016), we selected several resources, each corresponding to a part of the data. To produce each resource we chose a small number of attributes, then removed a small proportion of the entries in order to satisfy a k-anonymisation constraint with k = 5, i.e. to ensure that, in each resource, each entry appears identically at least 5 times (otherwise the entry is deleted). The four resources produced are the following files.
• The file ‘up13_etapes.csv’ concerns the diploma stages. It contains the attributes “CODE_ETAPE”, “LIBELLE_COURT_ETAPE”, “LIBELLE_LONG_ETAPE”, “NIVEAU_APRES_BAC”, “LIBELLE_COURT_COMPOSANTE”, “LIBELLE_DISCIPLINE_DIPLOME”, “CODE_SISE_DIPLOME”, “NIVEAU_DANS_LE_DIPLOME”, and its anonymisation causes the loss of 918 entries.
• The file ‘up13_Academie.csv’ concerns the academy of the Bac. It contains the attributes “LIBELLE_ACADEMIE_BAC”, “NIVEAU_APRES_BAC”, “NIVEAU_DANS_DIPLOME”, “CONTINENT”, “LIBELLE_REGIME”, “LIB_DIPLOME”, “LIBELLE_COURT_COMPOSANTE”, and its anonymisation causes the loss of 7,525 entries.
• The file ‘up13_Bac.csv’ concerns the type of Bac and the level reached after the Bac. It contains the columns “REGROUPEMENT_BAC”, “NIVEAU_APRES_BAC”, “LIBELLE_REGIME”, “CONTINENT”, “LIBELLE_COURT_COMPOSANTE”, “LIB_DIPLOME”, “NIVEAU_DANS_LE_DIPLOME”, and its anonymisation causes the loss of 3,933 entries.
• The file ‘up13_annees_etapes.csv’ concerns enrolment in the diploma stages year after year. It contains the columns “ANNEE_INSCRIPTION”, “LIBELLE_COURT_COMPOSANTE”, “NIVEAU_APRES_BAC”, “LIB_DIPLOME”, “CODE_ETAPE”, and its anonymisation causes the loss of 3,532 entries.
Other tables extracted from the same initial data and constructed using the same method of anonymisation can be provided on request (specify the desired columns). 
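The k = 5 rule described above can be sketched in a few lines of Python. This is only an illustration of the general idea (project each entry onto the chosen attributes, then delete combinations seen fewer than k times), not the university's actual procedure; the function name `k_anonymise` and the toy records are assumptions made for the example.

```python
from collections import Counter

def k_anonymise(rows, attributes, k=5):
    """Keep only entries whose attribute combination occurs at least k times.

    Hypothetical helper: returns the kept entries and the number deleted.
    """
    # Project every entry onto the selected attributes.
    projected = [tuple(row[a] for a in attributes) for row in rows]
    # Count how often each identical combination appears.
    counts = Counter(projected)
    kept = [p for p in projected if counts[p] >= k]
    return kept, len(projected) - len(kept)

# Toy enrolment records (invented values, not taken from the dataset).
rows = (
    [{"REGROUPEMENT_BAC": "S", "NIVEAU_APRES_BAC": 1, "CONTINENT": "Europe"}] * 6
    + [{"REGROUPEMENT_BAC": "L", "NIVEAU_APRES_BAC": 2, "CONTINENT": "Asia"}] * 2
)
kept, lost = k_anonymise(rows, ["REGROUPEMENT_BAC", "NIVEAU_APRES_BAC"], k=5)
print(len(kept), lost)
```

Choosing fewer attributes makes identical combinations more frequent, which would explain why each published file keeps only a small set of columns.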
A second set of resources follows students year after year, from degree stage to degree stage. In this dataset, a trace is such a sequence once the registration years have been dropped, so that only the order of stages remains, and a cursus is a record describing the succession of stages together with the years. For anonymisation we grouped identical traces or paths, and whenever a group had fewer than 10 members we do not indicate its size or, what amounts to the same, we set the number to 1 (the information being that at least one student left this trace or followed this course). This leads to forgetting a number of overly specific study paths and keeping only one as a witness. Starting from 106,088 traces or paths, we produce the following resources.
• The file ‘up13_traces.csv’ contains the sequence of diploma stage codes (a trace). Anonymisation makes us forget 10,089 traces.
• The file ‘up13_traces_wt_etape.csv’ contains similar traces, but without the stage code. That is to say, only the diploma, the level after baccalaureate and the component concerned remain. Anonymisation makes us forget 4,447 traces.
• The file ‘up13_traces_bac_wt_etape.csv’ contains the same data as the file ‘up13_traces_wt_etape.csv’ but also with the Bac type. Anonymisation makes us forget 8,067 traces.
• The file ‘up13_cursus_wt_etape.csv’ contains the same data as the file ‘up13_traces_wt_etape.csv’ with the additional registration years. Anonymisation makes us forget 8,324 courses.
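The masking of rare traces can be sketched the same way. The reading assumed here is that a group of fewer than 10 identical traces is published with a count of 1, so all but one of its members are "forgotten"; the description does not spell out how the forgotten totals are computed, so the arithmetic below is an interpretation. The function name `mask_rare_traces` and the stage codes are invented for the example.

```python
from collections import Counter

def mask_rare_traces(traces, threshold=10):
    """Publish true counts for frequent traces and 1 for rare ones.

    Hypothetical helper: returns the published counts and how many
    individual traces are no longer represented.
    """
    counts = Counter(traces)
    # Groups below the threshold are reduced to a single witness.
    published = {t: (c if c >= threshold else 1) for t, c in counts.items()}
    forgotten = sum(counts[t] - published[t] for t in counts)
    return published, forgotten

# Toy traces: tuples of stage codes (invented, not real CODE_ETAPE values).
traces = [("L1", "L2", "L3")] * 12 + [("L1", "M1")] * 4 + [("DUT1",)]
published, forgotten = mask_rare_traces(traces)
print(published, forgotten)
```

Under this reading, a trace followed by a single student loses nothing, while a group of four students contributes three forgotten traces.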

  20. c

    Data from: Longitudinal data (2010-2012) on the effectiveness of mathematics...

    • datacatalogue.cessda.eu
    • ssh.datastations.nl
    Updated Apr 11, 2023
    Cite
    M.H.A.M. van den Heuvel-Panhuizen; M. Bakker; A. Robitzsch (2023). Longitudinal data (2010-2012) on the effectiveness of mathematics mini-games in Dutch primary schools, collected in the BRXXX project [Dataset]. http://doi.org/10.17026/dans-xda-bn9c
    Dataset updated
    Apr 11, 2023
    Dataset provided by
    Federal Institute for Education Research, Innovation and Development of the Austrian School System, Salzburg, Austria
    Radboud University Nijmegen; previously Freudenthal Institute, Utrecht University
    Freudenthal Institute, Faculty of Science & Faculty of Social and Behavioural Sciences, Utrecht University
    Authors
    M.H.A.M. van den Heuvel-Panhuizen; M. Bakker; A. Robitzsch
    Description

    This dataset consists of longitudinal data gathered in the BRXXX project, which aimed at investigating the effectiveness of online mathematics mini-games in enhancing primary school students’ multiplicative reasoning ability. The dataset includes data of 719 students from 35 primary schools in the Netherlands, who were followed from the end of Grade 1 (Dutch “groep 3”) to the end of Grade 3 (Dutch “groep 5”). Participating schools were randomly divided over three experimental conditions (playing at school, at home, or at home with debriefing at school) and a control group. Tests of multiplicative ability were administered at the end of Grade 1 (pretest) and the end of Grade 2 and Grade 3 (posttests), measuring students’ knowledge of multiplication number facts, their skills in calculating multiplicative problems, and their insight into multiplicative number relations. In addition to the scores on these tests, the dataset includes measures of gameplay behavior, i.e. the time and effort students put into the games, and some student background variables.

Cite
Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda (2024). Hyperparameters for each classification model. [Dataset]. http://doi.org/10.1371/journal.pone.0310707.t007

Hyperparameters for each classification model.

38 scholarly articles cite this dataset (View in Google Scholar)
Available download formats
xls
Dataset updated
Sep 26, 2024
Dataset provided by
PLOS ONE
Authors
Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier model showed an 11% improvement using the Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of the Bidirectional Encoder Representations from Transformers (BETO) also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset. 
Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.
