https://rightsstatements.org/page/InC/1.0/
The archived data set consists of 19 interview transcriptions from thematic interviews with 11 professionals working in early childhood education and care (ECEC) and 8 teachers from basic education. The participants were recruited from different Finnish cities through email requests to teacher networks, individuals, and local education authorities. The main focus of the interviews was on different ways of implementing language education (e.g. principles, goals, new/innovative approaches, collaboration). The interviews were conducted in various contexts of language education (e.g. language-aware teaching, foreign language teaching, bilingual education). Most of the interviews were conducted in Finnish. More detailed information about the metadata and interviews can be found in the metadata files. The interviews can be used for studies on teacher perspectives and reflections within different language education contexts. The dataset includes accounts of individual and community innovation and development. The data should be used in accordance with the IKI privacy notice based on the JYU guidelines (2018-2019). The IKI research plan is available as part of the metadata files. Please note that neither the teachers nor the interaction between the interviewers and participants should be evaluated. The data was gathered in 2018-2021 and is part of the larger IKI dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
According to the distribution of the ethnic minority population in Guangxi, 157,598 parents of preschool children from 12 ethnic autonomous counties in Guangxi participated in the assessment. Data on the physical and mental development of preschool children were collected to analyze the distribution characteristics of five major domains: health and physical fitness, language and communication, sociality and emotion, exploration and cognition, and aesthetics and performance. These characteristics were analyzed based on dimensions such as ethnicity, gender, grade, urban and rural areas, and kindergarten attributes. This data collection aims to accumulate resources for the formal implementation of brain and intellectual development assessment and to conduct preliminary exploration. The "Assessment Scale for the Physical and Mental Development of Preschool Children" was used for the assessment. This tool comprehensively covers the five major domains of preschool children's physical and mental development (health and physical fitness, language and communication, sociality and emotion, exploration and cognition, aesthetics and performance) and has high internal consistency reliability (Cronbach's α > 0.85). Data analysis and visualization were conducted using R language tools such as psych, psychtool, and dplyr. The data is a 157,598 × 18 data frame, with each row representing a subject record. The first to eighteenth columns represent: individual id, gender, scale code, scores of each dimension of the scale (V1-V7), role of the subject, city address, preschool, city address, grade, urban and rural series, kindergarten attribute, mean, and ethnicity.
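As a rough illustration of the grouped summaries described above, the following Python/pandas sketch assumes a CSV export of the data frame with column names suggested by this description (the original analysis used R packages such as psych and dplyr); all file and column names are assumptions.

```python
import pandas as pd

# Assumed CSV export of the 157,598 x 18 data frame described above (file name hypothetical).
df = pd.read_csv("guangxi_preschool_assessment.csv")

# Dimension scores of the assessment scale as named in the description.
domain_scores = ["V1", "V2", "V3", "V4", "V5", "V6", "V7"]

# Mean domain scores broken down by the grouping dimensions mentioned above
# (column names are assumptions and may differ in the actual file).
for group in ["ethnicity", "gender", "grade", "urban_rural_series", "kindergarten_attribute"]:
    print(df.groupby(group)[domain_scores].mean().round(2))
```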
This is a test collection for passage and document retrieval, produced in the TREC 2023 Deep Learning track. The Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision). Certain machine learning based methods, such as methods based on deep learning, are known to require very large datasets for training. Lack of such large-scale datasets has been a limitation for developing such methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in the previous years aimed at providing large-scale datasets to TREC and creating a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks. Similar to the previous years, one of the main goals of the track in 2022 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision? The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 833 brain MRI images (T1w and T2w) from infancy and early childhood. The age of the subjects is between 0 months and 36 months. It contains a wide range of pathologies as well as healthy subjects. It is a quite diverse dataset acquired in clinical routine over several years (images acquired with the same scanner, but different protocols).
The T1w images are resampled to the shape of the T2w images. Then both are skull stripped.
All details about this dataset can be found in the paper "Development and Evaluation of Deep Learning Models for Automated Estimation of Myelin Maturation Using Pediatric Brain MRI Scans". If you use this dataset please cite our paper: https://pubs.rsna.org/doi/10.1148/ryai.220292
The metadata can be found in the table meta.csv.
Description of columns:
myelinisation: myelin maturation status (delayed, normal, or accelerated) according to evaluation by an expert radiologist. For more detail please see the paper.
age: the chronological age (in months) since birth.
age_corrected: the corrected chronological age (in months), which adjusts for premature babies by subtracting the number of months the baby was born before 37 weeks of gestation; hence a preterm newborn gets a negative corrected age.
doctor_predicted_age: the predicted age (in months) of the myelin maturation by an expert radiologist (subjects with delayed myelin maturation will get lower values than their chronological age).
diagnosis: list of pathologies found in this dataset according to expert radiology reports.
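A minimal Python/pandas sketch of working with meta.csv, assuming the column names listed above appear verbatim in the file; the derived quantities are illustrative only.

```python
import pandas as pd

# Load the metadata table described above (column names assumed from this description).
meta = pd.read_csv("meta.csv")

# Myelin maturation lag: chronological age minus the radiologist's predicted myelin age.
# Subjects with delayed maturation should tend to have a positive lag.
meta["myelin_lag_months"] = meta["age"] - meta["doctor_predicted_age"]
print(meta.groupby("myelinisation")["myelin_lag_months"].describe())

# Corrected age sanity check: preterm subjects can have negative corrected ages near birth.
print(meta.loc[meta["age_corrected"] < 0, ["age", "age_corrected"]].head())
```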
The Meta-Dataset benchmark is a large few-shot learning benchmark consisting of multiple datasets with different data distributions. It does not restrict few-shot tasks to have fixed ways and shots, thus representing a more realistic scenario. It consists of 10 datasets from diverse domains:
ILSVRC-2012 (the ImageNet dataset, consisting of natural images with 1000 categories); Omniglot (hand-written characters, 1623 classes); Aircraft (dataset of aircraft images, 100 classes); CUB-200-2011 (dataset of birds, 200 classes); Describable Textures (different kinds of texture images with 43 categories); Quick Draw (black and white sketches of 345 different categories); Fungi (a large dataset of mushrooms with 1500 categories); VGG Flower (dataset of flower images with 102 categories); Traffic Signs (German traffic sign images with 43 classes); and MSCOCO (images collected from Flickr, 80 classes).
All datasets except Traffic signs and MSCOCO have a training, validation and test split (proportioned roughly into 70%, 15%, 15%). The datasets Traffic Signs and MSCOCO are reserved for testing only.
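Because ways and shots are not fixed, an episode sampler has to draw them per task. The Python sketch below is a minimal, self-contained illustration of that idea and is not the official Meta-Dataset sampling pipeline; all bounds are arbitrary.

```python
import random

def sample_episode(examples_by_class, max_ways=50, max_support=10, max_query=10, rng=random):
    """Sample a few-shot episode with a variable number of ways and shots,
    in the spirit of Meta-Dataset's non-fixed episode construction.
    examples_by_class maps class name -> list of example identifiers."""
    classes = list(examples_by_class)
    ways = rng.randint(min(5, len(classes)), min(max_ways, len(classes)))  # variable ways
    support, query = {}, {}
    for c in rng.sample(classes, ways):
        pool = rng.sample(examples_by_class[c],
                          min(len(examples_by_class[c]), max_support + max_query))
        shots = rng.randint(1, max(1, min(max_support, len(pool) - 1)))  # variable shots
        support[c], query[c] = pool[:shots], pool[shots:shots + max_query]
    return support, query
```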
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The number of children in early childhood education and care by age group and type of care since 2002, the attendance days of municipal day care centres from 2005 onwards and the number of employees since 2008, and the cost of the early childhood education system from 2009 in the six largest cities in Finland.
The reviews of early childhood education and care monitor the use and costs of early childhood education and care provided by the municipalities themselves and as outsourced services, of early childhood education and care arranged through private care support and service vouchers, as well as the use and costs of child home care support. The review also includes pre-primary education in accordance with the Basic Education Act and open early childhood education activities in accordance with the Act on Early Childhood Education.
The Six Cities are the six most populous cities in Finland: in order of population, Helsinki, Espoo, Tampere, Vantaa, Turku and Oulu. The Six Cities working groups compare the cities' social and health services and early childhood education and care services. Data on customer numbers, performances, personnel and costs are mainly compiled from the municipalities' own information systems and financial statements. City experts agree on definitions that are as uniform as possible for the data collection and implement the data collection in practice.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier model showed an 11% improvement using the Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of the Bidirectional Encoder Representations from Transformers (BETO) also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset. Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.
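As a simple illustration of the lighter-weight augmentation family referenced above (EDA-style random word operations), the Python sketch below applies random deletion and random swaps to a sentence; it is not the study's implementation, and full EDA additionally uses synonym replacement and random insertion, which require a lexical resource.

```python
import random

def eda_augment(sentence, p_delete=0.1, n_swaps=1, rng=random):
    """Tiny EDA-style augmenter: random word deletion plus random position swaps."""
    words = sentence.split()
    # Random deletion: drop each word with probability p_delete, but keep at least one word.
    kept = [w for w in words if rng.random() > p_delete] or [rng.choice(words)]
    # Random swap: exchange the positions of two words, n_swaps times.
    for _ in range(n_swaps):
        if len(kept) > 1:
            i, j = rng.sample(range(len(kept)), 2)
            kept[i], kept[j] = kept[j], kept[i]
    return " ".join(kept)

print(eda_augment("la letra de esta canción expresa una emoción muy intensa"))
```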
https://academictorrents.com/nolicensespecified
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
According to the distribution of ethnic minority populations in Guangxi, 320,425 parents of preschool children from 12 autonomous counties participated in the assessment. Data were collected on parenting styles and the physical and mental development of preschool children, including six parenting style factors: "humiliation vs. respect", "rejection vs. acceptance", "punishment vs. motivation", "dictatorship vs. democracy", "indulgence (leniency) vs. control", and "rudeness vs. protection (civilization)". Each factor has 10 items, for a total of 60 items. The distribution characteristics of physical and mental development cover five major areas: health and physical fitness, language and communication, sociality and emotion, exploration and cognition, and aesthetics and performance. These characteristics were analyzed based on dimensions such as ethnicity, gender, grade level, urban-rural area, and kindergarten attributes. The "Parental Rearing Style Scale" and the "Preschool Children's Physical and Mental Development Assessment Scale" were used for the evaluation; both have high internal consistency reliability (Cronbach's α > 0.85). R language tools such as psych, psychtool, and dplyr were used for data analysis and visualization. The data is a 320,425 × 21 data frame, with each row representing a subject record. The columns include: personal ID, gender, scale code, scores for each dimension of the scale, subject role, city address, preschool class, city address, grade level, urban-rural series, kindergarten attributes, mean, and ethnicity.
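A minimal Python sketch of how the reported internal consistency of one 10-item factor could be checked (the original analysis used R packages); the item-level file and column names are hypothetical, since the 320,425 × 21 frame itself stores dimension scores rather than item responses.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a block of item columns (rows = respondents)."""
    items = items.dropna()
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical item-level export for the "rejection vs. acceptance" factor (10 items).
items = pd.read_csv("parenting_style_items.csv")[[f"acceptance_item_{i}" for i in range(1, 11)]]
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")  # reported to exceed 0.85
```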
Maximilian B. Kiss, Sophia B. Coban, K. Joost Batenburg, Tristan van Leeuwen, and Felix Lucka "2DeteCT - A large 2D expandable, trainable, experimental Computed Tomography dataset for machine learning", Sci Data 10, 576 (2023) or arXiv:2306.05907 (2023)
Abstract: "Recent research in computational imaging largely focuses on developing machine learning (ML) techniques for image reconstruction, which requires large-scale training datasets consisting of measurement data and ground-truth images. However, suitable experimental datasets for X-ray Computed Tomography (CT) are scarce, and methods are often developed and evaluated only on simulated data. We fill this gap by providing the community with a versatile, open 2D fan-beam CT dataset suitable for developing ML techniques for a range of image reconstruction tasks. To acquire it, we designed a sophisticated, semi-automatic scan procedure that utilizes a highly-flexible laboratory X-ray CT setup. A diverse mix of samples with high natural variability in shape and density was scanned slice-by-slice (5000 slices in total) with high angular and spatial resolution and three different beam characteristics: A high-fidelity, a low-dose and a beam-hardening-inflicted mode. In addition, 750 out-of-distribution slices were scanned with sample and beam variations to accommodate robustness and segmentation tasks. We provide raw projection data, reference reconstructions and segmentations based on an open-source data processing pipeline."
The data collection has been acquired using a highly flexible, programmable and custom-built X-ray CT scanner, the FleX-ray scanner, developed by TESCAN-XRE NV, located in the FleX-ray Lab at the Centrum Wiskunde & Informatica (CWI) in Amsterdam, Netherlands. It consists of a cone-beam microfocus X-ray point source (limited to 90 kV and 90 W) that projects polychromatic X-rays onto a 14-bit CMOS (complementary metal-oxide semiconductor) flat panel detector with CsI(Tl) scintillator (Dexella 1512NDT) and 1536-by-1944 pixels, each. To create a 2D dataset, a fan-beam geometry was mimicked by only reading out the central row of the detector. Between source and detector there is a rotation stage, upon which samples can be mounted. The machine components (i.e., the source, the detector panel, and the rotation stage) are mounted on translation belts that allow the moving of the components independently from one another.
Please refer to the paper for all further technical details.
The complete data collection can be found via the following links: 1-1,000, 1,001-2,000, 2,001-3,000, 3,001-4,000, 4,001-5,000, 5,521-6,370.
Each slice folder ‘slice00001’ - ‘slice05000’ and ‘slice05521’ - ‘slice06370’ contains three folders, one for each mode: ‘mode1’, ‘mode2’, ‘mode3’. In the raw data archives, each of these folders contains the sinogram, the dark-field, and the two flat-fields; in the reconstruction archives, it contains just the reconstructions and, for mode2, the additional reference segmentation.
The corresponding reference reconstructions and segmentations can be found via the following links: 1-1,000, 1,001-2,000, 2,001-3,000, 3,001-4,000, 4,001-5,000, 5,521-6,370.
The corresponding Python scripts for loading, pre-processing, reconstructing and segmenting the projection data in the way described in the paper can be found on GitHub. A machine-readable file with the used scanning parameters and instrument data for each acquisition mode, as well as a script for loading it, can be found in the GitHub repository as well.
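For orientation only, the sketch below shows how one mode folder's raw data could be flat-/dark-field corrected in Python; the file names and formats are assumptions, and the official scripts on GitHub are the authoritative way to load and pre-process the data.

```python
import numpy as np
import imageio.v2 as imageio

# Hypothetical file names inside one slice/mode folder; check the GitHub loading scripts
# for the actual layout and formats.
slice_dir = "slice00001/mode1"
sino = imageio.imread(f"{slice_dir}/sinogram.tif").astype(np.float32)
dark = imageio.imread(f"{slice_dir}/dark.tif").astype(np.float32)
flat1 = imageio.imread(f"{slice_dir}/flat1.tif").astype(np.float32)
flat2 = imageio.imread(f"{slice_dir}/flat2.tif").astype(np.float32)

# Standard flat-/dark-field correction followed by the negative log transform used in
# absorption CT: I_corr = (I - dark) / (flat - dark), p = -log(I_corr).
flat = 0.5 * (flat1 + flat2)
corrected = (sino - dark) / np.clip(flat - dark, 1e-6, None)
projections = -np.log(np.clip(corrected, 1e-6, None))
print(projections.shape)
```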
Note: It is advisable to use the graphical user interface when decompressing the .zip archives. If you experience a zipbomb error when unzipping a file on a Linux system, rerun the command with the UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE environment variable set, for example by adding "export UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE" to your .bashrc.
For more information or guidance in using the data collection, please get in touch with
Maximilian.Kiss [at] cwi.nl
Felix.Lucka [at] cwi.nl
Patterns of educational attainment vary greatly across countries, and across population groups within countries. In some countries, virtually all children complete basic education whereas in others large groups fall short. The primary purpose of this database, and the associated research program, is to document and analyze these differences using a compilation of a variety of household-based data sets: Demographic and Health Surveys (DHS); Multiple Indicator Cluster Surveys (MICS); Living Standards Measurement Study Surveys (LSMS); as well as country-specific Integrated Household Surveys (IHS) such as Socio-Economic Surveys. As shown at the website associated with this database, there are dramatic differences in attainment by wealth. When households are ranked according to their wealth status (or more precisely, a proxy based on the assets owned by members of the household), there are striking differences in the attainment patterns of children from the richest 20 percent compared to the poorest 20 percent. In Mali in 2012 only 34 percent of 15 to 19 year olds in the poorest quintile had completed grade 1, whereas 80 percent of the richest quintile had done so. In many countries, for example Pakistan, Peru and Indonesia, almost all the children from the wealthiest households have completed at least one year of schooling. In some countries, like Mali and Pakistan, wealth gaps are evident from grade 1 on; in other countries, like Peru and Indonesia, wealth gaps emerge later in the school system. The EdAttain website allows a visual exploration of gaps in attainment and enrollment within and across countries, based on the international database, which spans multiple years from over 120 countries and includes indicators disaggregated by wealth, gender and urban/rural location. The database underlying that site can be downloaded from here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The BuildingsBench datasets consist of:
Buildings-900K: a large-scale dataset of 900K buildings for pretraining models on the task of short-term load forecasting (STLF). Buildings-900K is statistically representative of the entire U.S. building stock.
7 real residential and commercial building datasets for benchmarking two downstream tasks evaluating generalization: zero-shot STLF and transfer learning for STLF.
Buildings-900K can be used for pretraining models on day-ahead STLF for residential and commercial buildings. The specific gap it fills is the lack of large-scale and diverse time series datasets of sufficient size for studying pretraining and finetuning with scalable machine learning models. Buildings-900K consists of synthetically generated energy consumption time series. It is derived from the NREL End-Use Load Profiles (EULP) dataset (see the link to this database further below). However, the EULP was not originally developed for the purpose of STLF. Rather, it was developed to "...help electric utilities, grid operators, manufacturers, government entities, and research organizations make critical decisions about prioritizing research and development, utility resource and distribution system planning, and state and local energy planning and regulation." Similar to the EULP, Buildings-900K is a collection of Parquet files and it follows nearly the same Parquet dataset organization as the EULP. As it only contains a single energy consumption time series per building, it is much smaller (~110 GB).
BuildingsBench also provides an evaluation benchmark that is a collection of various open-source residential and commercial real building energy consumption datasets. The evaluation datasets, which are provided alongside Buildings-900K below, are collections of CSV files which contain annual energy consumption. The size of the evaluation datasets altogether is less than 1 GB. They are listed below:
ElectricityLoadDiagrams20112014
Building Data Genome Project-2
Individual household electric power consumption (Sceaux)
Borealis
SMART
IDEAL
Low Carbon London
A README file providing details about how the data is stored and describing the organization of the datasets can be found within each data lake version under BuildingsBench.
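A minimal Python sketch of reading one building's Parquet time series and framing a day-ahead STLF example; the path and column names are hypothetical, and the README shipped with each data lake version documents the actual organization.

```python
import pandas as pd

# Hypothetical path and columns; consult the per-version README for the real layout.
ts = pd.read_parquet("buildings_900k/building_0001.parquet")
ts = ts.set_index(pd.to_datetime(ts["timestamp"]))["energy_consumption"]

# Day-ahead STLF framing: one day of hourly history as input, the next 24 hours as target.
history = ts.loc["2018-06-01"]
target = ts.loc["2018-06-02"]
print(history.shape, target.shape)  # expect (24,) and (24,) for hourly data
```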
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
This dataset contains cross-sectional data collected on child development outcomes, child characteristics, and parental and home characteristics for a sample of 1,311 children aged 6-42 months living in a representative sample of low- and low-middle-income households in Bogota, Colombia. This is the sample used for the analysis in the paper "Concurrent Validity and Feasibility of Short Tests Currently Used to Measure Early Childhood Development in Large Scale Studies" by Marta Rubio-Codina, M. Caridad Araujo, Orazio Attanasio, Pablo Muñoz and Sally Grantham-McGregor, forthcoming at PLOS ONE. The dataset and do files shared allow replication of the results in the paper. Please note that these data can only be used for non-commercial research purposes, given the IDB data sharing standards, and in order to comply with the commitment acquired by the researchers with the study participants by means of the informed consent.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Big-Math is the largest open-source dataset of high-quality mathematical problems, curated specifically for reinforcement learning (RL) training in language models. With over 250,000 rigorously filtered and verified problems, Big-Math bridges the gap between quality and quantity, establishing a robust foundation for advancing reasoning in LLMs.
Request Early Access to Private… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified.
https://spdx.org/licenses/CC0-1.0.html
Offline reinforcement learning (RL) is a promising direction that allows RL agents to be pre-trained from large datasets, avoiding recurrence of expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such large datasets, since 1) it permits creating many tasks from few components, and 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components. This submission provides four offline RL datasets for simulated robotic manipulation created using the 256 tasks from CompoSuite (Mendez et al., 2022). In every task in CompoSuite, a robot arm is used to manipulate an object to achieve an objective, all while trying to avoid an obstacle. There are four components for each of these four axes that can be combined arbitrarily, leading to a total of 256 tasks. The component choices are:
Robot: IIWA, Jaco, Kinova3, Panda
Object: Hollow box, box, dumbbell, plate
Objective: Push, pick and place, put in shelf, put in trashcan
Obstacle: None, wall between robot and object, wall between goal and object, door between goal and object
The four included datasets are collected using separate agents, each trained to a different degree of performance, and each dataset consists of 256 million transitions. The degrees of performance are expert data, medium data, warmstart data and replay data:
Expert dataset: Transitions from an expert agent that was trained to achieve 90% success on every task.
Medium dataset: Transitions from a medium agent that was trained to achieve 30% success on every task.
Warmstart dataset: Transitions from a Soft Actor-Critic agent trained for a fixed duration of one million steps.
Medium-replay-subsampled dataset: Transitions that were stored during the training of a medium agent up to 30% success.
These datasets are intended for the combined study of compositional generalization and offline reinforcement learning.
Methods: The datasets were collected using several deep reinforcement learning agents trained to the various degrees of performance described above on the CompoSuite benchmark (https://github.com/Lifelong-ML/CompoSuite), which builds on top of robosuite (https://github.com/ARISE-Initiative/robosuite) and uses the MuJoCo simulator (https://github.com/deepmind/mujoco). During reinforcement learning training, we stored the data collected by each agent in a separate buffer for post-processing. Then, after training, to collect the expert and medium datasets, we ran the trained agents for 2000 trajectories of length 500 online in the CompoSuite benchmark and stored the trajectories. These add up to a total of 1 million state-transition tuples per task, totalling a full 256 million datapoints per dataset. The warmstart and medium-replay-subsampled datasets contain trajectories from the stored training buffer of the SAC agent trained for a fixed duration and of the medium agent, respectively. For the medium-replay-subsampled data, we uniformly sample trajectories from the training buffer until we reach more than 1 million transitions. Since some of the tasks have termination conditions, some of these trajectories are truncated and not of length 500. This sometimes results in a number of sampled transitions larger than 1 million. Therefore, after sub-sampling, we artificially truncate the last trajectory and place a timeout at the final position.
This can in some rare cases lead to one incorrect trajectory if the datasets are used for finite horizon experimentation. However, this truncation is required to ensure consistent dataset sizes, easy data readability and compatibility with other standard code implementations. The four datasets are split into four tar.gz folders each yielding a total of 12 compressed folders. Every sub-folder contains all the tasks for one of the four robot arms for that dataset. In other words, every tar.gz folder contains a total of 64 tasks using the same robot arm and four tar.gz files form a full dataset. This is done to enable people to only download a part of the dataset in case they do not need all 256 tasks. For every task, the data is separately stored in an hdf5 file allowing for the usage of arbitrary task combinations and mixing of data qualities across the four datasets. Every task is contained in a folder that is named after the CompoSuite elements it uses. In other words, every task is represented as a folder named
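As an orientation for working with the per-task HDF5 files, the Python sketch below opens one task and reads its transitions with h5py; the folder/file naming and the dataset keys (observations, actions, rewards, terminals) follow common offline RL conventions and are assumptions rather than the confirmed layout.

```python
import h5py
import numpy as np

# Hypothetical task path assembled from the four CompoSuite components of one task.
task_path = "expert/IIWA/IIWA_Box_Push_None.hdf5"

with h5py.File(task_path, "r") as f:
    # Assumed D4RL-style keys; inspect f.keys() on the real files for the actual structure.
    observations = np.asarray(f["observations"])
    actions = np.asarray(f["actions"])
    rewards = np.asarray(f["rewards"])
    terminals = np.asarray(f["terminals"])

print(observations.shape, actions.shape)  # roughly 1 million transitions per task expected
print("mean reward:", rewards.mean())     # quick check on the data quality tier
```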
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A heterogeneous big dataset is presented in this work: electrocardiogram (ECG) signals, blood pressure signals, oxygen saturation (SpO2) signals, and text input. This work is an extended version of our earlier dataset formulation presented in [1], and a trustworthy and relevant medical dataset library (PhysioNet [2]) was used to acquire these signals. The dataset includes medical features from heterogeneous sources (sensory and non-sensory data). First, the ECG sensor signals, which contain QRS width, ST elevation, peak numbers, and cycle interval. Second, the SpO2 level from the SpO2 sensor signals. Third, the blood pressure sensor signals, which contain high (systolic) and low (diastolic) values. Finally, the text input, which constitutes the non-sensory data; the text inputs were formulated based on doctors' diagnostic procedures for chronic heart diseases. A Python software environment was used, and the simulated big data is presented along with analyses.
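To make the heterogeneous structure concrete, the sketch below builds one illustrative record that combines the sensory features and the non-sensory text input described above; the field names and values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Illustrative record mixing sensory features with the non-sensory text input.
record = {
    "qrs_width_ms": 95,           # from the ECG signal
    "st_elevation_mv": 0.1,       # from the ECG signal
    "peak_count": 72,             # from the ECG signal
    "cycle_interval_ms": 830,     # from the ECG signal
    "spo2_percent": 97,           # from the SpO2 signal
    "systolic_mmhg": 128,         # from the blood pressure signal
    "diastolic_mmhg": 82,         # from the blood pressure signal
    "diagnosis_text": "exertional chest discomfort, suspected chronic heart disease",
}
df = pd.DataFrame([record])
print(df.dtypes)  # mixed numeric and text (object) columns in a single frame
```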
The Medium-Term Effects of Home-based Early Childhood Development Intervention Impact Evaluation (ECDIIE) covered 96 small towns in central Colombia, representing a large number of small communities across a relatively big geographical area. It exploited structures in place from the government's Conditional Cash Transfer Programme, Familias en Accion (FeA), which targets the poorest 20% of households in the country.
There are currently three waves of data, a baseline, pre-intervention wave collected between February and June 2010, and a follow-up wave 18 months later between September and December 2011, at the end of the intervention period. The second wave of follow-up data collection occurred 2 years after the first follow-up data collection between September and December 2013.
The beneficiaries of FeA periodically elect a female representative, called the Madre Lider (ML). We randomly selected three MLs from each town (municipality), and then from the families represented by each ML we randomly selected 5 children aged 12 to 24 months to be eligible for the intervention. Within each municipality, eligible households were randomly allocated (at the municipality level) to each of the following treatment arms:
Control
Stimulation + Supplementation
Stimulation
Supplementation
The stimulation intervention consisted of weekly visits to the homes of the target children, each visit lasting around one hour. The home visitors received a three-week training programme in activities designed to stimulate children at different ages. They also received a weekly curriculum as a guide, and a set of locally produced materials (homemade toys from recycling material, picture books, puzzles, etc.).
The supplementation arm consisted of providing daily sachets of multiple micronutrient powder to mothers, via the home visitors, to add to the target child's food. Sachets were designed to provide iron (12.5mg), zinc (5mg), Vitamin A (300 µg retinol equivalent), Vitamin C (30mg) and folic acid (160 µg) for the children targeted.
Sample survey data [ssd]
The survey used a randomized experimental design to obtain rigorous and unbiased estimates of the impact of the stimulation and nutrition interventions, and of their interaction.
Computer Assisted Personal Interview [capi]
At follow-up, a variety of developmental indicators were collected, including the Bayley test (for cognitive, language and motor development), the MacArthur Communicative Development Inventories (for vocabulary and expressive language), the Bates Infant Characteristics Questionnaire (for temperament), and the Rothbart Infant Behaviour Questionnaires (for attention focusing, inhibitory control and sociability, amongst other socio-emotional traits). These data were again complemented by an extensive socio-economic questionnaire which included information on parental investments, time use and so on.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Child-staff ratios are a key quality indicator in early childhood education and care (ECEC) programs. Better ratios are believed to improve child outcomes by increasing opportunities for individual interactions and educational instruction from staff. The purpose of this systematic review, and where possible, meta-analysis, was to evaluate the association between child-staff ratios in preschool ECEC programs and children's outcomes. Searches of Medline, PsycINFO, ERIC, websites of large datasets, and reference sections of all retrieved articles were conducted up to July 3, 2015. Cross-sectional or longitudinal studies that evaluated the relationship between child-staff ratios in ECEC classrooms serving preschool-aged children and child outcomes were independently identified by two reviewers. Data were independently extracted from included studies by two raters and differences between raters were resolved by consensus. Searches revealed 29 eligible studies (31 samples). Child-staff ratios ranged from 5 to 14.5 preschool-aged children per adult, with a mean of 8.65. All 29 studies were included in the systematic review. However, the only meta-analysis that could be conducted was based on three studies that explored associations between ratios and children's receptive language. Results of this meta-analysis were not significant. Results of the qualitative systematic review revealed few significant relationships between child-staff ratios and child outcomes construed broadly. Thus, the available literature reveals few, if any, relationships between child-staff ratios in preschool ECEC programs and children's developmental outcomes. Substantial heterogeneity in the assessment of ratios, outcomes measured, and statistics used to capture associations limited quantitative synthesis. Other methodological limitations of the research integrated in this synthesis are discussed.