Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘2006 - 2011 NYS Math Test Results By Grade 2006-2011 - District - All Students’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/699ac33d-2326-4ba7-b51c-a0cb70ea33e0 on 13 November 2021.
--- Dataset description provided by original source is as follows ---
New York City Results on the New York State Mathematics Tests, Grades 3 - 8 Notes: As of 2006, the New York State Education Department expanded the ELA and mathematics testing programs to Grades 3-8. Previously, state tests were administered in Grades 4 and 8 and citywide tests were administered in Grades 3, 5, 6, and 7. In 2006, NYSED treated District 75 students as a distinct geographic district. For 2007-2011, District 75 students are represented in their home districts and boroughs. Spreadsheets for District and Borough do not include District 75 students in 2006. Starting in 2010, NYSED changed the scale score required to meet each of the proficiency levels, increasing the number of questions students needed to answer correctly to meet proficiency.
Rows are suppressed (noted with ‘s’) if the number of tested students was 5 or fewer.
--- Original source retains full ownership of the source dataset ---
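The suppression rule described above means a count column can contain the letter 's' instead of a number. A minimal sketch of handling that when reading an extract follows; the column names and the two sample rows are hypothetical placeholders, not the real header or data.

```python
import csv
import io

def parse_count(cell):
    """Return an int, or None when the cell is suppressed ('s') or empty."""
    text = cell.strip()
    if text == "" or text.lower() == "s":
        return None
    return int(text.replace(",", ""))

# Hypothetical two-row extract; column names and values are made up for illustration.
sample = io.StringIO(
    "District,Grade,Year,Number Tested\n"
    "1,3,2006,120\n"
    "2,8,2006,s\n"
)
rows = list(csv.DictReader(sample))
tested = [parse_count(row["Number Tested"]) for row in rows]
```

Keeping suppressed cells as None rather than 0 avoids treating a withheld value as a real count in later sums or averages.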
English and maths (formerly Skills for Life) qualifications are designed to give people the reading, writing, maths and communication skills they need in everyday life, to operate effectively in work and to help them succeed on other training courses.
These data provide information on participation and achievements for English and maths qualifications and are broken down into a number of key reports.
If you need help finding data please refer to the table finder tool to search for specific breakdowns available for FE statistics.
MS Excel Spreadsheet, 10.9 MB
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘2006 - 2011 NYS Math Test Results By Grade - District - By Disability Status’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/3f98e6d8-4cb9-479f-adc4-27e3ffb8c504 on 12 November 2021.
--- Dataset description provided by original source is as follows ---
New York City Results on the New York State Mathematics Tests, Grades 3 - 8 Notes: As of 2006, the New York State Education Department expanded the ELA and mathematics testing programs to Grades 3-8. Previously, state tests were administered in Grades 4 and 8 and citywide tests were administered in Grades 3, 5, 6, and 7. In 2006, NYSED treated District 75 students as a distinct geographic district. For 2007-2011, District 75 students are represented in their home districts and boroughs. Spreadsheets for District and Borough do not include District 75 students in 2006. Starting in 2010, NYSED changed the scale score required to meet each of the proficiency levels, increasing the number of questions students needed to answer correctly to meet proficiency.
Rows are suppressed (noted with ‘s’) if the number of tested students was 5 or fewer.
--- Original source retains full ownership of the source dataset ---
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: A large-scale Twitter dataset on the 2022 Monkeypox outbreak, findings from analysis of Tweets, and open research questions,” Infect. Dis. Rep., vol. 14, no. 6, pp. 855–883, 2022, DOI: https://doi.org/10.3390/idr14060087.
Abstract
The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, the virus outbreaks of the past, such as COVID-19, Ebola, Zika virus, and flu, just to name a few, were associated with various works related to the analysis of the multimodal components of Tweets to infer the different characteristics of conversations on Twitter related to these respective outbreaks. The ongoing outbreak of the monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, which is resulting in the generation of tremendous amounts of Big Data. There has been no prior work in this field thus far that has focused on mining such conversations to develop a Twitter dataset. Therefore, this work presents an open-access dataset of 571,831 Tweets about monkeypox that have been posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset complies with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.
Data Description
The dataset consists of 571,831 Tweet IDs, one for each tweet about monkeypox posted on Twitter from 7th May 2022 to 11th November 2022 (the most recent date at the time of uploading the most recent version of the dataset). The Tweet IDs are presented in 12 different .txt files based on the timelines of the associated tweets. The details of these dataset files are as follows.
Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the associated Tweet IDs: May 7, 2022, to May 21, 2022)
Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the associated Tweet IDs: May 21, 2022, to May 27, 2022)
Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the associated Tweet IDs: May 27, 2022, to June 5, 2022)
Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the associated Tweet IDs: June 5, 2022, to June 11, 2022)
Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 46718, Date Range of the associated Tweet IDs: June 12, 2022, to June 30, 2022)
Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the associated Tweet IDs: July 1, 2022, to July 23, 2022)
Filename: TweetIDs_Part7.txt (No. of Tweet IDs: 105890, Date Range of the associated Tweet IDs: July 24, 2022, to July 31, 2022)
Filename: TweetIDs_Part8.txt (No. of Tweet IDs: 93959, Date Range of the associated Tweet IDs: August 1, 2022, to August 9, 2022)
Filename: TweetIDs_Part9.txt (No. of Tweet IDs: 50832, Date Range of the associated Tweet IDs: August 10, 2022, to August 24, 2022)
Filename: TweetIDs_Part10.txt (No. of Tweet IDs: 39042, Date Range of the associated Tweet IDs: August 25, 2022, to September 19, 2022)
Filename: TweetIDs_Part11.txt (No. of Tweet IDs: 12341, Date Range of the associated Tweet IDs: September 20, 2022, to October 9, 2022)
Filename: TweetIDs_Part12.txt (No. of Tweet IDs: 15404, Date Range of the associated Tweet IDs: October 10, 2022, to November 11, 2022)
Please note: The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset, the Hydrator application (link to download the application: https://github.com/DocNow/hydrator/releases and link to a step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets) may be used.
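Before hydrating, it can help to confirm that the downloaded files match the listed counts. The sketch below assumes each .txt file holds one numeric Tweet ID per line (the description does not state the layout), and the batch size of 100 is an arbitrary illustrative choice, not a documented limit.

```python
# Per-file counts copied from the file listing above.
LISTED_COUNTS = {
    1: 13926, 2: 17705, 3: 17585, 4: 19718, 5: 46718, 6: 138711,
    7: 105890, 8: 93959, 9: 50832, 10: 39042, 11: 12341, 12: 15404,
}

def parse_tweet_ids(text):
    """Return the numeric IDs found in `text`, assuming one ID per line."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blank or malformed lines
            ids.append(line)
    return ids

def batches(ids, size=100):
    """Yield consecutive slices of `ids`, e.g. to feed a hydration tool in chunks."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

total_listed = sum(LISTED_COUNTS.values())  # 571831, matching the stated total
```

For a real file, the text would come from something like `Path("TweetIDs_Part1.txt").read_text()`, and `len(parse_tweet_ids(...))` can then be compared with `LISTED_COUNTS[1]`.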
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘2006 - 2011 NYS Math Test Results By Grade - District - By Race- Ethnicity’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/0eb87b06-c9f5-4cfd-b896-11995ec89374 on 12 November 2021.
--- Dataset description provided by original source is as follows ---
New York City Results on the New York State Mathematics Tests, Grades 3 - 8 Notes: As of 2006, the New York State Education Department expanded the ELA and mathematics testing programs to Grades 3-8. Previously, state tests were administered in Grades 4 and 8 and citywide tests were administered in Grades 3, 5, 6, and 7. In 2006, NYSED treated District 75 students as a distinct geographic district. For 2007-2011, District 75 students are represented in their home districts and boroughs. Spreadsheets for District and Borough do not include District 75 students in 2006. Starting in 2010, NYSED changed the scale score required to meet each of the proficiency levels, increasing the number of questions students needed to answer correctly to meet proficiency.
Rows are suppressed (noted with ‘s’) if the number of tested students was 5 or fewer.
--- Original source retains full ownership of the source dataset ---
Mathematical Expressions Dataset
Dataset Description
This dataset contains images of mathematical expressions along with their corresponding LaTeX code. Images will automatically be displayed as thumbnails in Hugging Face's Data Studio.
Dataset Summary
Number of files: 17 Parquet files
Estimated number of samples: 3,212,312
Format: Parquet optimized for Hugging Face
Features configured for thumbnails: ✅
Columns: latex: LaTeX code of the mathematical expression… See the full description on the dataset page: https://huggingface.co/datasets/ToniDO/TeXtract_padding.
https://spdx.org/licenses/CC0-1.0.html
High-resolution mass spectrometry (HRMS) elucidates the molecular composition of dissolved organic matter (DOM) through the unequivocal assignment of molecular formulas. When HRMS is used as a detector coupled to high performance liquid chromatography (HPLC), the molecular fingerprints of DOM are further augmented. However, the identification of eluting compounds remains impossible when DOM chromatograms consist of unresolved humps. Here, we utilized the concept of mathematical chromatography to achieve information reduction and feature extraction. Parallel Factor Analysis (PARAFAC) was applied to a dataset describing the reverse-phase separation of DOM in headwater streams located in southeast Sweden. A dataset consisting of 1355 molecular formulas and 7178 mass spectra was reduced to five components that described 96.89% of the data. Each component summarized the distinct chromatographic elution of molecular formulas with different polarity. Component scores represented the abundance of the identified HPLC features in each sample. Using this chemometric approach allowed the identification of common patterns in HPLC–HRMS datasets by reducing thousands of mass spectra to only a few statistical components. Unlike in principal component analysis (PCA), components closely followed the analytical principles of HPLC–HRMS and therefore represented more realistic pools of DOM. This approach provides a wealth of new opportunities for unravelling the composition of complex mixtures in natural and engineered systems.
Methods
Dataset1.zip
Samples were stored unfiltered in the dark at 4 °C for approximately five months after sampling.
On the day of measurements, specific volumes of samples were transferred to 2 mL Eppendorf vials so that 11.25 µg carbon was present in each sample vial, while 2 mL of blanks were transferred.
The water in samples and blanks was subsequently removed by vacuum evaporation at 45 °C, after which samples were reconstituted in 150 µL 1 % (v/v) formic acid to a final concentration of 75 mg/L carbon.
Reverse-phase chromatography separations were performed on an Agilent 1100 series instrument with an Agilent PLRP‑S series column (150 × 1 mm, 3 µm bed size, 100 Å pore size). Eighty µL of sample was loaded at a flow rate of 100 µL min⁻¹ in 0.1 % formic acid, 0.05 % ammonia, and 5 % acetonitrile. The elution of DOM was achieved through a step-wise increase in concentration of solvent B (100 % acetonitrile) from zero initially, followed by 20 %, and ending in > 45 % solvent B.
Mass spectrometry detection was carried out with an Orbitrap LTQ-Velos-Pro (Thermo Scientific, Germany) with electrospray ionization (ESI, negative mode) as ion source. Transient ions were collected in the range of m/z 150–1000 at an instrumental resolving power set to 10⁵. An external calibration with the manufacturer’s calibration mixture was followed by an internal calibration using six ubiquitous ions in the range of m/z 251–493.
Transients were filtered for noise by treating peaks with a mass defect of 0.6–0.8 as noise and removing all peaks with an intensity lower than the mean + 3 standard deviations of these noise peaks. Molecular formulas were assigned within the range C4-40, H4-80, O1-40, N0-1, S0-1 in the mass range m/z 170–700. Additionally, assignments were restricted to O/C 0–1, H/C 0.3–2, a double bond equivalent minus oxygen less than or equal to 10, and a mass defect of −0.1 to 0.3 (decimal after the nominal mass).
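The elemental and ratio limits above can be expressed as a simple check on a candidate formula. This is a minimal sketch, not the authors' assignment routine: it assumes the standard double bond equivalent for a neutral CHONS formula (DBE = C − H/2 + N/2 + 1, with sulfur treated as divalent), and it leaves out the m/z range and mass defect criteria.

```python
def passes_constraints(c, h, o, n=0, s=0):
    """Check element counts against the quoted assignment limits.

    Covers the element ranges, O/C, H/C and DBE minus O only; the m/z
    window and mass defect limits are not evaluated here.
    """
    in_ranges = (
        4 <= c <= 40 and 4 <= h <= 80 and 1 <= o <= 40
        and 0 <= n <= 1 and 0 <= s <= 1
    )
    if not in_ranges:
        return False
    o_to_c = o / c
    h_to_c = h / c
    # Assumption: DBE of the neutral molecule; divalent S does not contribute.
    dbe = c - h / 2 + n / 2 + 1
    return 0 <= o_to_c <= 1 and 0.3 <= h_to_c <= 2 and (dbe - o) <= 10
```

For example, `passes_constraints(10, 12, 5)` evaluates a hypothetical C10H12O5 formula with O/C = 0.5, H/C = 1.2 and DBE − O = 0, which falls inside all three limits.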
Formulas detected in process blanks were excluded from further analysis. Formulas were also removed from consideration in samples if the intensity did not exceed the noise + 10 standard deviations in at least 10 sequential transients at some point in the elution. This molecular formula assignment and data treatment yielded 2052 unique molecular formulas. Several sequential intensities (typically 3-4) were summed to a chromatographic resolution of 0.1 min to favour analyte signals over instrument noise and to reduce computational requirements.
To yield a more quantitative dataset in subsequent analyses, the DOC normalization was reversed by accounting for the sample specific volume that yielded the constant amount of carbon dissolved for chromatographic analysis. For statistical modelling, the retention time window of 5.0–22.9 min was selected, yielding a preliminary dataset size of 74 samples x 2052 molecular formulas x 180 retention times (dataset_1.zip).
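Two of the numbers in this section can be cross-checked with short arithmetic: the 75 mg/L carbon concentration and the 180 retention times. The sketch below only restates figures given in the text; the variable names are illustrative.

```python
# Carbon concentration after reconstitution: 11.25 µg C in 150 µL.
carbon_ug = 11.25
volume_ul = 150
# 1 µg/µL equals 1000 mg/L.
concentration_mg_per_l = carbon_ug * 1000 / volume_ul  # 75.0

# Retention-time grid from 5.0 to 22.9 min at 0.1 min resolution.
# Integer tenths of a minute avoid floating-point drift when counting.
tenths = range(50, 229 + 1)
retention_times = [t / 10 for t in tenths]

n_samples, n_formulas = 74, 2052
shape = (n_samples, n_formulas, len(retention_times))  # (74, 2052, 180)
```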
Dataset2.zip
(1) Dataset 1 was the source of dataset 2.
(2) All mass spectra were divided by a factor of 4.92 × 10⁷.
(3) Masses that were detected in less than 10 % of measurements (including samples and retention times) were excluded from further analysis (N = 661, Dataset4.zip).
(4) An additional 36 molecular formulas (Dataset3.zip) were removed from the dataset due to noticeably unique chromatograms.
(5) Chromatographic sections with missing observations of at least 2 min (20 observations or more) were set to zero while leaving a gap of missing numbers of 0.7 min to each end of the section.
(6) Every 2nd retention time (after t = 7 min) was excluded.
(7) All data above retention times of 22.2 min were excluded.
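The reduction steps listed for Dataset2 can be checked against the sizes quoted in the abstract (1355 molecular formulas and 7178 mass spectra). The sketch below is a bookkeeping exercise under two assumptions that the text does not spell out: that "after t = 7 min" means retention times strictly greater than 7.0 min, and that thinning keeps the first of each pair.

```python
# Formulas: start from 2052, drop 661 rarely observed and 36 outliers.
n_formulas = 2052 - 661 - 36  # 1355

# Retention times in integer tenths of a minute (5.0 to 22.9 min).
times = list(range(50, 230))

# Keep every point up to 7.0 min, then every 2nd point afterwards.
early = [t for t in times if t <= 70]
late = [t for t in times if t > 70][::2]  # assumption: first of each pair is kept

# Drop everything above 22.2 min.
kept = [t for t in early + late if t <= 222]

n_retention_times = len(kept)         # 97
n_spectra = 74 * n_retention_times    # 7178 (samples x retention times)
```

Both results agree with the abstract, which suggests each mass spectrum corresponds to one sample at one retained retention time.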
Dataset3.zip
Dataset3 contains outliers that were removed in step 4 of Dataset2
Dataset4.zip
Dataset4 contains rarely observed formulas that were removed in step 3 of Dataset2
Dataset5.zip
To isolate groups of molecular formulas with identical chromatographic elution profiles, parallel factor analysis (PARAFAC) was utilized. All data processing and modelling was carried out using PLS_Toolbox (v8.61, Eigenvector Research Inc.) in MATLAB (v9.7, MathWorks Inc.). PARAFAC models were constrained to nonnegativity in all modes and the convergence criterion was set to a relative change in fitting error between iterations of 10⁻¹². Each model was initialized 50 times with orthogonalized random numbers and only the least squares solution was further inspected. Models with two to nine components were considered. A five-component model was validated. Dataset5 contains its properties and supporting geochemical sample parameters.
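To illustrate what a fitted PARAFAC model represents, the sketch below rebuilds a three-way array from three loading matrices and computes a percentage of explained variation. It is written in plain Python rather than the PLS_Toolbox/MATLAB setup used for the dataset, the mode order (samples, formulas, retention times) is an assumption, and the explained-percentage formula (uncentered sums of squares) is a common convention that may differ from how the 96.89 % figure was obtained.

```python
import math

def parafac_reconstruct(A, B, C):
    """Rebuild X[i][j][k] = sum_f A[i][f] * B[j][f] * C[k][f].

    A: sample scores, B: formula loadings, C: elution profiles (assumed order).
    """
    n_comp = len(A[0])
    return [
        [[sum(a[f] * b[f] * c[f] for f in range(n_comp)) for c in C] for b in B]
        for a in A
    ]

def percent_explained(X, X_hat):
    """100 * (1 - SSE / SST) over non-missing entries, without centering."""
    sse = 0.0
    sst = 0.0
    for plane, plane_hat in zip(X, X_hat):
        for row, row_hat in zip(plane, plane_hat):
            for x, x_hat in zip(row, row_hat):
                if math.isnan(x):
                    continue  # missing observations do not count
                sse += (x - x_hat) ** 2
                sst += x ** 2
    return 100.0 * (1.0 - sse / sst)

# Tiny two-component example with made-up non-negative loadings.
A = [[1, 0], [0, 2]]
B = [[1, 1], [2, 0]]
C = [[1, 3]]
X = parafac_reconstruct(A, B, C)
```

The residuals stored in Dataset6 correspond to the elementwise difference between the data and such a reconstruction.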
Dataset6.zip
Dataset6 contains the residual chromatograms (data minus model = dataset2 minus dataset5) for every sample and formula. To create one file, the residuals were unfolded into one large matrix. Each formula in every sample is accompanied by a tag that categorizes the residual chromatogram. The categories were assigned as follows (numbers correspond to the numbering scheme in the csv file):
(1) Underestimations are chromatograms in which more than 80% of residuals were positive.
(2) Overestimations are chromatograms in which more than 80% of residuals were negative.
(3) False positive abundances were identified by counting the cases in which PARAFAC estimated a non-zero chromatogram, but the data only contained zeros or missing observations.
(5) Residuals were classified as random when they did not fall into any other category, their absolute median was < 0.001, and the number of positive and negative residuals each accounted for between 40 and 60 % of the raw chromatograms (not counting zeros or missing observations).
(NaN) Residuals that did not fall into any of the above categories were left uncategorized.
The number "4" was not used in this listing.
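The tagging rules above can be sketched as a small function. Several details are assumptions made for illustration and are not stated in the description: fractions are computed over observed points only (non-zero, non-missing data), the rules are tested in the listed order, and the median criterion is read as the absolute value of the median residual.

```python
import math
from statistics import median

def classify_residual(data, model):
    """Return 1, 2, 3, 5 or NaN for one chromatogram (data and model values)."""
    observed = [
        (d, m) for d, m in zip(data, model)
        if not math.isnan(d) and d != 0
    ]
    if not observed:
        # Data holds only zeros or missing values: false positive if the
        # model still predicts a non-zero chromatogram.
        return 3 if any(m != 0 for m in model) else math.nan

    residuals = [d - m for d, m in observed]
    n = len(residuals)
    positive = sum(r > 0 for r in residuals) / n
    negative = sum(r < 0 for r in residuals) / n

    if positive > 0.8:
        return 1  # underestimation
    if negative > 0.8:
        return 2  # overestimation
    balanced = 0.4 <= positive <= 0.6 and 0.4 <= negative <= 0.6
    if abs(median(residuals)) < 0.001 and balanced:
        return 5  # random residuals
    return math.nan  # uncategorized
```

Because NaN never compares equal to itself, callers should test the uncategorized case with `math.isnan(result)` rather than `result == math.nan`.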
https://spdx.org/licenses/CC0-1.0.html
Reducing eutrophication in surface water is a major environmental challenge in many countries around the world. In cold Canadian prairie agricultural regions, part of the eutrophication challenge arises during spring snowmelt when a significant portion of the total annual nutrient export occurs, and plant residues can act as a nutrient source instead of a sink. Although the total mass of nutrients released from various crop residues has been studied before, little research has been conducted to capture fine-timescale temporal dynamics of nutrient leaching from plant residues, and the processes have not been represented in water quality models. In this study, we measured the dynamics of P and N release from a cold-hardy perennial plant species, alfalfa (Medicago sativa L.), to meltwater after freeze–thaw through a controlled snowmelt experiment. Various winter conditions were simulated by exposing alfalfa residues to different numbers of freeze–thaw cycles (FTCs) of uniform magnitude prior to snowmelt. The monitored P and N dynamics showed that most nutrients were released during the initial stages of snowmelt (first 5 h) and that the magnitude of nutrient release was affected by the number of FTCs. A threshold of five FTCs was identified for a greater nutrient release, with plant residue contributing between 0.29 (NO3) and 9 (PO4) times more nutrients than snow. The monitored temporal dynamics of nutrient release were used to develop the first process-based predictive model controlled by three potentially measurable parameters that can be integrated into catchment water quality models to improve nutrient transport simulations during snowmelt.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The short-distance continuous diversion area plays a crucial role within mountainous urban expressway systems, significantly enhancing the efficiency of specialized road sections through capacity analysis. This study develops a capacity calculation model tailored to the diversion area’s unique characteristics and principal capacity-influencing factors. Initially, the research focuses on a specific short-distance continuous diversion area of a mountainous urban expressway, employing video trajectory tracking technology to gather trajectory data. This data serves as the basis for analyzing road and traffic characteristics. Subsequently, the model computes the capacity influenced by eight variables, including diversion point spacing and deceleration lane length, using VISSIM simulation experiments. A gray correlation analysis identifies key factors, which guide the establishment of the model’s fundamental structure through two-factor surface fitting results. Mathematical statistical methods are then applied to resolve the model’s parameters, culminating in a robust capacity calculation model. The findings reveal that diversion point spacing, along with primary and secondary diversion ratios, significantly influence capacity. Notably, the capacity exhibits a marked quadratic polynomial relationship with the primary diversion ratio and diversion point spacing, and a linear relationship with the secondary diversion ratio. The model’s validity is confirmed through a case study at the diversion area north of Huacun Interchange in Chongqing Municipality, where the discrepancy between calculated and actual capacities is under 5%, underscoring the model’s high accuracy. These results offer valuable theoretical and methodological support for the planning, design, and traffic management of diversion areas.