CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset is a comprehensive collection of mathematical word problems spanning multiple domains, with rich metadata and natural language variations. Each problem involves 1 to 5 steps of mathematical operations and is designed to encourage showing work and maintaining appropriate decimal precision throughout calculations.
None of the problems have been seen before, and all are free from copyright restrictions.
The available data has 100,000 problems. The templating system that created the data can be licensed to produce orders of magnitude more data, or customizations such as the number of mathematical steps involved and the addition of domains. Contact hello@cephalopod.studio for more information.
Intended Uses: The data can be used in 4 areas:
1) Pretraining
2) Instruction tuning
3) Finetuning
4) Benchmarking existing models
All those areas are in service of:
- Training mathematical reasoning systems
- Developing step-by-step problem-solving capabilities
- Testing arithmetic operations across diverse real-world contexts
- Evaluating precision in decimal calculations
Limitations:
- Currently English-only
- Limited to specific mathematical operations
- Template-based generation may introduce structural patterns
- Focused on arithmetic operations with up to 5 numbers
The dataset contains 100,000 problems in total.
Problems span multiple domains including:
- Agriculture (soil temperature changes, etc.)
- Athletics (training hours, distances, etc.)
- Construction (elevation changes, work hours, etc.)
- Culinary (cooking temperature changes, calories per serving, etc.)
- Education (GPA changes, etc.)
- Entertainment (show ratings, stage lighting, etc.)
- Finance (stock prices, account balances, etc.)
Each example is provided in JSONL format with the following structure:
json
{
"id": "problem_X",
"question": "Text of the math problem",
"metadata": {
"discrete": boolean,
"domain": string,
"numbers": number[],
"object_type": string,
"solution": number,
"operators": string[],
"decimals": number
},
"answer": "Text of the step-by-step solution to the problem"
}
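As a minimal sketch of how records with the structure above might be read, the following Python parses JSONL lines and checks that the listed keys are present. The function name `parse_problems` and every value in the sample record are assumptions made for illustration; only the key names come from the structure shown above.

```python
import json

# Key names taken from the structure shown above.
TOP_LEVEL_KEYS = {"id", "question", "metadata", "answer"}
METADATA_KEYS = {
    "discrete", "domain", "numbers", "object_type",
    "solution", "operators", "decimals",
}


def parse_problems(lines):
    """Yield one dict per non-empty JSONL line, raising if a listed key is absent."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = sorted(TOP_LEVEL_KEYS - record.keys())
        missing += sorted(
            f"metadata.{key}"
            for key in METADATA_KEYS - record.get("metadata", {}).keys()
        )
        if missing:
            raise ValueError(f"line {line_number}: missing {missing}")
        yield record


# Illustrative record only: all values here are made up for this sketch.
sample_line = json.dumps({
    "id": "problem_X",
    "question": "Text of the math problem",
    "metadata": {
        "discrete": True,
        "domain": "finance",
        "numbers": [19, 2, 4],
        "object_type": "accounts",
        "solution": 42,
        "operators": ["multiply", "add"],
        "decimals": 0,
    },
    "answer": "Text of the step-by-step solution to the problem",
})

problems = list(parse_problems([sample_line, ""]))
print(problems[0]["metadata"]["solution"])  # prints 42
```

Because `parse_problems` accepts any iterable of lines, an open file handle can be passed in directly, so a large file does not need to be held in memory at once.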
Example questions:
1. Finance (Account Management):
"Jack sets up 19 bank accounts for clients. First the total rises to be 2 times greater than before. Following that, another 4 accounts were added."
2. Agriculture (Grain Storage):
"Kevin oversees 14,457 metric tons of grain storage in the new concrete silo. In the beginning, the facility grows to 3 times its current measure of grain. Following that, the overall supply of grain grows by 1,514 tons. Then, Kevin divides the holdings evenly by 1 and continues operations with a single unit."
3. Temperature Monitoring:
"The ground temperature measures 5.48 degrees Celsius. First, the temperature values are adjusted to be 1/3.74 the value they were before. Next, a sensor calibration adjustment multiplies all readings by 2.07, and later the measurements need to be scaled down by 1/3.94 due to sensor calibration. Later, the measurements need to be scaled down by 1/2.54 due to sensor calibration, and after that the measurements need to be scaled down by 1/2.21 due to sensor calibration. What is the final soil temperature in degrees Celsius? Round your answer and any steps to 2 decimal places."
Example answers:
1. Finance (Account Management):
"Here's how we can solve this problem:
19 accounts times 2 equals 38
Addition step: 38 + 4 = 42 accounts
Based on these steps, the answer is 42."
2. Agriculture (Grain Storage):
"Following these steps will give us the answer:
Multiplication operation: 14,457 tons * 3 = 43,371
Add 1514 to 43,371 tons: 44,885
x) 44,885 x 1 = 44,885 tons
Thus, we arrive at the answer: 44885.0."
3. Temperature Monitoring:
"We can find the answer like this:
Division: 5.48 degrees ÷ 3.74 = (Note: rounding to 2 decimal places) about 1.47
Multiplication operation: 1.47 degrees * 2.07 = (Note: rounding to 2 decimal places) approximately 3.04
3.04 degrees ÷ 3.94 (Note: rounding to 2 decimal places) approximately 0.77
0.77 degrees ÷ 2.54 (Note: rounding to 2 decimal places) approximately 0.30
When 0.30 degrees are divided by 2.21, the result is (Note: rounding to 2 decimal places) about 0.14
This means the final result is 0.14."
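The step-by-step rounding in these worked answers can be checked with a short Python sketch. It assumes half-up rounding after every step and uses the symbols `+`, `-`, `*` and `/` for operators; the dataset's actual rounding mode and operator labels are not stated above, and the function name `apply_steps` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def apply_steps(start, steps, decimals):
    """Apply (operator, operand) steps in order, rounding after each step.

    Assumption: half-up rounding to `decimals` places after every operation.
    """
    quantum = Decimal(1).scaleb(-decimals)  # e.g. 0.01 for 2 decimal places
    value = Decimal(start)
    for operator, operand in steps:
        operand = Decimal(operand)
        if operator == "+":
            value += operand
        elif operator == "-":
            value -= operand
        elif operator == "*":
            value *= operand
        elif operator == "/":
            value /= operand
        else:
            raise ValueError(f"unsupported operator: {operator}")
        value = value.quantize(quantum, rounding=ROUND_HALF_UP)
    return value


# Numbers taken from the three examples above.
accounts = apply_steps("19", [("*", "2"), ("+", "4")], decimals=0)
grain = apply_steps("14457", [("*", "3"), ("+", "1514"), ("/", "1")], decimals=0)
temperature = apply_steps(
    "5.48",
    [("/", "3.74"), ("*", "2.07"), ("/", "3.94"), ("/", "2.54"), ("/", "2.21")],
    decimals=2,
)

print(accounts, grain, temperature)  # prints 42 44885 0.14
```

`Decimal` is used instead of `float` so that intermediate values such as 3.0429 round exactly as they would on paper.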
Each problem includes:
- Unique problem ID
- Natural language question text
- Arithmetic operations involving decimals and integers, positive and negative values, and requirements for rounding to a specific number of decimal places
- Detailed metadata including:
  - Domain classification
  - Object types and units
  - Numerical values used
  - Mathematical operators
  - Solution value
  - Discreteness flag
  - Decimal precision
  - Tailored value ranges
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘2006 - 2011 NYS Math Test Results By Grade - District - By Disability Status’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/3f98e6d8-4cb9-479f-adc4-27e3ffb8c504 on 12 November 2021.
--- Dataset description provided by original source is as follows ---
New York City Results on the New York State Mathematics Tests, Grades 3 - 8 Notes: As of 2006, the New York State Education Department expanded the ELA and mathematics testing programs to Grades 3-8. Previously, state tests were administered in Grades 4 and 8 and citywide tests were administered in Grades 3, 5, 6, and 7. In 2006, NYSED treated District 75 students as a distinct geographic district. For 2007-2011, District 75 students are represented in their home districts and boroughs. Spreadsheets for District and Borough do not include District 75 students in 2006. Starting in 2010, NYSED changed the scale score required to meet each of the proficiency levels, increasing the number of questions students needed to answer correctly to meet proficiency.
Rows are suppressed (noted with ‘s’) if the number of tested students was 5 or fewer.
--- Original source retains full ownership of the source dataset ---
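The suppression rule in the description above (rows noted with ‘s’ when 5 or fewer students were tested) means numeric columns can contain a non-numeric marker. A minimal Python sketch of one way to handle this follows; the function names and the sample cell values are hypothetical and are not taken from the source spreadsheets.

```python
def parse_cell(raw):
    """Return a float for a reported value, or None for a suppressed 's' cell."""
    text = raw.strip()
    if text.lower() == "s":
        return None
    return float(text)


def mean_of_reported(cells):
    """Average only the cells that were actually reported (not suppressed)."""
    values = [v for v in (parse_cell(c) for c in cells) if v is not None]
    if not values:
        return None
    return sum(values) / len(values)


# Made-up cells for illustration: two reported values and one suppressed row.
example_cells = ["650", "s", "670"]
print(mean_of_reported(example_cells))  # prints 660.0
```

Returning `None` rather than 0 keeps suppressed rows from pulling averages down.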
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘2006 - 2011 NYS Math Test Results By Grade - School Level - By Race- Ethnicity’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/71ca4f16-2bb0-4bb3-bd98-e7d52161a5d7 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
New York City Results on the New York State Mathematics Tests, Grades 3 - 8 Notes: As of 2006, the New York State Education Department expanded the ELA and mathematics testing programs to Grades 3-8. Previously, state tests were administered in Grades 4 and 8 and citywide tests were administered in Grades 3, 5, 6, and 7. In 2006, NYSED treated District 75 students as a distinct geographic district. For 2007-2011, District 75 students are represented in their home districts and boroughs. Spreadsheets for District and Borough do not include District 75 students in 2006. Starting in 2010, NYSED changed the scale score required to meet each of the proficiency levels, increasing the number of questions students needed to answer correctly to meet proficiency.
Rows are suppressed (noted with ‘s’) if the number of tested students was 5 or fewer. Prior to 2011, the mean scale scores for ‘All Grades’ were not calculated.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Metropolitan Lagos dataset consists of two files: (i) tsetimi_lagos_dataset.sav and (ii) tsetimi_lagos_dataset.xlxs. The two files contain the same number of records (377) and the same information. The first file is in IBM SPSS database format, while the second is in Microsoft Excel spreadsheet format. The SPSS file can be opened in the data view of SPSS. The field names, field descriptions and field types are self-contained in the SPSS database file.
The dataset is part of a nationwide survey on the problems associated with electricity distribution and generation in Nigeria. A pilot survey [1] of this research was conducted in Delta State, South-South, Nigeria. The files for the pilot survey are available in [2]. The survey for the Lagos dataset was conducted by means of a well-structured questionnaire administered by trained interviewers. The questionnaire collected information on respondents’ bio-data, their experience with the services of their distribution companies, and the problems with electricity distribution observed during the fieldwork.
The electricity customers’ perception ratings of the services of distribution companies were on a five-point scale based on the following metrics adapted from [3]:
i. Overall satisfaction with services of distribution company;
ii. Quality and reliability of power from distribution company;
iii. Reasonableness of bills from distribution company;
iv. Billing system of distribution company;
v. Corporate image of distribution company;
vi. Effectiveness of communication of distribution company with stakeholders;
vii. Customer service of the distribution company.
The respondents scored the metrics between 0 and 5 inclusive, depending on their perception of the above metrics.
The respondents’ scores on the observed problems were based on the following items:
i. Low voltage;
ii. Incessant power outages;
iii. Load shedding;
iv. Inadequate number of meters;
v. Inadequate distribution lines;
vi. Unreasonable price of power;
vii. Illegal connections;
viii. Inadequate number of transformers;
ix. Stealing of distribution facilities.
The respondents assigned a score between 0 and 10 inclusive, depending on their perception of the level of severity of the observed problems.
References
[1] J. Tsetimi, A. O. Atonuje and E. J. Mamadu. An Analysis of a Pilot Survey of the Problems of Electricity Distribution in Delta State, Nigeria. Transactions of Nigerian Institution of Mathematical Physics. 2020; 12(7): 109-116.
[2] J. Tsetimi. Customers' Problems with Electricity Distribution in Delta State Nigeria, [dataset], Mendeley Data, V1, doi: 10.17632/msrhyv489k.1. 2020. Accessed 16th February, 2021. Available: http://dx.doi.org/10.17632/msrhyv489k.1
[3] D. Smith, S. Nayak, M. Karig, I. Kosnik, M. Konya, K. Lovett, Z. Liu, and H. Luvai. Assessing Residential Customer Satisfaction for Large Electric Utilities. UMSL, Department of Economics Working Papers. (2011).
English and maths (formerly Skills for Life) qualifications are designed to give people the reading, writing, maths and communication skills they need in everyday life, to operate effectively in work and to help them succeed on other training courses.
These data provide information on participation and achievements for English and maths qualifications and are broken down into a number of key reports.
If you need help finding data please refer to the table finder tool to search for specific breakdowns available for FE statistics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘2006 - 2011 NYS Math Test Results By Grade 2006-2011 - District - All Students’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/699ac33d-2326-4ba7-b51c-a0cb70ea33e0 on 26 January 2022.
--- Dataset description provided by original source is as follows ---
New York City Results on the New York State Mathematics Tests, Grades 3 - 8 Notes: As of 2006, the New York State Education Department expanded the ELA and mathematics testing programs to Grades 3-8. Previously, state tests were administered in Grades 4 and 8 and citywide tests were administered in Grades 3, 5, 6, and 7. In 2006, NYSED treated District 75 students as a distinct geographic district. For 2007-2011, District 75 students are represented in their home districts and boroughs. Spreadsheets for District and Borough do not include District 75 students in 2006. Starting in 2010, NYSED changed the scale score required to meet each of the proficiency levels, increasing the number of questions students needed to answer correctly to meet proficiency.
Rows are suppressed (noted with ‘s’) if the number of tested students was 5 or fewer.
--- Original source retains full ownership of the source dataset ---
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: A large-scale Twitter dataset on the 2022 Monkeypox outbreak, findings from analysis of Tweets, and open research questions,” Infect. Dis. Rep., vol. 14, no. 6, pp. 855–883, 2022, DOI: https://doi.org/10.3390/idr14060087.
Abstract
The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, the virus outbreaks of the past, such as COVID-19, Ebola, Zika virus, and flu, just to name a few, were associated with various works related to the analysis of the multimodal components of Tweets to infer the different characteristics of conversations on Twitter related to these respective outbreaks. The ongoing outbreak of the monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, which is resulting in the generation of tremendous amounts of Big Data. There has been no prior work in this field thus far that has focused on mining such conversations to develop a Twitter dataset. Therefore, this work presents an open-access dataset of 571,831 Tweets about monkeypox that have been posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset complies with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.
Data Description
The dataset consists of a total of 571,831 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 11th November 2022 (the most recent date at the time of uploading the most recent version of the dataset). The Tweet IDs are presented in 12 different .txt files based on the timelines of the associated tweets. The details of these dataset files are as follows:
- Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the associated Tweet IDs: May 7, 2022, to May 21, 2022)
- Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the associated Tweet IDs: May 21, 2022, to May 27, 2022)
- Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the associated Tweet IDs: May 27, 2022, to June 5, 2022)
- Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the associated Tweet IDs: June 5, 2022, to June 11, 2022)
- Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 46718, Date Range of the associated Tweet IDs: June 12, 2022, to June 30, 2022)
- Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the associated Tweet IDs: July 1, 2022, to July 23, 2022)
- Filename: TweetIDs_Part7.txt (No. of Tweet IDs: 105890, Date Range of the associated Tweet IDs: July 24, 2022, to July 31, 2022)
- Filename: TweetIDs_Part8.txt (No. of Tweet IDs: 93959, Date Range of the associated Tweet IDs: August 1, 2022, to August 9, 2022)
- Filename: TweetIDs_Part9.txt (No. of Tweet IDs: 50832, Date Range of the associated Tweet IDs: August 10, 2022, to August 24, 2022)
- Filename: TweetIDs_Part10.txt (No. of Tweet IDs: 39042, Date Range of the associated Tweet IDs: August 25, 2022, to September 19, 2022)
- Filename: TweetIDs_Part11.txt (No. of Tweet IDs: 12341, Date Range of the associated Tweet IDs: September 20, 2022, to October 9, 2022)
- Filename: TweetIDs_Part12.txt (No. of Tweet IDs: 15404, Date Range of the associated Tweet IDs: October 10, 2022, to November 11, 2022)
Please note: The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset, the Hydrator application (link to download the application: https://github.com/DocNow/hydrator/releases and link to a step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets) may be used.
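As a quick consistency check on the file listing above, the per-file counts can be summed and compared with the stated total of 571,831. The sketch below also includes a small reader for the ID files; it assumes one Tweet ID per line, which the description does not state, and the function name `parse_tweet_ids` is hypothetical.

```python
# Per-file counts copied from the file listing above.
FILE_COUNTS = {
    "TweetIDs_Part1.txt": 13926,
    "TweetIDs_Part2.txt": 17705,
    "TweetIDs_Part3.txt": 17585,
    "TweetIDs_Part4.txt": 19718,
    "TweetIDs_Part5.txt": 46718,
    "TweetIDs_Part6.txt": 138711,
    "TweetIDs_Part7.txt": 105890,
    "TweetIDs_Part8.txt": 93959,
    "TweetIDs_Part9.txt": 50832,
    "TweetIDs_Part10.txt": 39042,
    "TweetIDs_Part11.txt": 12341,
    "TweetIDs_Part12.txt": 15404,
}

total_ids = sum(FILE_COUNTS.values())
print(total_ids)  # prints 571831


def parse_tweet_ids(lines):
    """Return Tweet IDs as strings, assuming one ID per non-empty line."""
    ids = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if not text.isdigit():
            raise ValueError(f"not a numeric Tweet ID: {text!r}")
        ids.append(text)  # kept as strings so no digits are lost downstream
    return ids
```

After downloading, comparing `len(parse_tweet_ids(open_file))` with the matching entry in `FILE_COUNTS` gives a simple check that a file arrived complete.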
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository accompanies the article “Rapid, model-driven design of organ-scale vascular networks for perfused bioprinted tissues.”
It contains every geometry, computational-fluid-dynamics (CFD) result, image analysis and post-processing script required to reproduce the figures in the main text and Supplementary Information. The data trace the complete workflow—from algorithmic vascular-tree construction with svv, through 0-D/3-D CFD analysis, to quantitative figure generation—and offer ready-to-use models for new design studies.
1. Data & File Structure
2. Materials & Methods
3. Sharing & Access
4. Code & Software
The archive unpacks into a single root folder:
Selected folders and files (input files and Python code) are shown here:
adj6152_data/
Main/
Main Figures/
Figure 1/
cube/
heart/
Figure_1c.py
Figure_1d.py
Figure_1e.py
Figure_1f.py
Figure_1g.py
Figure 2/
Fig. 2C/
Fig. 2F/
Fig. 2H-I/
Figure 3/
anulus_0d_simulation/
timeseries/
timeseries_for_flow_gif/
timeseries_for_pressure_gif/
timeseries_for_wss_gif/
inflow.flow
plot_0d_results_at_slices.py
plot_0d_results_to_3d.py
run.py
solver_0d.in
wave.flow
cube_0d_simulation/
timeseries/
timeseries_for_flow_gif/
timeseries_for_pressure_gif/
timeseries_for_wss_gif/
inflow.flow
plot_0d_results_at_slices.py
plot_0d_results_to_3d.py
run.py
solver_0d.in
wave.flow
gyrus_0d_simulation/
timeseries/
timeseries_for_flow_gif/
timeseries_for_pressure_gif/
timeseries_for_wss_gif/
inflow.flow
plot_0d_results_at_slices.py
plot_0d_results_to_3d.py
run.py
solver_0d.in
wave.flow
heart_0d_simulation/
timeseries/
timeseries_for_flow_gif/
timeseries_for_pressure_gif/
timeseries_for_wss_gif/
inflow.flow
plot_0d_results_at_slices.py
plot_0d_results_to_3d.py
run.py
solver_0d.in
wave.flow
Figure 4/
Figure (Data)/
Fig 4.A/
Model 10000/
Model 100000/
Model 1000000/
Fig 4.B/
Model 10000/
Model 100000/
Model 1000000/
Fig 4.C - Biventricular Model/
Processed/
Raw/
Fig 4.D - Annulus Model/
Processed/
Raw/
Figure 5/
Figure (Data)/
Fig. 5D-F/
raw data.zip
Fig. 5G-J/
Vessel_Printing_65/
Images/
Meshes/
Models/
Paths/
ROMSimulations/
Segmentations/
Simulations/
cross_sections/
cross_sections_2/
cross_sections_3/
cross_sections_12/
cross_sections_22/
post_results_1/
post_results_2/
pulsatile_deformable_wall/
pulsatile_elastic_wall/
pulsatile_flow/
centerlines.txt
color_branches.py
create_graph.py
read_centerlines.py
slice_data.py
visualize_deformation.py
svFSI/
inflow.flow
simvascular.proj
multimaterial_test_network_65_vessels (1).txt
Fig. 5K/
Photos/
Fig. 5L/
Processes/
Fig. 5M/
Analysis/
Data (Raw, Processed)/
Annulus/
Raw, Reconstructed/
Annulus_Vasculature/
Photos/
Vasculature/
Vascular 1/
Raw/
Reconstructed/
Vascular 2/
Fig. 5O/
Photos/
Renderings/
Fig. 5P/
Fig. 5P i/
Photos/
Fig. 5P ii/
Analysis/
MATLAB/
Processed Data/
Fig. 5P iii/
Analysis/
Fig. 5P iv/
Analysis/
Fig. 5P v/
Analysis/
MATLAB/
Processed Data/
Re-Scaled Data (per cell counts)/
Fig. 5P vi/
Analysis/
MATLAB/
Processed Data/
Supplement Materials/
Supplementary Figures/
Fig. S1/
cube_opposite_diagonal_source_sink/
cube_same_side_diagonal_source_sink/
timeseries/
timeseries_for_flow_gif/
timeseries_for_pressure_gif/
timeseries_for_wss_gif/
inflow.flow
plot_0d_results_to_3d.py
README.txt
SF2.py
solver_0d.in
Fig. S2/
svcco/
array_vs_linked_list.py
Fig. S3/
BFGS/
COBYLA/
L-BFGS-B/
Nelder-Mead/
Newton-CG/
Powell/
SLSQP/
TNC/
trust-ncg/
get_data.py
test_optimizers.py
Fig. S4/
svcco/
global_pu_time.py
implicit_accuracy_number_patches.py
implicit_condition_number.py
Fig. S5/
brain_reconstructions/
brain_regions/
engineering_shapes/
svcco/
figure_code.py
figure_writer.py
Fig. S6/
svcco/
corner_recovery.py
patch_functions.py
Fig. S7/
svcco/
point_enclosure.py
Fig. S8/
brain_data/
convex_data/
svcco/
build_times_figure.py
Fig. S9/
Anterior Commissure/
output/
generate_tree.py
Branchium of Left Inferior Colliculus/
output/
generate_tree.py
Commissure of Fornix of Forebrain/
output/
generate_tree.py
Fourth Ventricle/
output/
generate_tree.py
Hypothalamus/
output/
generate_tree.py
Lamina Terminalis/
output/
generate_tree.py
Left Globus Pallidus/
output/
generate_tree.py
Left Inferior Frontal Gyrus/
output/
generate_tree.py
Left Internal Capsule/
output/
generate_tree.py
Left Olfactory Tract/
output/
generate_tree.py
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Math Students’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/janiobachmann/math-students on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This is a dataset from the UCI datasets repository. It contains the final scores of students at the end of a math program, along with several features that might or might not impact the future outcome of these students.
Please include this citation if you plan to use this database:
P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.
Attributes for both student-mat.csv (Math course) and student-por.csv (Portuguese language course) datasets:
1 school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
2 sex - student's sex (binary: 'F' - female or 'M' - male)
3 age - student's age (numeric: from 15 to 22)
4 address - student's home address type (binary: 'U' - urban or 'R' - rural)
5 famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
6 Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
7 Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
8 Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
9 Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
10 Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
11 reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
12 guardian - student's guardian (nominal: 'mother', 'father' or 'other')
13 traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
14 studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
15 failures - number of past class failures (numeric: n if 1<=n<3, else 4)
16 schoolsup - extra educational support (binary: yes or no)
17 famsup - family educational support (binary: yes or no)
18 paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
19 activities - extra-curricular activities (binary: yes or no)
20 nursery - attended nursery school (binary: yes or no)
21 higher - wants to take higher education (binary: yes or no)
22 internet - Internet access at home (binary: yes or no)
23 romantic - with a romantic relationship (binary: yes or no)
24 famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
25 freetime - free time after school (numeric: from 1 - very low to 5 - very high)
26 goout - going out with friends (numeric: from 1 - very low to 5 - very high)
27 Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
28 Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
29 health - current health status (numeric: from 1 - very bad to 5 - very good)
30 absences - number of school absences (numeric: from 0 to 93)
31 G1 - first period grade (numeric: from 0 to 20)
32 G2 - second period grade (numeric: from 0 to 20)
33 G3 - final grade (numeric: from 0 to 20, output target)
--- Original source retains full ownership of the source dataset ---
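To illustrate how the attributes listed above might be used, the following Python sketch averages the final grade `G3` for each `studytime` level. The column names come from the attribute list; the function name, the semicolon delimiter and the three sample rows are assumptions made for this example, so pass `delimiter=","` if your copy of the file is comma-separated.

```python
import csv
import io
from collections import defaultdict


def mean_final_grade_by_studytime(csv_file, delimiter=";"):
    """Return {studytime level: mean G3} from a file-like CSV with a header row."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file, delimiter=delimiter):
        level = int(row["studytime"])  # 1 to 4, as described in attribute 14
        sums[level] += int(row["G3"])  # final grade, 0 to 20
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in sorted(sums)}


# Made-up rows for illustration only; they are not taken from student-mat.csv.
sample_csv = io.StringIO(
    "school;studytime;G3\n"
    "GP;1;10\n"
    "GP;1;12\n"
    "MS;2;15\n"
)

print(mean_final_grade_by_studytime(sample_csv))  # prints {1: 11.0, 2: 15.0}
```

The same function works on a real file by passing an open file handle instead of the `io.StringIO` sample.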
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘2006 - 2011 NYS Math Test Results By Grade - School Level - By Gender’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/05c8e61a-0791-43dc-8696-701df3b23188 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
New York City Results on the New York State Mathematics Tests, Grades 3 - 8 Notes: As of 2006, the New York State Education Department expanded the ELA and mathematics testing programs to Grades 3-8. Previously, state tests were administered in Grades 4 and 8 and citywide tests were administered in Grades 3, 5, 6, and 7. In 2006, NYSED treated District 75 students as a distinct geographic district. For 2007-2011, District 75 students are represented in their home districts and boroughs. Spreadsheets for District and Borough do not include District 75 students in 2006. Starting in 2010, NYSED changed the scale score required to meet each of the proficiency levels, increasing the number of questions students needed to answer correctly to meet proficiency.
Rows are suppressed (noted with ‘s’) if the number of tested students was 5 or fewer. Prior to 2011, the mean scale scores for ‘All Grades’ were not calculated.
--- Original source retains full ownership of the source dataset ---
https://spdx.org/licenses/CC0-1.0.html
Reducing eutrophication in surface water is a major environmental challenge in many countries around the world. In cold Canadian prairie agricultural regions, part of the eutrophication challenge arises during spring snowmelt when a significant portion of the total annual nutrient export occurs, and plant residues can act as a nutrient source instead of a sink. Although the total mass of nutrients released from various crop residues has been studied before, little research has been conducted to capture fine-timescale temporal dynamics of nutrient leaching from plant residues, and the processes have not been represented in water quality models. In this study, we measured the dynamics of P and N release from a cold-hardy perennial plant species, alfalfa (Medicago sativa L.), to meltwater after freeze–thaw through a controlled snowmelt experiment. Various winter conditions were simulated by exposing alfalfa residues to different numbers of freeze–thaw cycles (FTCs) of uniform magnitude prior to snowmelt. The monitored P and N dynamics showed that most nutrients were released during the initial stages of snowmelt (first 5 h) and that the magnitude of nutrient release was affected by the number of FTCs. A threshold of five FTCs was identified for a greater nutrient release, with plant residue contributing between 0.29 (NO3) and 9 (PO4) times more nutrients than snow. The monitored temporal dynamics of nutrient release were used to develop the first process-based predictive model controlled by three potentially measurable parameters that can be integrated into catchment water quality models to improve nutrient transport simulations during snowmelt.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The short-distance continuous diversion area plays a crucial role within mountainous urban expressway systems, significantly enhancing the efficiency of specialized road sections through capacity analysis. This study develops a capacity calculation model tailored to the diversion area’s unique characteristics and principal capacity-influencing factors. Initially, the research focuses on a specific short-distance continuous diversion area of a mountainous urban expressway, employing video trajectory tracking technology to gather trajectory data. This data serves as the basis for analyzing road and traffic characteristics. Subsequently, the model computes the capacity influenced by eight variables, including diversion point spacing and deceleration lane length, using VISSIM simulation experiments. A gray correlation analysis identifies key factors, which guide the establishment of the model’s fundamental structure through two-factor surface fitting results. Mathematical statistical methods are then applied to resolve the model’s parameters, culminating in a robust capacity calculation model. The findings reveal that diversion point spacing, along with primary and secondary diversion ratios, significantly influence capacity. Notably, the capacity exhibits a marked quadratic polynomial relationship with the primary diversion ratio and diversion point spacing, and a linear relationship with the secondary diversion ratio. The model’s validity is confirmed through a case study at the diversion area north of Huacun Interchange in Chongqing Municipality, where the discrepancy between calculated and actual capacities is under 5%, underscoring the model’s high accuracy. These results offer valuable theoretical and methodological support for the planning, design, and traffic management of diversion areas.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset is a comprehensive collection of mathematical word problems spanning multiple domains with rich metadata and natural language variations. The problems contain 1 - 5 steps of mathematical operations that are specifically designed to encourage showing work and maintaining appropriate decimal precision throughout calculations.
All the problems have never been seen before and are free from copyright restrictions.
The available data has 100,000 problems. To license the templating system that created the data for magnitudes more data or customizations like the number of mathematical steps involved, and the addition of domains. Contact hello@cephalopod.studio for more information.
Intended Uses: The data can be used in 4 areas:
1) Pretraining
2) Instruction tuning
3) Finetuning
4) Benchmarking existing models
All those areas are in service of:
- Training mathematical reasoning systems
- Developing step-by-step problem-solving capabilities
- Testing arithmetic operations across diverse real-world contexts
- Evaluating precision in decimal calculations
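For the benchmarking use, a scoring helper might compare the last number in a model's response with the reference value. The following is a minimal sketch, assuming the reference comes from the `solution` field and the tolerance from the `decimals` field of each record's metadata. The function names and the "take the last number in the text" heuristic are hypothetical and are not part of the dataset.

```python
import re
from typing import Optional

# Matches integers and decimals, with optional thousands separators (e.g. 44,885).
NUMBER_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number that appears in the text, or None if there is none."""
    matches = NUMBER_PATTERN.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def is_correct(response: str, solution: float, decimals: int) -> bool:
    """Assumed scoring rule: the final number must agree with the reference
    to within half a unit of the last required decimal place."""
    predicted = extract_final_number(response)
    if predicted is None:
        return False
    tolerance = 0.5 * 10 ** (-decimals)
    return abs(predicted - solution) < tolerance


print(is_correct("This means the final result is 0.14.", 0.14, 2))          # True
print(is_correct("Thus, we arrive at the answer: 44885.0.", 44885.0, 0))    # True
print(is_correct("Based on these steps, the answer is 41.", 42, 0))         # False
```

Taking the last number is a crude heuristic: it would misfire on responses that end with a restated input value, so a stricter answer format may be preferable in a real evaluation.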
Limitations:
- Currently English-only
- Limited to specific mathematical operations
- Template-based generation may introduce structural patterns
- Focused on arithmetic operations with up to 5 numbers
The dataset contains 100,000 total problems:
Problems span multiple domains including:
- Agriculture (soil temperature changes, etc.)
- Athletics (training hours, distances, etc.)
- Construction (elevation changes, work hours, etc.)
- Culinary (cooking temperature changes, calories per serving, etc.)
- Education (GPA changes, etc.)
- Entertainment (show ratings, stage lighting, etc.)
- Finance (stock prices, account balances, etc.)
Each example is provided in JSONL format with the following structure:
```json
{
  "id": "problem_X",
  "question": "Text of the math problem",
  "metadata": {
    "discrete": boolean,
    "domain": string,
    "numbers": number[],
    "object_type": string,
    "solution": number,
    "operators": string[],
    "decimals": number
  },
  "answer": "Text of the step-by-step solution to the problem"
}
```
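A record with this structure could be read and checked as follows. This is a minimal sketch using only the Python standard library. The field names come from the structure above, while the function names, the file path passed to `load_problems`, and the metadata values in the sample record (including the operator names `"multiply"` and `"add"`) are illustrative assumptions, since the listing does not show a concrete record.

```python
import json

TOP_LEVEL_FIELDS = ("id", "question", "metadata", "answer")
METADATA_FIELDS = {
    "discrete", "domain", "numbers", "object_type",
    "solution", "operators", "decimals",
}


def parse_record(line: str) -> dict:
    """Parse one JSONL line and verify that the documented fields are present."""
    record = json.loads(line)
    missing_top = [field for field in TOP_LEVEL_FIELDS if field not in record]
    if missing_top:
        raise ValueError(f"missing top-level fields: {missing_top}")
    missing_meta = METADATA_FIELDS - set(record["metadata"])
    if missing_meta:
        raise ValueError(f"missing metadata fields: {sorted(missing_meta)}")
    return record


def load_problems(path: str) -> list:
    """Read a JSONL file (one JSON object per line), skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [parse_record(line) for line in handle if line.strip()]


# Hypothetical record: the metadata values below are made up for illustration.
sample_line = json.dumps({
    "id": "problem_X",
    "question": "Text of the math problem",
    "metadata": {
        "discrete": True,
        "domain": "finance",
        "numbers": [19, 2, 4],
        "object_type": "accounts",
        "solution": 42,
        "operators": ["multiply", "add"],
        "decimals": 0,
    },
    "answer": "Text of the step-by-step solution to the problem",
})

record = parse_record(sample_line)
print(record["metadata"]["solution"])  # 42
```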
Example questions:
1. Finance (Account Management):
"Jack sets up 19 bank accounts for clients. First the total rises to be 2 times greater than before. Following that, another 4 accounts were added."
2. Agriculture (Grain Storage):
"Kevin oversees 14,457 metric tons of grain storage in the new concrete silo. In the beginning, the facility grows to 3 times its current measure of grain. Following that, the overall supply of grain grows by 1,514 tons. Then, Kevin divides the holdings evenly by 1 and continues operations with a single unit."
3. Temperature Monitoring:
"The ground temperature measures 5.48 degrees Celsius. First, the temperature values are adjusted to be 1/3.74 the value they were before. Next, a sensor calibration adjustment multiplies all readings by 2.07, and later the measurements need to be scaled down by 1/3.94 due to sensor calibration. Later, the measurements need to be scaled down by 1/2.54 due to sensor calibration, and after that the measurements need to be scaled down by 1/2.21 due to sensor calibration. What is the final soil temperature in degrees Celsius? Round your answer and any steps to 2 decimal places."
Example answers:
1. Finance (Account Management):
"Here's how we can solve this problem:
19 accounts times 2 equals 38
Addition step: 38 + 4 = 42 accounts
Based on these steps, the answer is 42."
2. Agriculture (Grain Storage):
"Following these steps will give us the answer:
Multiplication operation: 14,457 tons * 3 = 43,371
Add 1514 to 43,371 tons: 44,885
x) 44,885 x 1 = 44,885 tons
Thus, we arrive at the answer: 44885.0."
3. Temperature Monitoring:
"We can find the answer like this:
Division: 5.48 degrees ÷ 3.74 = (Note: rounding to 2 decimal places) about 1.47
Multiplication operation: 1.47 degrees * 2.07 = (Note: rounding to 2 decimal places) approximately 3.04
3.04 degrees ÷ 3.94 (Note: rounding to 2 decimal places) approximately 0.77
0.77 degrees ÷ 2.54 (Note: rounding to 2 decimal places) approximately 0.30
When 0.30 degrees are divided by 2.21, the result is (Note: rounding to 2 decimal places) about 0.14
This means the final result is 0.14."
Each problem includes:
- Unique problem ID
- Natural language question text
- Arithmetic operations involving decimals and integers, values that are positive and negative, and requirements for rounding to a specific number of decimal places
- Detailed metadata including:
  - Domain classification
  - Object types and units
  - Numerical values used
  - Mathematical operators
  - Solution value
  - Discreteness flag
  - Decimal precision
  - Tailored value ranges
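The rounding requirement can be made concrete by replaying the sample problems step by step and rounding after every operation, as the sample answers do. The sketch below rests on stated assumptions: the operator names (`"add"`, `"subtract"`, `"multiply"`, `"divide"`) are hypothetical because the listing does not show the actual strings stored in `operators`, and half-up rounding is assumed because the rounding mode is not specified. `Decimal` is used so that intermediate values such as 0.30 are held exactly.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical operator names; the dataset's actual strings are not shown.
OPERATIONS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}


def round_to(value: Decimal, places: int) -> Decimal:
    """Round to a fixed number of decimal places (half-up is an assumption)."""
    quantum = Decimal(10) ** -places  # places=2 gives Decimal("0.01")
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


def solve_chain(start: str, steps: list, places: int) -> Decimal:
    """Apply each (operator, operand) step in order, rounding after every step."""
    value = Decimal(start)
    for operator, operand in steps:
        value = round_to(OPERATIONS[operator](value, Decimal(operand)), places)
    return value


# Temperature Monitoring sample: five steps, rounded to 2 decimal places.
temperature = solve_chain(
    "5.48",
    [("divide", "3.74"), ("multiply", "2.07"), ("divide", "3.94"),
     ("divide", "2.54"), ("divide", "2.21")],
    places=2,
)
print(temperature)  # 0.14

# Grain Storage sample: whole numbers, so 0 decimal places.
grain = solve_chain(
    "14457", [("multiply", "3"), ("add", "1514"), ("divide", "1")], places=0
)
print(grain)  # 44885
```

Rounding after each step matters: the intermediate values 1.47, 3.04, 0.77, and 0.30 in the sample answer only appear when every step is rounded, not just the final result.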