91 datasets found

Mathematics Dataset
paperswithcode.com
library.toponeai.link
Updated Nov 3, 2024
Cite
David Saxton; Edward Grefenstette; Felix Hill; Pushmeet Kohli (2024). Mathematics Dataset [Dataset]. https://paperswithcode.com/dataset/mathematics
Dataset updated
Nov 3, 2024
Authors
David Saxton; Edward Grefenstette; Felix Hill; Pushmeet Kohli
Description
This dataset's code generates mathematical question-and-answer pairs across a range of question types at roughly school-level difficulty. It is designed to test the mathematical learning and algebraic reasoning skills of learning models.
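To make "generating question-and-answer pairs" concrete, here is a toy generator for a single question type (linear equations in one variable). It is a hypothetical sketch, not the dataset's actual generation code: the function name `linear_equation_pair`, the value ranges, and the question wording are all assumptions. The key idea it shows is choosing the answer first, so every generated question is guaranteed to be solvable and automatically checkable.

```python
import random


def linear_equation_pair(rng: random.Random) -> tuple[str, int]:
    """Return a (question, answer) pair for a random equation a*x + b = c.

    Hypothetical toy generator; it only illustrates programmatically
    generated, automatically verifiable questions.
    """
    a = rng.randint(2, 9)      # non-zero coefficient
    x = rng.randint(-10, 10)   # pick the answer first...
    b = rng.randint(-20, 20)
    c = a * x + b              # ...so the equation always has an integer solution
    sign = "+" if b >= 0 else "-"
    question = f"Solve {a}*x {sign} {abs(b)} = {c} for x."
    return question, x


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible output
    for _ in range(3):
        q, ans = linear_equation_pair(rng)
        print(q, "->", ans)
```

Seeding a dedicated `random.Random` instance keeps generated splits reproducible without touching global random state.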
Dataset for "ConfSolv: Prediction of solute conformer free energies across a...
data.niaid.nih.gov
Updated Oct 25, 2023
Cite
Frederik Sandfort (2023). Dataset for "ConfSolv: Prediction of solute conformer free energies across a range of solvents" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8292519
Dataset updated
Oct 25, 2023
Dataset provided by
Frederik Sandfort
Volker Settels
Kevin A. Spiekermann
Florence Vermeire
Philipp Eiden
William H. Green
Zipei Tan
Angiras Menon
Lagnajit Pattanaik
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains three archives.

The first archive, full_dataset.zip, contains geometries and free energies for nearly 44,000 solute molecules with almost 9 million conformers in 42 different solvents. The geometries and gas-phase free energies are computed using density functional theory (DFT). The solvation free energy of each conformer is computed using COSMO-RS, and the solution free energy is computed as the sum of the gas-phase free energy and the solvation free energy. The geometries for each solute conformer are provided as ASE_atoms_objects within a pandas DataFrame, found in the compressed file dft coords.pkl.gz within full_dataset.zip. The gas-phase energies, solvation free energies, and solution free energies are also provided as a pandas DataFrame in the compressed file free_energy.pkl.gz within full_dataset.zip. Ten example data splits for both random and scaffold split types are also provided in the ZIP archive for training models. Scaffold split index 0 was used to generate the results in the corresponding publication.

The second archive, refined_conf_search.zip, contains geometries and free energies for a representative sample of 28 solute molecules from the full dataset that were subjected to a refined conformer search and therefore had more conformers located. The format of the data is identical to that of full_dataset.zip.

The third archive contains one folder for each solvent for which free energies are provided in full_dataset.zip. Each folder contains the .cosmo file for every solvent conformer used in the COSMOtherm calculations, a dummy input file for the COSMOtherm calculations, and a CSV file with the electronic energy of each solvent conformer, which must be substituted for "EH_Line" in the dummy input file.
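The description implies two simple operations: reading a gzip-compressed pickle such as free_energy.pkl.gz, and forming a solution free energy as the gas-phase free energy plus the solvation free energy. The sketch below shows both with the standard library only. The helper names are hypothetical, and the column names inside the real DataFrames are not stated here, so inspect them before use. Unpickling the real files also requires pandas (and ASE for the geometry objects) to be installed; with pandas available, `pandas.read_pickle` on the `.pkl.gz` path is the more direct route, since it infers gzip compression from the extension.

```python
import gzip
import pickle
from typing import Any


def load_pickle_gz(path: str) -> Any:
    """Load an object from a gzip-compressed pickle (e.g. a *.pkl.gz file).

    Only unpickle files from a source you trust: pickle can execute code.
    Objects such as pandas DataFrames need their defining library importable.
    """
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


def solution_free_energy(g_gas: float, g_solv: float) -> float:
    """Solution free energy as described: gas-phase plus solvation free energy.

    Both inputs must be expressed in the same energy units.
    """
    return g_gas + g_solv
```

With pandas installed, the equivalent one-liner is `df = pd.read_pickle("free_energy.pkl.gz")`, followed by `df.columns` to discover the actual field names.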
Data from: GALILEO VENUS RANGE FIX RAW DATA V1.0
catalog.data.gov
datasets.ai
Updated Apr 10, 2025
Cite
National Aeronautics and Space Administration (2025). GALILEO VENUS RANGE FIX RAW DATA V1.0 [Dataset]. https://catalog.data.gov/dataset/galileo-venus-range-fix-raw-data-v1-0-0943a
Dataset updated
Apr 10, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
Raw radio tracking data used to determine the precise distance to Venus (and improve knowledge of the Astronomical Unit) from the Galileo flyby on 10 February 1990.
Fused Image dataset for convolutional neural Network-based crack Detection...
zenodo.org
explore.openaire.eu
Updated Apr 20, 2023
Cite
Shanglian Zhou; Carlos Canchila; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. http://doi.org/10.5281/zenodo.6383044
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6383044
Dataset updated
Apr 20, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Shanglian Zhou; Carlos Canchila; Wei Song
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground-truth crack data for deep learning-based crack segmentation analysis. It features four types of image data: raw intensity images, raw range (i.e., elevation) images, filtered range images, and fused raw images. The FIND dataset consists of 2,500 image patches (256×256 pixels) and their ground-truth crack maps for each of the four data types.

The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was used for data acquisition, so the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., they are spatially co-registered). The filtered range data were generated by applying frequency-domain filtering to eliminate image disturbances (e.g., surface variations and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact of different types of image data on deep convolutional neural network (DCNN) performance.

If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

In addition, an image dataset for crack classification has also been published at [6].

References:

[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
HindiMathQuest
huggingface.co
Updated Oct 25, 2024
Cite
Dnyanesh Walwadkar (2024). HindiMathQuest [Dataset]. http://doi.org/10.57967/hf/3259
Unique identifier
https://doi.org/10.57967/hf/3259
Dataset updated
Oct 25, 2024
Authors
Dnyanesh Walwadkar
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Overview:

The HindiMathQuest: A Dataset for Mathematical Reasoning and Problem-Solving in Hindi is designed to advance the capabilities of language models in understanding and solving mathematical problems presented in the Hindi language. The dataset covers a comprehensive range of question types, including logical reasoning, numeric calculations, translation-based problems, and complex mathematical tasks typically seen in competitive exams. This dataset is intended to fill a… See the full description on the dataset page: https://huggingface.co/datasets/dnyanesh/HindiMathQuest.
Elk Home Range - Laytonville - 2022-2023 [ds3083]
gis.data.ca.gov
data.ca.gov
Updated Mar 16, 2023
Cite
California Department of Fish and Wildlife (2023). Elk Home Range - Laytonville - 2022-2023 [ds3083] [Dataset]. https://gis.data.ca.gov/items/af5d36786212430aae845ac33ae7de0b
Dataset updated
Mar 16, 2023
Dataset authored and provided by
California Department of Fish and Wildlife
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The project lead for the collection of this data was Carrington Hilson. One adult female elk was captured and equipped with a GPS collar (Lotek Iridium) transmitting data from 2022 to 2023. The Laytonville herd does not migrate between traditional summer and winter seasonal ranges. Therefore, annual home ranges were modeled using year-round data to demarcate high-use areas in lieu of modeling the specific winter ranges commonly seen in other ungulate analyses in California. GPS locations were fixed at 1- to 7-hour intervals in the dataset. To improve the quality of the data set, following Bjørneraas et al. (2010), the GPS data were filtered prior to analysis to remove locations that were: i) farther from either the previous point or the subsequent point than an individual elk is able to travel in the elapsed time, ii) forming spikes in the movement trajectory based on outgoing and incoming speeds and turning angles sharper than a predefined threshold, or iii) fixed in 2D space and visually assessed as a bad fix by the analyst. The methodology used for this migration analysis allowed for the mapping of the herd's home range. Brownian bridge movement models (BBMMs; Sawyer et al. 2009) were constructed with GPS collar data from 1 elk, including 1 annual home range sequence, location, date, time, and average location error as inputs in Migration Mapper. BBMMs were produced at a spatial resolution of 50 m using a sequential fix interval of less than 27 hours. Home range is visualized as the 50th percentile contour (high use) and the 99th percentile contour of the year-round utilization distribution. Home range designations for this herd may expand with a larger sample.
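The first filtering criterion (dropping fixes that lie farther from a neighbouring fix than an animal could travel in the elapsed time) can be sketched as a speed check. This is a simplified illustration, not the exact Bjørneraas et al. (2010) procedure used for this dataset: the `Fix` record, the haversine distance, and the `max_speed_kmh` threshold are assumptions chosen for the example. The sketch flags an interior fix only when it implies an impossible speed relative to both neighbours, which stops a single outlier from also removing its valid neighbours; a stricter reading of "either" would flag on one neighbour alone.

```python
import math
from dataclasses import dataclass


@dataclass
class Fix:
    t_hours: float  # fix time, hours since an arbitrary epoch
    lat: float      # degrees
    lon: float      # degrees


def haversine_km(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def speed_kmh(a: Fix, b: Fix) -> float:
    """Implied travel speed between two fixes (infinite if times coincide)."""
    dt = abs(b.t_hours - a.t_hours)
    return math.inf if dt == 0 else haversine_km(a, b) / dt


def speed_filter(fixes: list[Fix], max_speed_kmh: float) -> list[Fix]:
    """Drop fixes whose every available neighbour implies speed > max_speed_kmh."""
    kept = []
    n = len(fixes)
    for i, f in enumerate(fixes):
        speeds = []
        if i > 0:
            speeds.append(speed_kmh(fixes[i - 1], f))
        if i < n - 1:
            speeds.append(speed_kmh(f, fixes[i + 1]))
        if speeds and all(s > max_speed_kmh for s in speeds):
            continue  # physically implausible location
        kept.append(f)
    return kept
```

The threshold would in practice come from the species' known maximum sustained movement rate; the value used in the original analysis is not stated here.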
Elk Home Range - Sherwood - 2022-2023 [ds3086]
gimi9.com
data.ca.gov
Cite
Elk Home Range - Sherwood - 2022-2023 [ds3086] [Dataset]. https://gimi9.com/dataset/california_elk-home-range-sherwood-2022-2023-ds3086/
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The project lead for the collection of this data was Carrington Hilson. Elk (2 adult females) were captured and equipped with GPS collars (Lotek Iridium) transmitting data from 2022 to 2023. The Sherwood herd does not migrate between traditional summer and winter seasonal ranges. Therefore, annual home ranges were modeled using year-round data to demarcate high-use areas in lieu of modeling the specific winter ranges commonly seen in other ungulate analyses in California. GPS locations were fixed at 1- to 7-hour intervals in the dataset. To improve the quality of the data set, following Bjørneraas et al. (2010), the GPS data were filtered prior to analysis to remove locations that were: i) farther from either the previous point or the subsequent point than an individual elk is able to travel in the elapsed time, ii) forming spikes in the movement trajectory based on outgoing and incoming speeds and turning angles sharper than a predefined threshold, or iii) fixed in 2D space and visually assessed as a bad fix by the analyst. The methodology used for this migration analysis allowed for the mapping of the herd's home range. Brownian bridge movement models (BBMMs; Sawyer et al. 2009) were constructed with GPS collar data from 2 elk, including 2 annual home range sequences, location, date, time, and average location error as inputs in Migration Mapper. BBMMs were produced at a spatial resolution of 50 m using a sequential fix interval of less than 27 hours. Home range is visualized as the 50th percentile contour (high use) and the 99th percentile contour of the year-round utilization distribution. Home range designations for this herd may expand with a larger sample.
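The second criterion (spikes identified from incoming and outgoing speeds plus a sharp turning angle) can be sketched as follows. This is an illustrative assumption, not the exact filter used here: it works on projected planar coordinates in kilometres (for example UTM metres divided by 1000), and the threshold names `min_speed_kmh` and `min_angle_deg` are hypothetical. A spike is a fix reached quickly and left quickly in almost the opposite direction, i.e. a turning angle near 180°.

```python
import math


def turning_angle_deg(p0, p1, p2):
    """Absolute heading change at p1, in degrees (0 = straight on, 180 = reversal)."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return abs(math.degrees(math.atan2(cross, dot)))


def spike_indices(xy, t_hours, min_speed_kmh, min_angle_deg):
    """Indices of interior fixes that look like spikes.

    xy: projected (x, y) coordinates in km; t_hours: strictly increasing fix
    times in hours. A fix is flagged when both its incoming and outgoing
    speeds exceed min_speed_kmh and its turning angle exceeds min_angle_deg.
    """
    flagged = []
    for i in range(1, len(xy) - 1):
        v_in = math.dist(xy[i - 1], xy[i]) / (t_hours[i] - t_hours[i - 1])
        v_out = math.dist(xy[i], xy[i + 1]) / (t_hours[i + 1] - t_hours[i])
        angle = turning_angle_deg(xy[i - 1], xy[i], xy[i + 1])
        if v_in > min_speed_kmh and v_out > min_speed_kmh and angle > min_angle_deg:
            flagged.append(i)
    return flagged
```

Using `atan2(cross, dot)` gives the signed angle between the two steps without the numerical trouble of `acos` near 0° and 180°.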
MMLU Dataset
paperswithcode.com
Updated Jan 5, 2025
Cite
Dan Hendrycks; Collin Burns; Steven Basart; Andy Zou; Mantas Mazeika; Dawn Song; Jacob Steinhardt (2025). MMLU Dataset [Dataset]. https://paperswithcode.com/dataset/mmlu
Dataset updated
Jan 5, 2025
Authors
Dan Hendrycks; Collin Burns; Steven Basart; Andy Zou; Mantas Mazeika; Dawn Song; Jacob Steinhardt
Description
MMLU (Massive Multitask Language Understanding) is a new benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the benchmark more challenging and more similar to how we evaluate humans. The benchmark covers 57 subjects across STEM, the humanities, the social sciences, and more. It ranges in difficulty from an elementary level to an advanced professional level, and it tests both world knowledge and problem-solving ability. Subjects range from traditional areas, such as mathematics and history, to more specialized areas like law and ethics. The granularity and breadth of the subjects make the benchmark ideal for identifying a model’s blind spots.
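Few-shot evaluation of a multiple-choice benchmark like this one means prepending a handful of solved examples to each test question and scoring the letter the model picks; zero-shot is the same prompt with no examples. The sketch below builds such a prompt and computes accuracy. The prompt template, the `Question` type, and the helper names are assumptions for illustration, not the benchmark's official evaluation harness.

```python
from dataclasses import dataclass

LETTERS = "ABCD"


@dataclass
class Question:
    text: str
    choices: list[str]  # exactly four answer options
    answer: str         # gold letter, one of "A".."D"


def format_question(q: Question, with_answer: bool) -> str:
    """Render one question with lettered options and an 'Answer:' line."""
    lines = [q.text]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, q.choices)]
    lines.append("Answer:" + (f" {q.answer}" if with_answer else ""))
    return "\n".join(lines)


def few_shot_prompt(subject: str, shots: list[Question], target: Question) -> str:
    """k-shot prompt: solved examples first, then the unanswered target.

    An empty `shots` list yields a zero-shot prompt.
    """
    header = f"The following are multiple choice questions about {subject}.\n\n"
    body = "\n\n".join(format_question(s, with_answer=True) for s in shots)
    sep = "\n\n" if shots else ""
    return header + body + sep + format_question(target, with_answer=False)


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of predicted letters that match the gold letters."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Per-subject accuracy, averaged over the 57 subjects, is one natural way to surface the "blind spots" mentioned above.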
SI-HDR Dataset
paperswithcode.com
Updated Aug 12, 2023
Cite
Param Hanji; Rafał K. Mantiuk; Gabriel Eilertsen; Saghi Hajisharif; Jonas Unger (2023). SI-HDR Dataset [Dataset]. https://paperswithcode.com/dataset/si-hdr
Dataset updated
Aug 12, 2023
Authors
Param Hanji; Rafał K. Mantiuk; Gabriel Eilertsen; Saghi Hajisharif; Jonas Unger
Description
The dataset consists of 181 HDR images. Each image includes: 1) a RAW exposure stack, 2) an HDR image, 3) simulated camera images at two different exposures, and 4) results of six single-image HDR reconstruction methods: Endo et al. 2017, Eilertsen et al. 2017, Marnerides et al. 2018, Lee et al. 2018, Liu et al. 2020, and Santos et al. 2020.

Project web page: more details can be found at https://www.cl.cam.ac.uk/research/rainbow/projects/sihdr_benchmark/

Overview: This dataset contains 181 RAW exposure stacks selected to cover a wide range of image content and lighting conditions. Each scene is composed of 5 RAW exposures, which were merged into an HDR image using the estimator that accounts for photon noise [3]. A simple color correction was applied using a reference white point, and all merged HDR images were resized to 1920×1280 pixels.
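Merging an exposure stack into an HDR value can be illustrated per pixel with a weighted average of exposure-normalised values. This is deliberately not the noise-aware estimator of [3] that was used for this dataset; it is a common baseline, shown under the assumptions that pixel values are linear RAW values normalised to [0, 1] and that exposure times are known. The function name, the hat-shaped weighting, and the saturation cut-off are assumptions.

```python
def merge_pixel(values, exposure_times, saturation=0.95):
    """Estimate the relative radiance of one pixel from a linear exposure stack.

    values: linear RAW pixel values in [0, 1], one per exposure.
    exposure_times: matching exposure times in seconds.
    Each exposure contributes value / time, weighted by a hat function that
    down-weights very dark (noisy) and near-saturated values; samples at or
    above `saturation` are ignored entirely.
    """
    num = 0.0
    den = 0.0
    for v, t in zip(values, exposure_times):
        if v >= saturation:
            continue  # clipped sample carries no radiance information
        w = 1.0 - abs(2.0 * v - 1.0)  # hat weight: 0 at 0 and 1, peak at 0.5
        num += w * v / t
        den += w
    if den == 0.0:
        raise ValueError("no usable (unsaturated, non-zero-weight) samples")
    return num / den
```

A noise-aware estimator such as [3] replaces the fixed hat weights with weights derived from a camera noise model, which matters most in dark regions.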

The primary purpose of the dataset was to compare various single-image HDR (SI-HDR) methods [1]. Thus, we selected a wide variety of content covering nature, portraits, cities, indoor and outdoor, and daylight and night scenes. After merging and resizing, we simulated captures by applying a custom camera response function (CRF) and adding realistic camera noise based on the estimated noise parameters of a Canon 5D Mark III.

The simulated captures were the inputs to six selected SI-HDR methods. You can view the reconstructions of the various methods for selected scenes in our interactive viewer. For the remaining scenes, please download the appropriate zip files. We conducted a rigorous pairwise comparison experiment on these images and found that widely used metrics did not correlate well with subjective data. We then proposed an improved evaluation protocol for SI-HDR [1].

If you find this dataset useful, please cite [1].

References:

[1] Param Hanji, Rafał K. Mantiuk, Gabriel Eilertsen, Saghi Hajisharif, and Jonas Unger. 2022. “Comparison of single image hdr reconstruction methods — the caveats of quality assessment.” In Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings (SIGGRAPH ’22 Conference Proceedings). [Online]. Available: https://www.cl.cam.ac.uk/research/rainbow/projects/sihdr_benchmark/

[2] Gabriel Eilertsen, Saghi Hajisharif, Param Hanji, Apostolia Tsirikoglou, Rafał K. Mantiuk, and Jonas Unger. 2021. “How to cheat with metrics in single-image HDR reconstruction.” In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. 3998–4007.

[3] Param Hanji, Fangcheng Zhong, and Rafał K. Mantiuk. 2020. “Noise-Aware Merging of High Dynamic Range Image Stacks without Camera Calibration.” In Advances in Image Manipulation (ECCV workshop). Springer, 376–391. [Online]. Available: https://www.cl.cam.ac.uk/research/rainbow/projects/noise-aware-merging/
Elk Home Range - Potter-Redwood Valley - 2023-2024 [ds3191]
gis.data.ca.gov
data.cnra.ca.gov
Updated Sep 18, 2024
Cite
California Department of Fish and Wildlife (2024). Elk Home Range - Potter-Redwood Valley - 2023-2024 [ds3191] [Dataset]. https://gis.data.ca.gov/datasets/CDFW::elk-home-range-potter-redwood-valley-2023-2024-ds3191
Dataset updated
Sep 18, 2024
Dataset authored and provided by
California Department of Fish and Wildlife (https://wildlife.ca.gov/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The project lead for the collection of this data was Carrington Hilson. Elk (9 adult females) were captured and equipped with GPS collars (Lotek Iridium) transmitting data from 2023 to 2024. The Potter-Redwood Valley herd does not migrate between traditional summer and winter seasonal ranges. Therefore, annual home ranges were modeled using year-round data to demarcate high-use areas in lieu of modeling the specific winter ranges commonly seen in other ungulate analyses in California. GPS locations were fixed at 6.5-hour intervals in the dataset. To improve the quality of the data set, all points with DOP values greater than 5 and those points visually assessed as a bad fix by the analyst were removed. The methodology used for this migration analysis allowed for the mapping of the herd's home range. Brownian bridge movement models (BBMMs; Sawyer et al. 2009) were constructed with GPS collar data from 8 elk, including 15 annual home range sequences, location, date, time, and average location error as inputs in Migration Mapper. BBMMs were produced at a spatial resolution of 50 m using a sequential fix interval of less than 27 hours and a fixed motion variance of 1000. Home range is visualized as the 50th percentile contour (high use) and the 99th percentile contour of the year-round utilization distribution. Home range designations for this herd may expand with a larger sample.
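The two quality rules stated for this herd (drop fixes with a DOP above 5, and drop fixes the analyst marked as bad) amount to a simple record filter. The `CollarFix` record and its field names are hypothetical; real collar exports use their own column names.

```python
from dataclasses import dataclass


@dataclass
class CollarFix:
    fix_id: int
    dop: float         # dilution of precision reported by the collar
    flagged_bad: bool  # analyst's visual "bad fix" assessment


def quality_filter(fixes, max_dop=5.0):
    """Keep fixes with DOP <= max_dop that the analyst did not flag as bad."""
    return [f for f in fixes if f.dop <= max_dop and not f.flagged_bad]
```

Note that a DOP of exactly 5 is kept, since only values greater than 5 are described as removed.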
Elk Home Range - Lone Pine - 2023-2024 [ds3192]
catalog.data.gov
data.cnra.ca.gov
Updated Nov 27, 2024
Cite
California Department of Fish and Wildlife (2024). Elk Home Range - Lone Pine - 2023-2024 [ds3192] [Dataset]. https://catalog.data.gov/dataset/elk-home-range-lone-pine-2023-2024-ds3192-b3f1a
Dataset updated
Nov 27, 2024
Dataset provided by
California Department of Fish and Wildlife
Description
The project lead for the collection of this data was Carrington Hilson. Elk (2 adult females) were captured and equipped with GPS collars (Lotek Iridium) transmitting data from 2023 to 2024. The Lone Pine herd does not migrate between traditional summer and winter seasonal ranges. Therefore, annual home ranges were modeled using year-round data to demarcate high-use areas in lieu of modeling the specific winter ranges commonly seen in other ungulate analyses in California. GPS locations were fixed at 6.5-hour intervals in the dataset. To improve the quality of the data set, all points with DOP values greater than 5 and those points visually assessed as a bad fix by the analyst were removed. The methodology used for this migration analysis allowed for the mapping of the herd's home range. Brownian bridge movement models (BBMMs; Sawyer et al. 2009) were constructed with GPS collar data from 2 elk, including 2 annual home range sequences, location, date, time, and average location error as inputs in Migration Mapper. BBMMs were produced at a spatial resolution of 50 m using a sequential fix interval of less than 27 hours and a fixed motion variance of 1000. Home range is visualized as the 50th percentile contour (high use) and the 99th percentile contour of the year-round utilization distribution. Home range designations for this herd may expand with a larger sample.
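The 50th and 99th percentile contours of a utilization distribution (UD) enclose the highest-density cells that together hold 50% or 99% of the total probability mass. The sketch below extracts those cells from an already-computed gridded UD; it does not fit the Brownian bridge movement model itself. The dictionary representation and the function name are assumptions for illustration.

```python
def ud_contour_cells(ud, percent):
    """Cells inside the `percent` isopleth of a gridded utilization distribution.

    ud: dict mapping cell id -> probability mass (need not sum exactly to 1).
    Cells are added from highest to lowest mass until their cumulative share
    of the total reaches percent / 100.
    """
    total = sum(ud.values())
    target = total * percent / 100.0
    inside = set()
    cumulative = 0.0
    for cell, mass in sorted(ud.items(), key=lambda kv: kv[1], reverse=True):
        if cumulative >= target:
            break
        inside.add(cell)
        cumulative += mass
    return inside
```

The 50% set marks the "high use" core, while the 99% set approximates the full home range; drawing each set's outer boundary on a 50 m grid gives contour polygons like those described above.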
NIST Stopping-Power & Range Tables for Electrons, Protons, and Helium Ions -...
catalog.data.gov
data.amerigeoss.org
Updated Jul 29, 2022
Cite
National Institute of Standards and Technology (2022). NIST Stopping-Power & Range Tables for Electrons, Protons, and Helium Ions - SRD 124 [Dataset]. https://catalog.data.gov/dataset/nist-stopping-power-range-tables-for-electrons-protons-and-helium-ions-srd-124-b3661
Dataset updated
Jul 29, 2022
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
The databases ESTAR, PSTAR, and ASTAR calculate stopping-power and range tables for electrons, protons, or helium ions. Stopping-power and range tables can be calculated for electrons in any user-specified material and for protons and helium ions in 74 materials.
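A range table follows from a stopping-power table: in the continuous-slowing-down approximation (CSDA), the range is the integral of the reciprocal of the total stopping power over energy, R(E) = ∫ dE′ / S(E′). The sketch below approximates that integral with the trapezoidal rule over tabulated points. It is an illustration, not how ESTAR/PSTAR/ASTAR compute their tables: it starts at the first tabulated energy (ignoring the contribution below it), and the function name and inputs are assumptions.

```python
def csda_range(energies_mev, stopping_mev_cm2_per_g):
    """Approximate CSDA range (g/cm^2) from the first to the last tabulated energy.

    energies_mev: strictly increasing kinetic energies in MeV.
    stopping_mev_cm2_per_g: total mass stopping power at each energy.
    Integrates 1/S(E) dE with the trapezoidal rule; the range below the
    first tabulated energy is not included.
    """
    if len(energies_mev) != len(stopping_mev_cm2_per_g) or len(energies_mev) < 2:
        raise ValueError("need at least two matching (E, S) points")
    total = 0.0
    for i in range(1, len(energies_mev)):
        de = energies_mev[i] - energies_mev[i - 1]
        if de <= 0:
            raise ValueError("energies must be strictly increasing")
        inv_avg = 0.5 * (1.0 / stopping_mev_cm2_per_g[i] + 1.0 / stopping_mev_cm2_per_g[i - 1])
        total += de * inv_avg
    return total
```

Dividing the result (g/cm²) by the material density (g/cm³) converts it to a path length in centimetres.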
Elk Home Range - Sultan - 2022-2023 [ds3087]
data.ca.gov
data.cnra.ca.gov
Updated Mar 16, 2023
Cite
California Department of Fish and Wildlife (2023). Elk Home Range - Sultan - 2022-2023 [ds3087] [Dataset]. https://data.ca.gov/dataset/elk-home-range-sultan-2022-2023-ds3087
Available download formats: arcgis geoservices rest api, geojson, csv, html, zip, kml
Dataset updated
Mar 16, 2023
Dataset authored and provided by
California Department of Fish and Wildlife (https://wildlife.ca.gov/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The project lead for the collection of this data was Carrington Hilson. Elk (2 adult females) were captured and equipped with GPS collars (Lotek Iridium) transmitting data from 2022 to 2023. The Sultan herd does not migrate between traditional summer and winter seasonal ranges. Therefore, annual home ranges were modeled using year-round data to demarcate high-use areas in lieu of modeling the specific winter ranges commonly seen in other ungulate analyses in California. GPS locations were fixed at 1- to 7-hour intervals in the dataset. To improve the quality of the data set, following Bjørneraas et al. (2010), the GPS data were filtered prior to analysis to remove locations that were: i) farther from either the previous point or the subsequent point than an individual elk is able to travel in the elapsed time, ii) forming spikes in the movement trajectory based on outgoing and incoming speeds and turning angles sharper than a predefined threshold, or iii) fixed in 2D space and visually assessed as a bad fix by the analyst. The methodology used for this migration analysis allowed for the mapping of the herd's home range. Brownian bridge movement models (BBMMs; Sawyer et al. 2009) were constructed with GPS collar data from 2 elk, including 2 annual home range sequences, location, date, time, and average location error as inputs in Migration Mapper. BBMMs were produced at a spatial resolution of 50 m using a sequential fix interval of less than 27 hours. Home range is visualized as the 50th percentile contour (high use) and the 99th percentile contour of the year-round utilization distribution. Home range designations for this herd may expand with a larger sample.
Urdu Chain of Thought Prompt & Response Dataset
futurebeeai.com
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Urdu Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/urdu-chain-of-thought-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Welcome to the Urdu Chain of Thought prompt-response dataset, a meticulously curated collection of 3,000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training language models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.

Dataset Content: This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Urdu language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more. Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Urdu speakers, drawing on various sources, including open-source datasets, news articles, websites, and other reliable references. Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, and JSON, with proper markdown formatting.

Prompt Diversity: To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others. These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

Response Formats: To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale helps the language model build a reasoning process for complex questions. These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

Data Format and Annotation Details: This fully labeled Urdu Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.

Quality and Accuracy: Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, and maintaining an unbiased and discrimination-free stance. The Urdu version is grammatically accurate, without spelling or grammatical errors. No copyrighted, toxic, or harmful content was used in the construction of this dataset.

Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain-of-thought prompt completion data tailored to specific needs, providing flexibility and customization options.

License: The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Urdu Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
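The annotation fields listed for the CSV/JSON release map naturally onto a typed record. The sketch below parses CSV text into such records with the standard `csv` module. The snake_case column names (`record_id`, `prompt_type`, `rich_text`, and so on) and the boolean encoding of rich-text presence are hypothetical; check the real file header before adapting it.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CoTRecord:
    record_id: str
    prompt: str
    prompt_type: str
    prompt_complexity: str
    prompt_category: str
    domain: str
    response: str
    rationale: str
    response_type: str
    rich_text: bool  # whether the prompt/response contains rich text


def read_cot_csv(text: str) -> list[CoTRecord]:
    """Parse CSV text whose header uses the (hypothetical) field names above.

    Raises TypeError if the header has columns the dataclass does not define.
    """
    rows = csv.DictReader(io.StringIO(text))
    return [
        CoTRecord(**{**row, "rich_text": row["rich_text"].strip().lower() in {"true", "yes", "1"}})
        for row in rows
    ]
```

`csv.DictReader` handles quoted fields, so Urdu text containing commas or newlines survives as long as the file quotes those fields.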
Elk Home Range - Lake Pillsbury - 2017-2022 [ds3028]
data-cdfw.opendata.arcgis.com
data.cnra.ca.gov
Updated Nov 14, 2022
Cite
California Department of Fish and Wildlife (2022). Elk Home Range - Lake Pillsbury - 2017-2022 [ds3028] [Dataset]. https://data-cdfw.opendata.arcgis.com/datasets/CDFW::elk-home-range-lake-pillsbury-2017-2022-ds3028
Dataset updated
Nov 14, 2022
Dataset authored and provided by
California Department of Fish and Wildlife
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The project leads for the collection of this data were Josh Bush and Tom Batter. Elk (5 adult females, 7 adult males) from the Lake Pillsbury herd were captured and equipped with Lotek GPS collars (LifeCycle 800 GlobalStar, Lotek Wireless, Newmarket, Ontario, Canada) transmitting data from 2017 to 2022. The study area was within the Lake Pillsbury Elk Management Unit, north of Clear Lake and entirely within the Mendocino National Forest. The Lake Pillsbury herd makes short-distance, elevation-based movements, likely due to seasonal habitat conditions, but it does not migrate between traditional summer and winter seasonal ranges. Instead, much of the herd displays a residential pattern, slowly moving up or down elevational gradients. Therefore, annual home ranges were modeled using year-round data to demarcate high-use areas in lieu of modeling the specific winter ranges commonly seen in other ungulate analyses in California. GPS locations were fixed at 13-hour intervals in the dataset. To improve the quality of the data set, following Bjørneraas et al. (2010), the GPS data were filtered prior to analysis to remove locations that were: i) farther from either the previous point or the subsequent point than an individual elk is able to travel in the elapsed time, ii) forming spikes in the movement trajectory based on outgoing and incoming speeds and turning angles sharper than a predefined threshold, or iii) fixed in 2D space and visually assessed as a bad fix by the analyst. The methodology used for this analysis allowed for the mapping of the herd's annual range based on a small sample. Brownian Bridge Movement Models (BBMMs; Sawyer et al. 2009) were constructed with GPS collar data from 11 elk in total, including 36 year-long sequences, location, date, time, and average location error as inputs in Migration Mapper to assess annual range. Annual range BBMMs were produced at a spatial resolution of 50 m using a sequential fix interval of less than 27 hours.
Population-level annual range designations for this herd may expand with a larger sample, filling in some of the gaps between high-use annual range polygons in the map. Annual range is visualized as the 50th percentile contour (high use) and the 99th percentile contour of the year-round utilization distribution.
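One common way to define a percentile contour of a utilization distribution (UD) is the smallest set of highest-valued grid cells whose summed probability reaches that percentage of the total. The sketch below assumes the BBMM output is available as a flat list of non-negative cell values (for example, a 50 m grid flattened row by row); whether Migration Mapper computes its 50th and 99th percentile contours exactly this way is an assumption, not something the source states.

```python
def isopleth_cells(ud, pct):
    """Return sorted indices of UD cells inside the pct% isopleth.

    ud:  flat list of non-negative cell values (e.g. a flattened BBMM grid).
    pct: contour level, e.g. 50 for high use or 99 for the overall range.
    Cells are added from highest to lowest value until their running sum
    reaches pct% of the total probability mass.
    """
    total = sum(ud)
    target = total * pct / 100.0
    order = sorted(range(len(ud)), key=lambda i: ud[i], reverse=True)
    inside, acc = [], 0.0
    for i in order:
        if acc >= target:
            break
        inside.append(i)
        acc += ud[i]
    return sorted(inside)


# Hypothetical four-cell UD (values need not be normalized).
ud = [4, 3, 2, 1]
print(isopleth_cells(ud, 50))  # [0, 1]
print(isopleth_cells(ud, 99))  # [0, 1, 2, 3]
```

Because the 99% contour must include nearly all probability mass, it typically spans many low-density cells, while the 50% contour isolates the high-use core.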
DOLPHINS Dataset
paperswithcode.com
Updated Jul 18, 2022
Ruiqing Mao; Jingyu Guo; Yukuan Jia; Yuxuan Sun; Sheng Zhou; Zhisheng Niu (2022). DOLPHINS Dataset [Dataset]. https://paperswithcode.com/dataset/dolphins
Dataset updated
Jul 18, 2022
Authors
Ruiqing Mao; Jingyu Guo; Yukuan Jia; Yuxuan Sun; Sheng Zhou; Zhisheng Niu
Description
Vehicle-to-Everything (V2X) networks have enabled collaborative perception in autonomous driving, a promising solution to fundamental limitations of stand-alone intelligence such as blind zones and limited long-range perception. However, the lack of datasets has severely hindered the development of collaborative perception algorithms. In this work, we release DOLPHINS (Dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving), a new large-scale, simulated, multi-scenario, multi-view, multi-modality autonomous driving dataset that provides a benchmark platform for interconnected autonomous driving. DOLPHINS outperforms current datasets in six dimensions: (1) temporally aligned images and point clouds from both vehicles and Road Side Units (RSUs), enabling both Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) collaborative perception; (2) 6 typical scenarios with dynamic weather conditions, making it the most varied interconnected autonomous driving dataset; (3) meticulously selected viewpoints providing full coverage of the key areas and every object; (4) 42,376 frames and 292,549 objects, with corresponding 3D annotations, geo-positions, and calibrations, making it the largest dataset for collaborative perception; (5) Full-HD images and 64-line LiDARs providing high-resolution data with sufficient detail; and (6) well-organized APIs and open-source code ensuring the extensibility of DOLPHINS. We also construct a benchmark of 2D detection, 3D detection, and multi-view collaborative perception tasks on DOLPHINS. The experimental results show that raw-level fusion through V2X communication can help improve precision and reduce the need for expensive LiDAR equipment on vehicles when RSUs are present, which may accelerate the adoption of interconnected self-driving vehicles.
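Raw-level (early) fusion of point clouds generally means expressing another sensor's raw points in the ego vehicle's coordinate frame and merging them before detection. The sketch below illustrates that idea only; it does not use the DOLPHINS API or calibration file format. It assumes a hypothetical 4x4 homogeneous matrix `T_rsu_to_ego` that maps RSU LiDAR coordinates into the ego frame (if calibrations are given relative to a world frame, this matrix would first have to be composed from them).

```python
def transform_points(points, T):
    """Apply a 4x4 homogeneous transform T (row-major nested lists) to (x, y, z) points."""
    transformed = []
    for x, y, z in points:
        # Rows 0-2 of T give the new x, y, z; row 3 is [0, 0, 0, 1] for a rigid transform.
        transformed.append(tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3)))
    return transformed


def fuse_raw(ego_points, rsu_points, T_rsu_to_ego):
    """Raw-level fusion: move RSU points into the ego frame, then concatenate."""
    return list(ego_points) + transform_points(rsu_points, T_rsu_to_ego)


# Hypothetical calibration: RSU origin at x = 10 m in the ego frame, axes aligned.
T = [
    [1, 0, 0, 10],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(fuse_raw([(0, 0, 0)], [(1, 2, 3)], T))  # [(0, 0, 0), (11, 2, 3)]
```

The fused cloud can then be passed to a single-vehicle 3D detector, which is what makes raw-level fusion attractive when vehicles carry lower-resolution LiDAR.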
MathInstruct Dataset: Hybrid Math Instruction
kaggle.com
Updated Nov 30, 2023
The Devastator (2023). MathInstruct Dataset: Hybrid Math Instruction [Dataset]. https://www.kaggle.com/datasets/thedevastator/mathinstruct-dataset-hybrid-math-instruction-tun/versions/2
Available in Croissant, a format for machine-learning datasets (mlcommons.org/croissant).
Dataset updated
Nov 30, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
MathInstruct Dataset: Hybrid Math Instruction Tuning

A curated dataset for math instruction tuning models

By TIGER-Lab (from Hugging Face)

About this dataset

MathInstruct is a comprehensive, curated dataset designed to facilitate the development and evaluation of models for math instruction tuning. It combines 13 math rationale datasets, six of which were newly curated for this project, to provide a diverse range of instructional material. The main objective of the dataset is to give researchers an accessible, manageable resource for improving the effectiveness and precision of math instruction.

MathInstruct is also lightweight, which makes it convenient to use. Its columns include source and output: the source column identifies the origin or reference material of each math instruction, and the output column gives the expected output or solution for each math problem or exercise.

Overall, MathInstruct supports hybrid math instruction tuning by enabling careful model development and rigorous evaluation. Researchers can use this diverse dataset to gain insight into effective teaching methodologies and to explore new approaches to improving mathematical learning experiences.

How to use the dataset

Title: How to Use the MathInstruct Dataset for Hybrid Math Instruction Tuning

Introduction: The MathInstruct dataset is a comprehensive collection of math instruction examples, designed to assist in developing and evaluating models for math instruction tuning. This guide will provide an overview of the dataset and explain how to make effective use of it.

Understanding the Dataset Structure: The dataset consists of a file named train.csv. This CSV file contains the training data, which includes various columns such as source and output. The source column represents the source of math instruction (textbook, online resource, or teacher), while the output column represents expected output or solution to a particular math problem or exercise.

Accessing the Dataset: To access the MathInstruct dataset, you can download it from Kaggle's website. Once downloaded, you can read and manipulate the data using programming languages like Python with libraries such as pandas.
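A minimal loading sketch, assuming `train.csv` has a header row with at least the `source` and `output` columns described above. It uses Python's standard `csv` module so it runs without extra dependencies; with pandas, `pandas.read_csv("train.csv")` followed by `df["source"].value_counts()` does the same job. The in-memory sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def load_rows(stream):
    """Read MathInstruct-style CSV rows into dicts keyed by column name."""
    return list(csv.DictReader(stream))


def count_by_source(rows):
    """Count examples per value of the `source` column."""
    return Counter(row["source"] for row in rows)


# Hypothetical in-memory sample standing in for train.csv (values are made up).
sample = io.StringIO(
    "source,output\n"
    "textbook,x = 4\n"
    "online resource,12\n"
    "textbook,The answer is 7\n"
)
rows = load_rows(sample)
print(len(rows), count_by_source(rows).most_common(1))  # 3 [('textbook', 2)]

# With the downloaded file (local path assumed), the same helpers apply:
# with open("train.csv", newline="", encoding="utf-8") as f:
#     rows = load_rows(f)
```

Counting rows per source is a quick first check on how the examples are distributed before any per-source analysis.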

Exploring the Columns:
a) Source column: The source column provides information about where each math instruction comes from. It may include references to specific textbooks, online resources, or even teachers who provided instructional material.
b) Output column: The output column specifies what students are expected to achieve as a result of each math instruction. It contains solutions or expected outputs for different math problems or exercises.

Utilizing Source Information: By analyzing the different sources mentioned in this dataset, researchers can understand which instructional materials are more effective in teaching specific topics within mathematics. They can also identify common strategies used by teachers across multiple sources.

Analyzing Expected Outputs: Researchers can study variations in expected outputs for similar types of problems across different sources. This analysis may help identify differences in approaches across textbooks/resources and enrich our understanding of various teaching methods.

Model Development and Evaluation: Researchers can utilize this dataset to develop machine learning models that automatically assess whether a given math instruction leads to the expected output. By training models on this data, one can create automated systems that provide feedback on math problems or suggest alternative instruction sources.
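One simple way to score a model against the output column is exact-match accuracy after light normalization. This is a minimal sketch, assuming the model produces one text prediction per row; exact match is a deliberately simple stand-in criterion, and real math evaluation often needs answer extraction or numeric comparison on top of it.

```python
def normalize(answer):
    """Lower-case and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that match the expected `output` after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical predictions compared with hypothetical expected outputs.
preds = ["The answer is 4", " x = 2 "]
refs = ["the answer is 4", "x = 3"]
print(exact_match_accuracy(preds, refs))  # 0.5
```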

Scaling the Dataset: Due to its lightweight nature, the MathInstruct dataset is easily accessible and manageable. Researchers can scale up their training data by combining it with other instructional datasets or expand it further by labeling more examples based on similar guidelines.

Conclusion: The MathInstruct dataset serves as a valuable resource for developing and evaluating models related to math instruction tuning. By analyzing the source information and expected outputs, researchers can gain insights into effective teaching methods and build automated assessment systems.

Research Ideas

Model development: This dataset can be used for developing and training models for math instruction...
Data from: Haploids adapt faster than diploids across a range of...
datadryad.org
borealisdata.ca
zip
Updated Dec 7, 2010
Aleeza C Gerstein; Lesley A Cleathero; Mohammad A Mandegar; Sarah P. Otto (2010). Haploids adapt faster than diploids across a range of environments [Dataset]. http://doi.org/10.5061/dryad.8048
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.8048
Dataset updated
Dec 7, 2010
Dataset provided by
Dryad
Authors
Aleeza C Gerstein; Lesley A Cleathero; Mohammad A Mandegar; Sarah P. Otto
Time period covered
2010
Description
Raw data to calculate rate of adaptation (dataall.csv): raw dataset for rate of adaptation calculations (Figure 1) and related statistics.
R code to analyze raw data for rate of adaptation (Competition Analysis.R).
Raw data to calculate effective population sizes (datacount.csv).
R code to analyze effective population sizes (Cell Count Ne.R): R code used to analyze effective population sizes; Figure 2.
R code to determine our best estimate of the dominance coefficient in each environment (what is h.R): R code to produce Figures 3, S4, and S5 (what is the best estimate of dominance?). Note: the competition and effective population size R code must be run first in the same session.
GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034
s.cnmilf.com
search.dataone.org
Updated Apr 10, 2025
NASA NSIDC DAAC (2025). GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/glas-icesat-l1b-global-waveform-based-range-corrections-data-hdf5-v034-62528
Dataset updated
Apr 10, 2025
Dataset provided by
National Snow and Ice Data Center
NASA (http://nasa.gov/)
Description
GLAH05 Level-1B waveform parameterization data include output parameters from the waveform characterization procedure and other parameters required to calculate surface slope and relief characteristics. GLAH05 contains parameterizations of both the transmitted and received pulses and other characteristics from which elevation and footprint-scale roughness and slope are calculated. The received pulse characterization uses two implementations of the retracking algorithms: one tuned for ice sheets, called the standard parameterization, used to calculate surface elevation for ice sheets, oceans, and sea ice; and another for land (the alternative parameterization). Each data granule has an associated browse product.
Portuguese Chain of Thought Prompt & Response Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). Portuguese Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/portuguese-chain-of-thought-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
FutureBeeAI data license agreement (https://www.futurebeeai.com/data-license-agreement)
Dataset funded by
FutureBeeAI
Description
Welcome to the Portuguese Chain of Thought prompt-response dataset, a meticulously curated collection of 3,000 comprehensive prompt and response pairs. This dataset is a valuable resource for training language models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common-sense, symbolic-reasoning, and complex problems.

Dataset Content: This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in Portuguese. The prompts and completions cover a broad range of topics and questions, including mathematical concepts, common-sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more. Each prompt is accompanied by a response and a rationale, providing essential information and insights for language model training. The prompts, completions, and rationales were manually curated by native Portuguese speakers, drawing on various sources, including open-source datasets, news articles, websites, and other reliable references. The dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. It also contains prompts and completions with rich text, such as lists, tables, code snippets, and JSON, in proper Markdown format.

Prompt Diversity: To ensure a wide-ranging dataset, prompts were drawn from many topics related to mathematics, common-sense reasoning, and symbolic reasoning. These topics include arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others. The prompts vary in complexity across easy, medium, and hard levels, and include various question types, such as multiple-choice, direct queries, and true/false assessments.

Response Formats: To accommodate diverse learning experiences, the dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationales help the language model build a reasoning process for complex questions. Responses include text strings, numerical values, and date and time formats, improving the model's ability to generate reliable, coherent, and contextually appropriate answers.

Data Format and Annotation Details: This fully labeled Portuguese Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.

Quality and Accuracy: Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. The dataset aims for inclusivity, incorporating prompts and completions that represent diverse perspectives and writing styles while maintaining an unbiased, discrimination-free stance. The Portuguese text is grammatically accurate, without spelling or grammatical errors. No copyrighted, toxic, or harmful content was used in constructing this dataset.

Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset to ensure its growth and relevance. FutureBeeAI also offers custom chain-of-thought prompt completion data collection tailored to specific needs.

License: The dataset, created by FutureBeeAI, is available for commercial use. Researchers, data scientists, and developers can use this fully labeled, ready-to-deploy Portuguese Chain of Thought Prompt Completion Dataset to improve the rationale and response-generation capabilities of their generative AI models and to explore new approaches to NLP tasks.
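The annotation fields listed under Data Format and Annotation Details can be mirrored in a small typed record with basic validation. This is a sketch only: the key names (`record_id`, `prompt_complexity`, `rich_text`, and so on) are hypothetical, since the vendor's actual JSON schema is not given, and the complexity check assumes the easy/medium/hard levels mentioned above are the only allowed values. The example record is invented.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class CoTRecord:
    # Hypothetical key names mirroring the listed annotation fields.
    record_id: str
    prompt: str
    prompt_type: str
    prompt_complexity: str
    prompt_category: str
    domain: str
    response: str
    rationale: str
    response_type: str
    rich_text: bool


ALLOWED_COMPLEXITY = {"easy", "medium", "hard"}


def parse_record(text):
    """Parse one JSON object into a CoTRecord, checking required keys and complexity."""
    raw = json.loads(text)
    names = [f.name for f in fields(CoTRecord)]
    missing = [n for n in names if n not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    record = CoTRecord(**{n: raw[n] for n in names})
    if record.prompt_complexity not in ALLOWED_COMPLEXITY:
        raise ValueError(f"unexpected complexity: {record.prompt_complexity!r}")
    return record


# Hypothetical record; the real files may use different keys and values.
example = json.dumps({
    "record_id": "pt-0001",
    "prompt": "Quanto é 15% de 200?",
    "prompt_type": "instructional",
    "prompt_complexity": "easy",
    "prompt_category": "arithmetic",
    "domain": "mathematics",
    "response": "30",
    "rationale": "15% de 200 = 0,15 × 200 = 30.",
    "response_type": "numerical",
    "rich_text": False,
})
print(parse_record(example).response)  # 30
```

Validating records on load catches schema drift early, which matters when a dataset is updated continuously.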

Mathematics Dataset Dataset
334 scholarly articles cite this dataset.

