100+ datasets found

Open Data Portal and Data & Insights Training
datasets.ai
catalog.data.gov
Updated Sep 26, 2024
+ more versions
Cite
State of Maryland (2024). Open Data Portal and Data & Insights Training [Dataset]. https://datasets.ai/datasets/socrata-and-open-data-portal-training
Dataset updated
Sep 26, 2024
Dataset authored and provided by
State of Maryland
Description
For newcomers to the opendata.maryland.gov site, gopi.data.socrata.com, and performance.maryland.gov, this page provides some insight and training into navigating these portals and how to effectively use Data & Insights, these sites' data management tool.
Lucie-Training-Dataset
huggingface.co
Cite
OpenLLM France, Lucie-Training-Dataset [Dataset]. https://huggingface.co/datasets/OpenLLM-France/Lucie-Training-Dataset
Dataset authored and provided by
OpenLLM France
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Lucie Training Dataset Card

The Lucie Training Dataset is a curated collection of text data in English, French, German, Spanish and Italian culled from a variety of sources including: web data, video subtitles, academic papers, digital books, newspapers, and magazines, some of which were processed by Optical Character Recognition (OCR). It also contains samples of diverse programming languages. The Lucie Training Dataset was used to pretrain Lucie-7B, a foundation LLM with strong… See the full description on the dataset page: https://huggingface.co/datasets/OpenLLM-France/Lucie-Training-Dataset.
Llama-Nemotron-Post-Training-Dataset-v1
huggingface.co
Updated Mar 18, 2025
Cite
NVIDIA (2025). Llama-Nemotron-Post-Training-Dataset-v1 [Dataset]. https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset-v1
Dataset updated
Mar 18, 2025
Dataset provided by
Nvidia (http://nvidia.com/)
Authors
NVIDIA
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Llama-Nemotron-Post-Training-Dataset-v1 Release

Data Overview

This dataset is a compilation of SFT and RL data that supports improvements of math, code, general reasoning, and instruction following capabilities of the original Llama instruct model, in support of NVIDIA’s release of Llama-3.3-Nemotron-Super-49B-v1 and Llama-3.1-Nemotron-Nano-8B-v1. Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) which is a derivative of Meta’s Llama-3.3-70B-Instruct (AKA… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset-v1.
Open Data Training Workshop: Case Studies in Open Data for Qualitative and...
borealisdata.ca
Updated Apr 18, 2023
Cite
Srinvivas Murthy; Maggie Woo Kinshella; Jessica Trawin; Teresa Johnson; Niranjan Kissoon; Matthew Wiens; Gina Ogilvie; Gurm Dhugga; J Mark Ansermino (2023). Open Data Training Workshop: Case Studies in Open Data for Qualitative and Quantitative Clinical Research [Dataset]. http://doi.org/10.5683/SP3/BNNAE7
Available in Croissant, a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/BNNAE7
Dataset updated
Apr 18, 2023
Dataset provided by
Borealis
Authors
Srinvivas Murthy; Maggie Woo Kinshella; Jessica Trawin; Teresa Johnson; Niranjan Kissoon; Matthew Wiens; Gina Ogilvie; Gurm Dhugga; J Mark Ansermino
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset funded by
Digital Research Alliance of Canada
Description
Objective(s): Momentum for open access to research is growing. Funding agencies and publishers are increasingly requiring researchers to make their data and research outputs open and publicly available. However, clinical researchers struggle to find real-world examples of Open Data sharing. The aim of this 1 hr virtual workshop is to provide real-world examples of Open Data sharing for both qualitative and quantitative data. Specifically, participants will learn:
1. Primary challenges and successes when sharing quantitative and qualitative clinical research data.
2. Platforms available for open data sharing.
3. Ways to troubleshoot data sharing and publish from open data.

Workshop Agenda:
1. “Data sharing during the COVID-19 pandemic” - Speaker: Srinivas Murthy, Clinical Associate Professor, Department of Pediatrics, Faculty of Medicine, University of British Columbia. Investigator, BC Children's Hospital
2. “Our experience with Open Data for the 'Integrating a neonatal healthcare package for Malawi' project.” - Speaker: Maggie Woo Kinshella, Global Health Research Coordinator, Department of Obstetrics and Gynaecology, BC Children’s and Women’s Hospital and University of British Columbia

This workshop draws on work supported by the Digital Research Alliance of Canada.

Data Description: Presentation slides, Workshop Video, and Workshop Communication
- Srinivas Murthy: Data sharing during the COVID-19 pandemic presentation and accompanying PowerPoint slides.
- Maggie Woo Kinshella: Our experience with Open Data for the 'Integrating a neonatal healthcare package for Malawi' project presentation and accompanying PowerPoint slides.

This workshop was developed as part of Dr. Ansermino's Data Champions Pilot Project supported by the Digital Research Alliance of Canada.

NOTE for restricted files: If you are not yet a CoLab member, please complete our membership application survey to gain access to restricted files within 2 business days. Some files may remain restricted to CoLab members. These files are deemed more sensitive by the file owner and are meant to be shared on a case-by-case basis. Please contact the CoLab coordinator on this page under "collaborate with the pediatric sepsis colab."
Golf Courses
catalog.data.gov
data.ny.gov
+2 more
Updated Dec 20, 2024
+ more versions
Cite
State of New York (2024). Golf Courses [Dataset]. https://catalog.data.gov/dataset/golf-courses-74782
Dataset updated
Dec 20, 2024
Dataset provided by
State of New York
Description
The New York State Office of Parks, Recreation and Historic Preservation (OPRHP) oversees more than 250 state parks, historic sites, recreational trails, golf courses, boat launches and more, encompassing nearly 350,000 acres, that are visited by 74 million people annually. These facilities contribute to the economic vitality and quality of life of local communities and directly support New York’s tourism industry. Parks also provide a place for families and children to be active and exercise, promoting healthy lifestyles. The agency is responsible for the operation and stewardship of the state park system as well as advancing a statewide parks, historic preservation, and open space mission. From the famed Bethpage Black to the rolling terrain of the Robert Trent Jones-designed 18-hole course at Green Lakes State Park, New York's state park golf courses rank among the best public courses in the world. For more information, visit http://nysparks.com/golf-courses/
Golf Courses
catalog.data.gov
s.cnmilf.com
+2 more
Updated Mar 8, 2025
Cite
City of Seattle ArcGIS Online (2025). Golf Courses [Dataset]. https://catalog.data.gov/dataset/golf-courses-6a22b
Dataset updated
Mar 8, 2025
Dataset provided by
Description
Seattle Parks and Recreation Golf Course locations. SPR Golf Courses are managed by contractors.
Refresh Cycle: Weekly
Feature Class: DPR.GolfCourse
Dataset: An Open Combinatorial Diffraction Dataset Including Consensus Human...
data.nist.gov
cloud.csiss.gmu.edu
+1 more
Updated Oct 23, 2020
Cite
Brian DeCost (2020). Dataset: An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models [Dataset]. http://doi.org/10.18434/mds2-2301
Unique identifier
https://doi.org/10.18434/mds2-2301, https://identifiers.org/ark:/88434/mds2-2301
Dataset updated
Oct 23, 2020
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Authors
Brian DeCost
License
https://www.nist.gov/open/license
Description
The open dataset, software, and other files accompanying the manuscript "An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models," submitted for publication to Integrated Materials and Manufacturing Innovations. Machine learning and autonomy are increasingly prevalent in materials science, but existing models are often trained or tuned using idealized data as absolute ground truths. In actual materials science, "ground truth" is often a matter of interpretation and is more readily determined by consensus. Here we present the data, software, and other files for a study using as-obtained diffraction data as a test case for evaluating the performance of machine learning models in the presence of differing expert opinions. We demonstrate that experts with similar backgrounds can disagree greatly even for something as intuitive as using diffraction to identify the start and end of a phase transformation. We then use a logarithmic likelihood method to evaluate the performance of machine learning models in relation to the consensus expert labels and their variance. We further illustrate this method's efficacy in ranking a number of state-of-the-art phase mapping algorithms. We propose a materials data challenge centered around the problem of evaluating models based on consensus with uncertainty. The data, labels, and code used in this study are all available online at data.gov, and the interested reader is encouraged to replicate and improve the existing models or to propose alternative methods for evaluating algorithmic performance.
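The scoring idea described above, rating a model by how likely its prediction is under the consensus expert labels and their variance, can be illustrated with a minimal sketch. The description does not give the likelihood form, so the Gaussian model, the function names, and the sample numbers below are all assumptions for illustration and not the authors' implementation.

```python
import math


def gaussian_log_likelihood(prediction, consensus_mean, consensus_std):
    """Log-density of a prediction under a normal distribution fitted to expert labels.

    Assumption: expert disagreement is summarized by a mean and a standard deviation.
    """
    variance = consensus_std ** 2
    return (-0.5 * math.log(2 * math.pi * variance)
            - (prediction - consensus_mean) ** 2 / (2 * variance))


def score_model(predictions, consensus_means, consensus_stds):
    """Sum the log-likelihoods over all labeled quantities; higher is better."""
    return sum(
        gaussian_log_likelihood(p, m, s)
        for p, m, s in zip(predictions, consensus_means, consensus_stds)
    )


# Hypothetical consensus labels for the start and end of a transformation.
means = [300.0, 400.0]
stds = [10.0, 20.0]

close_model = score_model([300.0, 410.0], means, stds)
far_model = score_model([350.0, 480.0], means, stds)
```

Under this assumed scoring rule, a prediction that misses by a fixed amount is penalized less where the experts themselves disagree more (larger standard deviation), which is the property the description attributes to evaluating against consensus with uncertainty.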
Training dataset for NABat Machine Learning V1.0
catalog.data.gov
data.usgs.gov
+1 more
Updated Jul 6, 2024
+ more versions
Cite
U.S. Geological Survey (2024). Training dataset for NABat Machine Learning V1.0 [Dataset]. https://catalog.data.gov/dataset/training-dataset-for-nabat-machine-learning-v1-0
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Description
Bats play crucial ecological roles and provide valuable ecosystem services, yet many populations face serious threats from various ecological disturbances. The North American Bat Monitoring Program (NABat) aims to assess status and trends of bat populations while developing innovative and community-driven conservation solutions using its unique data and technology infrastructure. To support scalability and transparency in the NABat acoustic data pipeline, we developed a fully-automated machine-learning algorithm. This dataset includes audio files of bat echolocation calls that were considered to develop V1.0 of the NABat machine-learning algorithm, however the test set (i.e., holdout dataset) has been excluded from this release. These recordings were collected by various bat monitoring partners across North America using ultrasonic acoustic recorders for stationary acoustic and mobile acoustic surveys. For more information on how these surveys may be conducted, see Chapters 4 and 5 of “A Plan for the North American Bat Monitoring Program” (https://doi.org/10.2737/SRS-GTR-208). These data were then post-processed by bat monitoring partners to remove noise files (or those that do not contain recognizable bat calls) and apply a species label to each file. There is undoubtedly variation in the steps that monitoring partners take to apply a species label, but the steps documented in “A Guide to Processing Bat Acoustic Data for the North American Bat Monitoring Program” (https://doi.org/10.3133/ofr20181068) include first processing with an automated classifier and then manually reviewing to confirm or downgrade the suggested species label. Once a manual ID label was applied, audio files of bat acoustic recordings were submitted to the NABat database in Waveform Audio File format. From these available files in the NABat database, we considered files from 35 classes (34 species and a noise class). 
Files for 4 species were excluded due to low sample size (Corynorhinus rafinesquii, N = 3; Eumops floridanus, N = 3; Lasiurus xanthinus, N = 4; Nyctinomops femorosaccus, N = 11). From this pool, files were randomly selected until files for each species/grid cell combination were exhausted or the number of recordings reached 1,250. The dataset was then randomly split into training, validation, and test sets (i.e., holdout dataset). This data release includes all files considered for training and validation, including files that had been excluded from model development and testing due to low sample size for a given species or because the threshold for species/grid cell combinations had been met. The test set (i.e., holdout dataset) is not included. Audio files are grouped by species, as indicated by the four-letter species code in the name of each folder. Definitions for each four-letter code, including Family, Genus, Species, and Common name, are also included as a dataset in this release.
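The selection and split procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the cap of 1,250 is applied per species/grid cell combination (the description is ambiguous on this point), the 80/10/10 split percentages are invented because the release does not state them, and the function names and record layout are hypothetical.

```python
import random
from collections import defaultdict


def cap_per_group(records, cap=1250, seed=0):
    """Randomly keep at most `cap` files per (species, grid_cell) group.

    records: list of (path, species, grid_cell) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        _, species, grid_cell = record
        groups[(species, grid_cell)].append(record)

    selected = []
    for key in sorted(groups):          # sorted for reproducibility
        members = groups[key]
        rng.shuffle(members)
        selected.extend(members[:cap])  # small groups are simply exhausted
    return selected


def random_split(records, percents=(80, 10, 10), seed=0):
    """Shuffle and split into train/validation/test (percentages are assumed)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Hypothetical inventory: one oversampled group and one small group.
inventory = [(f"MYLU_{i}.wav", "MYLU", 1) for i in range(3000)]
inventory += [(f"EPFU_{i}.wav", "EPFU", 2) for i in range(10)]

pool = cap_per_group(inventory)
train, val, test = random_split(pool)
```

In the release itself only the training and validation portions are published; the held-out test portion is withheld.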
University SET data, with faculty and courses characteristics
openicpsr.org
Updated Sep 12, 2021
+ more versions
Cite
Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
Unique identifier
https://doi.org/10.3886/E149801V1
Dataset updated
Sep 12, 2021
Authors
Under blind review in refereed journal
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper. The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section. The unit of observation or the single row in the data set is identified by three parameters: teacher unique id (j), course unique id (k) and the question number in the SET questionnaire (n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9} ). It means that for each pair (j,k), we have nine rows, one for each SET survey question, or sometimes less when students did not answer one of the SET questions at all. 
For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j=John Smith, k=Calculus, n=2) is calculated as the average of all Likert-scale answers to question no. 2 in the SET survey distributed to all students who took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached file section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}.

Two attachments:
- Word file with variables description
- Rdata file with the data set (for R language)

Appendix 1. The SET questionnaire used for this paper.

Evaluation survey of the teaching staff of [university name]

Please complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don’t agree; 1 - I strongly don’t agree.

Questions (each answered on the scale 1 2 3 4 5):
- I learnt a lot during the course.
- I think that the knowledge acquired during the course is very useful.
- The professor used activities to make the class more engaging.
- If it was possible, I would enroll for the course conducted by this lecturer again.
- The classes started on time.
- The lecturer always used time efficiently.
- The lecturer delivered the class content in an understandable and efficient way.
- The lecturer was available when we had doubts.
- The lecturer treated all students equally regardless of their race, background and ethnicity.
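The dependent variable SET_score_avg(j,k,n) described in this entry is a grouped mean over Likert answers, which can be sketched in a few lines. The raw-response layout, the function name, and the sample answers below are hypothetical; the published Rdata file already contains the averaged scores, so this only illustrates the definition.

```python
from collections import defaultdict
from statistics import mean


def set_score_avg(responses):
    """Average Likert answers per (teacher j, course k, question n).

    responses: iterable of (teacher_id, course_id, question_no, answer),
    where answer is an integer from 1 to 5 (hypothetical layout).
    """
    buckets = defaultdict(list)
    for teacher, course, question, answer in responses:
        buckets[(teacher, course, question)].append(answer)
    return {key: mean(values) for key, values in buckets.items()}


# Invented answers from three students for one teacher/course pair.
raw = [
    ("John Smith", "Calculus", 2, 5),
    ("John Smith", "Calculus", 2, 4),
    ("John Smith", "Calculus", 2, 3),
    ("John Smith", "Calculus", 1, 4),
]

scores = set_score_avg(raw)
```

Each key of `scores` corresponds to one row of the data set; a question that no student answered produces no key, which matches the remark that some teacher/course pairs have fewer than nine rows.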
Danish Golf Courses Orthophotos Dataset
datasetninja.com
Updated Oct 18, 2023
Cite
Jesper Kjærgaard Mortensen; Vinicius Soares Matthiesen; Jacobo Gonzalez de Frutos (2023). Danish Golf Courses Orthophotos Dataset [Dataset]. https://datasetninja.com/danish-golf-courses-orthophotos
Dataset updated
Oct 18, 2023
Dataset provided by
Dataset Ninja
Authors
Jesper Kjærgaard Mortensen; Vinicius Soares Matthiesen; Jacobo Gonzalez de Frutos
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Area covered
Denmark
Description
The authors of the Danish Golf Courses Orthophotos dataset present a system designed to assist golf course raters in evaluating the difficulty rating of golf holes. Traditionally, this process involves time-consuming manual measurements on the golf course, which the authors aim to automate partially. They achieve this by training a U-net neural network to classify various elements on golf courses, including green, fairway, tee, bunker, and water bodies. This system helps measure distances between relevant course components. Notably, prior to this work, there were no publicly available datasets containing golf course data. To address this gap, the authors introduce a new public dataset of golf courses created from orthophotos. This dataset comprises 1,123 RGB orthophotos for training and validation, along with 108 RGB orthophotos for testing, all collected from 107 Danish golf courses during the spring season. The dataset includes manual annotations.
Community College Course Listing
catalog.data.gov
data.oregon.gov
Updated Mar 8, 2025
Cite
Community College Course Listing [Dataset]. https://catalog.data.gov/dataset/community-college-course-listing
Dataset updated
Mar 8, 2025
Dataset provided by
data.oregon.gov
Description
Oregon’s 17 Community Colleges cover 60 campuses and centers throughout the state. This dataset includes the current list of all courses offered by a community college. Course information is subject to change; please contact your local community college for course confirmation.
BUTTER - Empirical Deep Learning Dataset
data.openei.org
datasets.ai
+2 more
code, data, website
Updated May 20, 2022
+ more versions
Cite
Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek (2022). BUTTER - Empirical Deep Learning Dataset [Dataset]. http://doi.org/10.25984/1872441
Available download formats: code, website, data
Unique identifier
https://doi.org/10.25984/1872441
Dataset updated
May 20, 2022
Dataset provided by
Open Energy Data Initiative (OEDI)
National Renewable Energy Laboratory
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
Authors
Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The BUTTER Empirical Deep Learning Dataset represents an empirical study of the deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels of L1 and L2 regularization each. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were performed, and statistics including training and test loss (using an 80% / 20% shuffled train-test split) are recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiment), and a total of 13.3 billion training epochs (three thousand epochs were covered by most runs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.
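The headline totals above can be cross-checked with simple arithmetic. The sketch below uses only the rounded figures quoted in the description, so the derived averages are approximate; the variable names are illustrative.

```python
# Rounded totals quoted in the dataset description.
experiments = 178_000            # distinct hyperparameter settings
training_runs = 3_550_000        # individual training runs
training_epochs = 13_300_000_000 # total training epochs

# Average repetitions per experiment (the description states "an average of 20").
avg_repetitions = training_runs / experiments

# Average epochs per run (the description states most runs covered three thousand).
avg_epochs_per_run = training_epochs / training_runs

print(f"average repetitions per experiment: {avg_repetitions:.1f}")
print(f"average epochs per run: {avg_epochs_per_run:.0f}")
```

The first ratio comes out just under 20, in line with the stated average. The second lands somewhat above three thousand, which is compatible with most runs covering three thousand epochs if a minority ran longer; that reading is an inference from the rounded totals, not something the description states.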
Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata
datarade.ai
.csv
Updated Jul 18, 2023
Cite
WIRESTOCK (2023). Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata [Dataset]. https://datarade.ai/data-products/wirestock-s-ai-ml-image-training-data-4-5m-files-with-metadata-wirestock
Available download formats: .csv
Dataset updated
Jul 18, 2023
Dataset provided by
Wirestock
Authors
WIRESTOCK
Area covered
Chile, Estonia, Peru, Pakistan, Jersey, Swaziland, Sudan, Belarus, New Caledonia, Georgia
Description
Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata: This data product is a unique offering in the realm of AI/ML training data. What sets it apart is the sheer volume and diversity of the dataset, which includes 4.5 million files spanning across 20 different categories. These categories range from Animals/Wildlife and The Arts to Technology and Transportation, providing a rich and varied dataset for AI/ML applications.

The data is sourced from Wirestock's platform, where creators upload and sell their photos, videos, and AI art online. This means that the data is not only vast but also constantly updated, ensuring a fresh and relevant dataset for your AI/ML needs. The data is collected in a GDPR-compliant manner, ensuring the privacy and rights of the creators are respected.

The primary use-cases for this data product are numerous. It is ideal for training machine learning models for image recognition, improving computer vision algorithms, and enhancing AI applications in various industries such as retail, healthcare, and transportation. The diversity of the dataset also means it can be used for more niche applications, such as training AI to recognize specific objects or scenes.

This data product fits into Wirestock's broader data offering as a key resource for AI/ML training. Wirestock is a platform for creators to sell their work, and this dataset is a collection of that work. It represents the breadth and depth of content available on Wirestock, making it a valuable resource for any company working with AI/ML.

The core benefits of this dataset are its volume, diversity, and quality. With 4.5 million files, it provides a vast resource for AI training. The diversity of the dataset, spanning 20 categories, ensures a wide range of images for training purposes. The quality of the images is also high, as they are sourced from creators selling their work on Wirestock.

In terms of how the data is collected, creators upload their work to Wirestock, where it is then sold on various marketplaces. This means the data is sourced directly from creators, ensuring a diverse and unique dataset. The data includes both the images themselves and associated metadata, providing additional context for each image.

The different image categories included in this dataset are Animals/Wildlife, The Arts, Backgrounds/Textures, Beauty/Fashion, Buildings/Landmarks, Business/Finance, Celebrities, Education, Emotions, Food Drinks, Holidays, Industrial, Interiors, Nature Parks/Outdoor, People, Religion, Science, Signs/Symbols, Sports/Recreation, Technology, Transportation, Vintage, Healthcare/Medical, Objects, and Miscellaneous. This wide range of categories ensures a diverse dataset that can cater to a variety of AI/ML applications.
TRAINING DATASET: Hands-On Uploading Data (Download This File)
opendata.hawaii.gov
data.wu.ac.at
xls
Updated Sep 23, 2020
Cite
Training (2020). TRAINING DATASET: Hands-On Uploading Data (Download This File) [Dataset]. https://opendata.hawaii.gov/dataset/training-dataset-hands-on-uploading-data-download-this-file
Available download formats: xls
Dataset updated
Sep 23, 2020
Dataset authored and provided by
Training
Description
TRAINING DATASET: Hands-On Uploading Data (Download This File)
Golf Courses
phoenixopendata.com
mapping-phoenix.opendata.arcgis.com
+1 more
Updated Jan 13, 2022
Cite
Enterprise (2022). Golf Courses [Dataset]. https://www.phoenixopendata.com/dataset/golf-courses
Available download formats: arcgis geoservices rest api, geojson, csv, html, zip, kml
Dataset updated
Jan 13, 2022
Dataset provided by
City of Phoenix
Authors
Enterprise
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Phoenix Golf Features:

Five 18-hole championship courses

Three 9-hole courses

Conveniently located throughout the city

Overseeded annually for optimum playing conditions

Full-service golf shops and restaurants

Full amenity practice facilities

Equipment rentals

PGA/LPGA professional course managers
Open Data Training Video: A proposed data de-identification framework for...
borealisdata.ca
Updated Mar 15, 2023
Cite
Alishah Mawji; Holly Longstaff; Jessica Trawin; Clare Komugisha; Stefanie K. Novakowski; Matt Wiens; Samuel Akech; Abner Tagoola; Niranjan Kissoon; Mark J. Ansermino (2023). Open Data Training Video: A proposed data de-identification framework for clinical research [Dataset]. http://doi.org/10.5683/SP3/7XYZVC
Available in Croissant, a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/7XYZVC
Dataset updated
Mar 15, 2023
Dataset provided by
Borealis
Authors
Alishah Mawji; Holly Longstaff; Jessica Trawin; Clare Komugisha; Stefanie K. Novakowski; Matt Wiens; Samuel Akech; Abner Tagoola; Niranjan Kissoon; Mark J. Ansermino
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Objective(s): Data sharing has enormous potential to accelerate and improve the accuracy of research, strengthen collaborations, and restore trust in the clinical research enterprise. Nevertheless, there remains reluctance to openly share raw data sets, in part due to concerns regarding research participant confidentiality and privacy. We provide an instructional video to describe a standardized de-identification framework that can be adapted and refined based on specific context and risks.

Data Description: Training video, presentation slides.

Related Resources: The data de-identification algorithm, dataset, and data dictionary that correspond with this training video are available through the Smart Triage sub-Dataverse.

NOTE for restricted files: If you are not yet a CoLab member, please complete our membership application survey to gain access to restricted files within 2 business days. Some files may remain restricted to CoLab members. These files are deemed more sensitive by the file owner and are meant to be shared on a case-by-case basis. Please contact the CoLab coordinator on this page under "collaborate with the pediatric sepsis colab."
dataset-preferences-llm-course-full-dataset
huggingface.co
Updated May 31, 2024
+ more versions
Cite
dataset-preferences-llm-course-full-dataset [Dataset]. https://huggingface.co/datasets/davanstrien/dataset-preferences-llm-course-full-dataset
Available in Croissant, a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
May 31, 2024
Authors
Daniel van Strien
Description
Dataset Card for dataset-preferences-llm-course-full-dataset

This dataset has been created with distilabel.

Dataset Summary

This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/davanstrien/dataset-preferences-llm-course-full-dataset/raw/main/pipeline.yaml"

or explore the configuration: distilabel… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/dataset-preferences-llm-course-full-dataset.
Data from: Evaluating Promising Practices in Undergraduate STEM Lecture...
datadryad.org
data.niaid.nih.gov
+1 more
zip
Updated Feb 21, 2021
Cite
Mark Warschauer; Lynn Reimer; Kameryn Denaro; Gabe Orona; Katerina Schenke; Tutrang Nguyen; Amanda Niili; Di Xu; Sabrina Solanki; Tate Tamara (2021). Evaluating Promising Practices in Undergraduate STEM Lecture Courses [Dataset]. http://doi.org/10.7280/D11M5Q
Available download formats: zip
Unique identifier
https://doi.org/10.7280/D11M5Q
Dataset updated
Feb 21, 2021
Dataset provided by
Dryad
Authors
Mark Warschauer; Lynn Reimer; Kameryn Denaro; Gabe Orona; Katerina Schenke; Tutrang Nguyen; Amanda Niili; Di Xu; Sabrina Solanki; Tate Tamara
Time period covered
2021
Description
Papers linked may not use the full dataset.
Data from: Distance Education Courses for Public Elementary and Secondary...
catalog.data.gov
data.wu.ac.at
Updated Dec 18, 2023
+ more versions
Cite
National Center for Education Statistics (NCES) (2023). Distance Education Courses for Public Elementary and Secondary School Students, 2009-10 [Dataset]. https://catalog.data.gov/dataset/distance-education-courses-for-public-elementary-and-secondary-school-students-2009-10-c10db
Dataset updated
Dec 18, 2023
Dataset provided by
National Center for Education Statistics (https://nces.ed.gov/)
Description
The Distance Education Courses for Public Elementary and Secondary School Students, 2009-10 (FRSS 98), is a study that is part of the Fast Response Survey System (FRSS) program; program data has been available since 1998-99. FRSS 98 (https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2012009) is a sample survey that provides national estimates on distance education courses in public school districts, including enrollment in distance education courses, how districts monitor these courses, the motivations for providing distance education, and the technologies used for delivering distance education. The study was conducted using surveys via the web or by mail. District superintendents were sampled. The study's weighted response rate was 95%. Key statistics produced from FRSS 98 include the types of distance education courses taken by students, whether the district plans to expand the number of distance education courses, and the technologies used for delivering distance education.
ChatQA-Training-Data
huggingface.co
Updated Jun 30, 2023
Cite
NVIDIA (2023). ChatQA-Training-Data [Dataset]. https://huggingface.co/datasets/nvidia/ChatQA-Training-Data
Available in Croissant, a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jun 30, 2023
Dataset provided by
Nvidia (http://nvidia.com/)
Authors
NVIDIA
License
https://choosealicense.com/licenses/other/
Description
Data Description

We release the training dataset of ChatQA. It is built and derived from existing datasets: DROP, NarrativeQA, NewsQA, Quoref, ROPES, SQuAD1.1, SQuAD2.0, TAT-QA, an SFT dataset, as well as our synthetic conversational QA dataset by GPT-3.5-turbo-0613. The SFT dataset is built and derived from: Soda, ELI5, FLAN, the FLAN collection, Self-Instruct, Unnatural Instructions, OpenAssistant, and Dolly. For more information about ChatQA, check the website!… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA-Training-Data.
