100+ datasets found
  1. d

    Open Data Portal and Data & Insights Training

    • datasets.ai
    • catalog.data.gov
    Updated Sep 26, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    State of Maryland (2024). Open Data Portal and Data & Insights Training [Dataset]. https://datasets.ai/datasets/socrata-and-open-data-portal-training
    Explore at:
    Dataset updated
    Sep 26, 2024
    Dataset authored and provided by
    State of Maryland
    Description

    For newcomers to the opendata.maryland.gov site, gopi.data.socrata.com, and performance.maryland.gov, this page provides some insight and training into navigating these portals and how to effectively use Data & Insights, these sites' data management tool.

  2. h

    Lucie-Training-Dataset

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OpenLLM France, Lucie-Training-Dataset [Dataset]. https://huggingface.co/datasets/OpenLLM-France/Lucie-Training-Dataset
    Explore at:
    Dataset authored and provided by
    OpenLLM France
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Lucie Training Dataset Card

    The Lucie Training Dataset is a curated collection of text data in English, French, German, Spanish and Italian culled from a variety of sources including: web data, video subtitles, academic papers, digital books, newspapers, and magazines, some of which were processed by Optical Character Recognition (OCR). It also contains samples of diverse programming languages. The Lucie Training Dataset was used to pretrain Lucie-7B, a foundation LLM with strong… See the full description on the dataset page: https://huggingface.co/datasets/OpenLLM-France/Lucie-Training-Dataset.

  3. Llama-Nemotron-Post-Training-Dataset-v1

    • huggingface.co
    Updated Mar 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NVIDIA (2025). Llama-Nemotron-Post-Training-Dataset-v1 [Dataset]. https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset-v1
    Explore at:
    Dataset updated
    Mar 18, 2025
    Dataset provided by
    Nvidiahttp://nvidia.com/
    Authors
    NVIDIA
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Llama-Nemotron-Post-Training-Dataset-v1 Release

      Data Overview
    

    This dataset is a compilation of SFT and RL data that supports improvements of math, code, general reasoning, and instruction following capabilities of the original Llama instruct model, in support of NVIDIA’s release of Llama-3.3-Nemotron-Super-49B-v1 and Llama-3.1-Nemotron-Nano-8B-v1. Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) which is a derivative of Meta’s Llama-3.3-70B-Instruct (AKA… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset-v1.

  4. B

    Open Data Training Workshop: Case Studies in Open Data for Qualitative and...

    • borealisdata.ca
    Updated Apr 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Srinvivas Murthy; Maggie Woo Kinshella; Jessica Trawin; Teresa Johnson; Niranjan Kissoon; Matthew Wiens; Gina Ogilvie; Gurm Dhugga; J Mark Ansermino (2023). Open Data Training Workshop: Case Studies in Open Data for Qualitative and Quantitative Clinical Research [Dataset]. http://doi.org/10.5683/SP3/BNNAE7
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 18, 2023
    Dataset provided by
    Borealis
    Authors
    Srinvivas Murthy; Maggie Woo Kinshella; Jessica Trawin; Teresa Johnson; Niranjan Kissoon; Matthew Wiens; Gina Ogilvie; Gurm Dhugga; J Mark Ansermino
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Dataset funded by
    Digital Research Alliance of Canada
    Description

    Objective(s): Momentum for open access to research is growing. Funding agencies and publishers are increasingly requiring researchers make their data and research outputs open and publicly available. However, clinical researchers struggle to find real-world examples of Open Data sharing. The aim of this 1 hr virtual workshop is to provide real-world examples of Open Data sharing for both qualitative and quantitative data. Specifically, participants will learn: 1. Primary challenges and successes when sharing quantitative and qualitative clinical research data. 2. Platforms available for open data sharing. 3. Ways to troubleshoot data sharing and publish from open data. Workshop Agenda: 1. “Data sharing during the COVID-19 pandemic” - Speaker: Srinivas Murthy, Clinical Associate Professor, Department of Pediatrics, Faculty of Medicine, University of British Columbia. Investigator, BC Children's Hospital 2. “Our experience with Open Data for the 'Integrating a neonatal healthcare package for Malawi' project.” - Speaker: Maggie Woo Kinshella, Global Health Research Coordinator, Department of Obstetrics and Gynaecology, BC Children’s and Women’s Hospital and University of British Columbia This workshop draws on work supported by the Digital Research Alliance of Canada. Data Description: Presentation slides, Workshop Video, and Workshop Communication Srinivas Murthy: Data sharing during the COVID-19 pandemic presentation and accompanying PowerPoint slides. Maggie Woo Kinshella: Our experience with Open Data for the 'Integrating a neonatal healthcare package for Malawi' project presentation and accompanying Powerpoint slides. This workshop was developed as part of Dr. Ansermino's Data Champions Pilot Project supported by the Digital Research Alliance of Canada. NOTE for restricted files: If you are not yet a CoLab member, please complete our membership application survey to gain access to restricted files within 2 business days. Some files may remain restricted to CoLab members. These files are deemed more sensitive by the file owner and are meant to be shared on a case-by-case basis. Please contact the CoLab coordinator on this page under "collaborate with the pediatric sepsis colab."

  5. d

    Golf Courses

    • catalog.data.gov
    • data.ny.gov
    • +2more
    Updated Dec 20, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    State of New York (2024). Golf Courses [Dataset]. https://catalog.data.gov/dataset/golf-courses-74782
    Explore at:
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    State of New York
    Description

    The New York State Office of Parks, Recreation and Historic Preservation (OPRHP) oversees more than 250 state parks, historic sites, recreational trails, golf courses, boat launches and more, encompassing nearly 350,000 acres, that are visited by 74 million people annually. These facilities contribute to the economic vitality and quality of life of local communities and directly support New York’s tourism industry. Parks also provide a place for families and children to be active and exercise, promoting healthy lifestyles. The agency is responsible for the operation and stewardship of the state park system as well as advancing a statewide parks, historic preservation, and open space mission. From the famed Bethpage Black, to the rolling terrain of the Robert Trent Jones' designed 18-hole course at Green Lakes State Park, New York's state park golf courses rank among the best public courses in the world. For more information, visit http://nysparks.com/golf-courses/

  6. Golf Courses

    • catalog.data.gov
    • s.cnmilf.com
    • +2more
    Updated Mar 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    City of Seattle ArcGIS Online (2025). Golf Courses [Dataset]. https://catalog.data.gov/dataset/golf-courses-6a22b
    Explore at:
    Dataset updated
    Mar 8, 2025
    Dataset provided by
    Description

    Seattle Parks and Recreation Golf Course locations. SPR Golf Courses are managed by contractors.Refresh Cycle: WeeklyFeature Class: DPR.GolfCourse

  7. Dataset: An Open Combinatorial Diffraction Dataset Including Consensus Human...

    • data.nist.gov
    • cloud.csiss.gmu.edu
    • +1more
    Updated Oct 23, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Brian DeCost (2020). Dataset: An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models [Dataset]. http://doi.org/10.18434/mds2-2301
    Explore at:
    Dataset updated
    Oct 23, 2020
    Dataset provided by
    National Institute of Standards and Technologyhttp://www.nist.gov/
    Authors
    Brian DeCost
    License

    https://www.nist.gov/open/licensehttps://www.nist.gov/open/license

    Description

    The open dataset, software, and other files accompanying the manuscript "An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models," submitted for publication to Integrated Materials and Manufacturing Innovations. Machine learning and autonomy are increasingly prevalent in materials science, but existing models are often trained or tuned using idealized data as absolute ground truths. In actual materials science, "ground truth" is often a matter of interpretation and is more readily determined by consensus. Here we present the data, software, and other files for a study using as-obtained diffraction data as a test case for evaluating the performance of machine learning models in the presence of differing expert opinions. We demonstrate that experts with similar backgrounds can disagree greatly even for something as intuitive as using diffraction to identify the start and end of a phase transformation. We then use a logarithmic likelihood method to evaluate the performance of machine learning models in relation to the consensus expert labels and their variance. We further illustrate this method's efficacy in ranking a number of state-of-the-art phase mapping algorithms. We propose a materials data challenge centered around the problem of evaluating models based on consensus with uncertainty. The data, labels, and code used in this study are all available online at data.gov, and the interested reader is encouraged to replicate and improve the existing models or to propose alternative methods for evaluating algorithmic performance.

  8. d

    Training dataset for NABat Machine Learning V1.0

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2024). Training dataset for NABat Machine Learning V1.0 [Dataset]. https://catalog.data.gov/dataset/training-dataset-for-nabat-machine-learning-v1-0
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Description

    Bats play crucial ecological roles and provide valuable ecosystem services, yet many populations face serious threats from various ecological disturbances. The North American Bat Monitoring Program (NABat) aims to assess status and trends of bat populations while developing innovative and community-driven conservation solutions using its unique data and technology infrastructure. To support scalability and transparency in the NABat acoustic data pipeline, we developed a fully-automated machine-learning algorithm. This dataset includes audio files of bat echolocation calls that were considered to develop V1.0 of the NABat machine-learning algorithm, however the test set (i.e., holdout dataset) has been excluded from this release. These recordings were collected by various bat monitoring partners across North America using ultrasonic acoustic recorders for stationary acoustic and mobile acoustic surveys. For more information on how these surveys may be conducted, see Chapters 4 and 5 of “A Plan for the North American Bat Monitoring Program” (https://doi.org/10.2737/SRS-GTR-208). These data were then post-processed by bat monitoring partners to remove noise files (or those that do not contain recognizable bat calls) and apply a species label to each file. There is undoubtedly variation in the steps that monitoring partners take to apply a species label, but the steps documented in “A Guide to Processing Bat Acoustic Data for the North American Bat Monitoring Program” (https://doi.org/10.3133/ofr20181068) include first processing with an automated classifier and then manually reviewing to confirm or downgrade the suggested species label. Once a manual ID label was applied, audio files of bat acoustic recordings were submitted to the NABat database in Waveform Audio File format. From these available files in the NABat database, we considered files from 35 classes (34 species and a noise class). Files for 4 species were excluded due to low sample size (Corynorhinus rafinesquii, N=3; Eumops floridanus, N =3; Lasiurus xanthinus, N = 4; Nyctinomops femorosaccus, N =11). From this pool, files were randomly selected until files for each species/grid cell combination were exhausted or the number of recordings reach 1250. The dataset was then randomly split into training, validation, and test sets (i.e., holdout dataset). This data release includes all files considered for training and validation, including files that had been excluded from model development and testing due to low sample size for a given species or because the threshold for species/grid cell combinations had been met. The test set (i.e., holdout dataset) is not included. Audio files are grouped by species, as indicated by the four-letter species code in the name of each folder. Definitions for each four-letter code, including Family, Genus, Species, and Common name, are also included as a dataset in this release.

  9. o

    University SET data, with faculty and courses characteristics

    • openicpsr.org
    Updated Sep 12, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
    Explore at:
    Dataset updated
    Sep 12, 2021
    Authors
    Under blind review in refereed journal
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper. The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section. The unit of observation or the single row in the data set is identified by three parameters: teacher unique id (j), course unique id (k) and the question number in the SET questionnaire (n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9} ). It means that for each pair (j,k), we have nine rows, one for each SET survey question, or sometimes less when students did not answer one of the SET questions at all. For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j=Calculus, k=John Smith, n=2) is calculated as the average of all Likert-scale answers to question nr 2 in the SET survey distributed to all students that took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached filesection. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}.Two attachments:- word file with variables description- Rdata file with the data set (for R language).Appendix 1. Appendix 1. The SET questionnaire was used for this paper. Evaluation survey of the teaching staff of [university name] Please, complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5- I strongly agree; 4- I agree; 3- Neutral; 2- I don’t agree; 1- I strongly don’t agree. Questions 1 2 3 4 5 I learnt a lot during the course. ○ ○ ○ ○ ○ I think that the knowledge acquired during the course is very useful. ○ ○ ○ ○ ○ The professor used activities to make the class more engaging. ○ ○ ○ ○ ○ If it was possible, I would enroll for the course conducted by this lecturer again. ○ ○ ○ ○ ○ The classes started on time. ○ ○ ○ ○ ○ The lecturer always used time efficiently. ○ ○ ○ ○ ○ The lecturer delivered the class content in an understandable and efficient way. ○ ○ ○ ○ ○ The lecturer was available when we had doubts. ○ ○ ○ ○ ○ The lecturer treated all students equally regardless of their race, background and ethnicity. ○ ○

  10. D

    Danish Golf Courses Orthophotos Dataset

    • datasetninja.com
    Updated Oct 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jesper Kjærgaard Mortensen; Vinicius Soares Matthiesen; Jacobo Gonzalez de Frutos (2023). Danish Golf Courses Orthophotos Dataset [Dataset]. https://datasetninja.com/danish-golf-courses-orthophotos
    Explore at:
    Dataset updated
    Oct 18, 2023
    Dataset provided by
    Dataset Ninja
    Authors
    Jesper Kjærgaard Mortensen; Vinicius Soares Matthiesen; Jacobo Gonzalez de Frutos
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Area covered
    Denmark
    Description

    The authors of the Danish Golf Courses Orthophotos dataset present a system designed to assist golf course raters in evaluating the difficulty rating of golf holes. Traditionally, this process involves time-consuming manual measurements on the golf course, which the authors aim to automate partially. They achieve this by training a U-net neural network to classify various elements on golf courses, including green, fairway, tee, bunker, and water bodies. This system helps measure distances between relevant course components. Notably, prior to this work, there were no publicly available datasets containing golf course data. To address this gap, the authors introduce a new public dataset of golf courses created from orthophotos. This dataset comprises 1,123 RGB orthophotos for training and validation, along with 108 RGB orthophotos for testing, all collected from 107 Danish golf courses during the spring season. The dataset includes manual annotations.

  11. d

    Community College Course Listing

    • catalog.data.gov
    • data.oregon.gov
    Updated Mar 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Community College Course Listing [Dataset]. https://catalog.data.gov/dataset/community-college-course-listing
    Explore at:
    Dataset updated
    Mar 8, 2025
    Dataset provided by
    data.oregon.gov
    Description

    Oregon’s 17 Community Colleges cover 60 campuses and centers throughout the state. This dataset includes the current list of all courses offered by a community college. Course information is subject to change; please contact your local community college for course confirmation.

  12. O

    BUTTER - Empirical Deep Learning Dataset

    • data.openei.org
    • datasets.ai
    • +2more
    code, data, website
    Updated May 20, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek; Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek (2022). BUTTER - Empirical Deep Learning Dataset [Dataset]. http://doi.org/10.25984/1872441
    Explore at:
    code, website, dataAvailable download formats
    Dataset updated
    May 20, 2022
    Dataset provided by
    Open Energy Data Initiative (OEDI)
    National Renewable Energy Laboratory
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
    Authors
    Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek; Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BUTTER Empirical Deep Learning Dataset represents an empirical study of the deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels of L1 and L2 regularization each. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were preformed, and statistics including training and test loss (using a 80% / 20% shuffled train-test split) are recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiments), and a total of 13.3 billion training epochs (three thousand epochs were covered by most runs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.

  13. Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata

    • datarade.ai
    .csv
    Updated Jul 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    WIRESTOCK (2023). Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata [Dataset]. https://datarade.ai/data-products/wirestock-s-ai-ml-image-training-data-4-5m-files-with-metadata-wirestock
    Explore at:
    .csvAvailable download formats
    Dataset updated
    Jul 18, 2023
    Dataset provided by
    Wirestock
    Authors
    WIRESTOCK
    Area covered
    Chile, Estonia, Peru, Pakistan, Jersey, Swaziland, Sudan, Belarus, New Caledonia, Georgia
    Description

    Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata: This data product is a unique offering in the realm of AI/ML training data. What sets it apart is the sheer volume and diversity of the dataset, which includes 4.5 million files spanning across 20 different categories. These categories range from Animals/Wildlife and The Arts to Technology and Transportation, providing a rich and varied dataset for AI/ML applications.

    The data is sourced from Wirestock's platform, where creators upload and sell their photos, videos, and AI art online. This means that the data is not only vast but also constantly updated, ensuring a fresh and relevant dataset for your AI/ML needs. The data is collected in a GDPR-compliant manner, ensuring the privacy and rights of the creators are respected.

    The primary use-cases for this data product are numerous. It is ideal for training machine learning models for image recognition, improving computer vision algorithms, and enhancing AI applications in various industries such as retail, healthcare, and transportation. The diversity of the dataset also means it can be used for more niche applications, such as training AI to recognize specific objects or scenes.

    This data product fits into Wirestock's broader data offering as a key resource for AI/ML training. Wirestock is a platform for creators to sell their work, and this dataset is a collection of that work. It represents the breadth and depth of content available on Wirestock, making it a valuable resource for any company working with AI/ML.

    The core benefits of this dataset are its volume, diversity, and quality. With 4.5 million files, it provides a vast resource for AI training. The diversity of the dataset, spanning 20 categories, ensures a wide range of images for training purposes. The quality of the images is also high, as they are sourced from creators selling their work on Wirestock.

    In terms of how the data is collected, creators upload their work to Wirestock, where it is then sold on various marketplaces. This means the data is sourced directly from creators, ensuring a diverse and unique dataset. The data includes both the images themselves and associated metadata, providing additional context for each image.

    The different image categories included in this dataset are Animals/Wildlife, The Arts, Backgrounds/Textures, Beauty/Fashion, Buildings/Landmarks, Business/Finance, Celebrities, Education, Emotions, Food Drinks, Holidays, Industrial, Interiors, Nature Parks/Outdoor, People, Religion, Science, Signs/Symbols, Sports/Recreation, Technology, Transportation, Vintage, Healthcare/Medical, Objects, and Miscellaneous. This wide range of categories ensures a diverse dataset that can cater to a variety of AI/ML applications.

  14. H

    TRAINING DATASET: Hands-On Uploading Data (Download This File)

    • opendata.hawaii.gov
    • data.wu.ac.at
    xls
    Updated Sep 23, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Training (2020). TRAINING DATASET: Hands-On Uploading Data (Download This File) [Dataset]. https://opendata.hawaii.gov/dataset/training-dataset-hands-on-uploading-data-download-this-file
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Sep 23, 2020
    Dataset authored and provided by
    Training
    Description

    TRAINING DATASET: Hands-On Uploading Data (Download This File)

  15. C

    Golf Courses

    • phoenixopendata.com
    • mapping-phoenix.opendata.arcgis.com
    • +1more
    Updated Jan 13, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Enterprise (2022). Golf Courses [Dataset]. https://www.phoenixopendata.com/dataset/golf-courses
    Explore at:
    arcgis geoservices rest api, geojson, csv, html, zip, kmlAvailable download formats
    Dataset updated
    Jan 13, 2022
    Dataset provided by
    City of Phoenix
    Authors
    Enterprise
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Phoenix Golf Features:

    • Five 18-hole championship courses

    • Three 9-hole courses

    • Conveniently located throughout the city

    • Overseeded annually for optimum playing conditions

    • Full-service golf shops and restaurants

    • Full amenity practice facilities

    • Equipment rentals

    • PGA/LPGA professional course managers

  16. B

    Open Data Training Video: A proposed data de-identification framework for...

    • borealisdata.ca
    Updated Mar 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alishah Mawji; Holly Longstaff; Jessica Trawin; Clare Komugisha; Stefanie K. Novakowski; Matt Wiens; Samuel Akech; Abner Tagoola; Niranjan Kissoon; Mark J. Ansermino (2023). Open Data Training Video: A proposed data de-identification framework for clinical research [Dataset]. http://doi.org/10.5683/SP3/7XYZVC
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 15, 2023
    Dataset provided by
    Borealis
    Authors
    Alishah Mawji; Holly Longstaff; Jessica Trawin; Clare Komugisha; Stefanie K. Novakowski; Matt Wiens; Samuel Akech; Abner Tagoola; Niranjan Kissoon; Mark J. Ansermino
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Objective(s): Data sharing has enormous potential to accelerate and improve the accuracy of research, strengthen collaborations, and restore trust in the clinical research enterprise. Nevertheless, there remains reluctancy to openly share raw data sets, in part due to concerns regarding research participant confidentiality and privacy. We provide an instructional video to describe a standardized de-identification framework that can be adapted and refined based on specific context and risks. Data Description: Training video, presentation slides. Related Resources: The data de-identification algorithm, dataset, and data dictionary that correspond with this training video are available through the Smart Triage sub-Dataverse. NOTE for restricted files: If you are not yet a CoLab member, please complete our membership application survey to gain access to restricted files within 2 business days. Some files may remain restricted to CoLab members. These files are deemed more sensitive by the file owner and are meant to be shared on a case-by-case basis. Please contact the CoLab coordinator on this page under "collaborate with the pediatric sepsis colab."

  17. h

    dataset-preferences-llm-course-full-dataset

    • huggingface.co
    Updated May 31, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    dataset-preferences-llm-course-full-dataset [Dataset]. https://huggingface.co/datasets/davanstrien/dataset-preferences-llm-course-full-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 31, 2024
    Authors
    Daniel van Strien
    Description

    Dataset Card for dataset-preferences-llm-course-full-dataset

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/davanstrien/dataset-preferences-llm-course-full-dataset/raw/main/pipeline.yaml"

    or explore the configuration: distilabel… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/dataset-preferences-llm-course-full-dataset.

  18. d

    Data from: Evaluating Promising Practices in Undergraduate STEM Lecture...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Feb 21, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mark Warschauer; Lynn Reimer; Kameryn Denaro; Gabe Orona; Katerina Schenke; Tutrang Nguyen; Amanda Niili; Di Xu; Sabrina Solanki; Tate Tamara (2021). Evaluating Promising Practices in Undergraduate STEM Lecture Courses [Dataset]. http://doi.org/10.7280/D11M5Q
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 21, 2021
    Dataset provided by
    Dryad
    Authors
    Mark Warschauer; Lynn Reimer; Kameryn Denaro; Gabe Orona; Katerina Schenke; Tutrang Nguyen; Amanda Niili; Di Xu; Sabrina Solanki; Tate Tamara
    Time period covered
    2021
    Description

    Papers linked may not use the full dataset.

  19. Data from: Distance Education Courses for Public Elementary and Secondary...

    • catalog.data.gov
    • data.wu.ac.at
    Updated Dec 18, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    National Center for Education Statistics (NCES) (2023). Distance Education Courses for Public Elementary and Secondary School Students, 2009-10 [Dataset]. https://catalog.data.gov/dataset/distance-education-courses-for-public-elementary-and-secondary-school-students-2009-10-c10db
    Explore at:
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    National Center for Education Statisticshttps://nces.ed.gov/
    Description

    The Distance Education Courses for Public Elementary and Secondary School Students, 2009-10 (FRSS 98), is a study that is part of the Fast Response Survey System (FRSS) program; program data is available since 1998-99 at . FRSS 98 (https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2012009) is a sample survey that provides national estimates on distance education courses in public school districts, including enrollment in distance education courses, how districts monitor these courses, the motivations for providing distance education, and the technologies used for delivering distance education. The study was conducted using surveys via the web or by mail. District superintendents were sampled. The study's weighted response rate was 95%. Key statistics produced from FRSS 98 include the types of distance education courses taken by students, whether the district plans to expand the number of distance education courses, and the technologies used for delivering distance education.

  20. ChatQA-Training-Data

    • huggingface.co
    Updated Jun 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NVIDIA (2023). ChatQA-Training-Data [Dataset]. https://huggingface.co/datasets/nvidia/ChatQA-Training-Data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 30, 2023
    Dataset provided by
    Nvidiahttp://nvidia.com/
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/

    Description

    Data Description

    We release the training dataset of ChatQA. It is built and derived from existing datasets: DROP, NarrativeQA, NewsQA, Quoref, ROPES, SQuAD1.1, SQuAD2.0, TAT-QA, a SFT dataset, as well as a our synthetic conversational QA dataset by GPT-3.5-turbo-0613. The SFT dataset is built and derived from: Soda, ELI5, FLAN, the FLAN collection, Self-Instruct, Unnatural Instructions, OpenAssistant, and Dolly. For more information about ChatQA, check the website!… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA-Training-Data.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
State of Maryland (2024). Open Data Portal and Data & Insights Training [Dataset]. https://datasets.ai/datasets/socrata-and-open-data-portal-training

Open Data Portal and Data & Insights Training

Explore at:
Dataset updated
Sep 26, 2024
Dataset authored and provided by
State of Maryland
Description

For newcomers to the opendata.maryland.gov site, gopi.data.socrata.com, and performance.maryland.gov, this page provides some insight and training into navigating these portals and how to effectively use Data & Insights, these sites' data management tool.

Search
Clear search
Close search
Google apps
Main menu