59 datasets found
  1. Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts to filter those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

    GitHub page: https://github.com/soarsmu/NICHE

  2. Teaching undergraduates with quantitative data in the social sciences at...

    • search.dataone.org
    • dataone.org
    • +3more
    Updated Jun 14, 2024
    Cite
    Renata Gonçalves Curty; Rebecca Greer; Torin White (2024). Teaching undergraduates with quantitative data in the social sciences at University of California Santa Barbara [Dataset]. https://search.dataone.org/view/sha256%3A62b393a77343a0b237b65b163d9e5ce3a697794d16469015c1f0822dba227e1e
    Dataset updated
    Jun 14, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Renata Gonçalves Curty; Rebecca Greer; Torin White
    Time period covered
    Apr 15, 2022
    Description

    The interview data was gathered for a project that investigated the practices of instructors who use quantitative data to teach undergraduate courses within the Social Sciences. The study was undertaken by employees of the University of California, Santa Barbara (UCSB) Library, who participated in this research project with 19 other colleges and universities across the U.S. under the direction of Ithaka S+R. Ithaka S+R is a New York-based research organization, which, among other goals, seeks to develop strategies, services, and products to meet evolving academic trends to support faculty and students.

    The field of Social Sciences is well known for valuing the contextual component of data and has increasingly embraced more quantitative and computational approaches to research in response to the prevalence of data literacy skills needed to navigate both personal and professional contexts. Thus, this study becomes particularly timely to identify current instructors’ practi...

    The project followed a qualitative and exploratory approach to understand current practices of faculty teaching with data. The study was IRB approved and was deemed exempt by UCSB’s Office of Research in July 2020 (Protocol 1-20-0491).

    The identification and recruitment of potential participants took into account the selection criteria pre-established by Ithaka S+R: a) instructors of courses within the Social Sciences, considering the field as broadly defined, and making the best judgment in cases where the discipline intersects with other fields; b) instructors who teach undergraduate courses or courses where most of the students are at the undergraduate level; c) instructors of any rank, including adjuncts and graduate students, as long as they were listed as instructors of record of the selected courses; d) instructors who teach courses where students engage with quantitative/computational data.

    The sampling process followed a combination of strategies to more easily identify instructo...

    The data folder contains 10 pdf files with de-identified transcriptions of the interviews and the pdf files with the recruitment email and the interview guide.

  3. Corporate big data initiative success rates U.S. and worldwide 2019

    • statista.com
    Updated May 23, 2022
    Cite
    Statista (2022). Corporate big data initiative success rates U.S. and worldwide 2019 [Dataset]. https://www.statista.com/statistics/742935/worldwide-survey-corporate-big-data-initiatives-and-success-rate/
    Dataset updated
    May 23, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2019
    Area covered
    Worldwide, United States
    Description

    The statistic shows the success rate of various big data initiatives as of 2019, according to a survey of industry-leading firms, primarily in the United States. As of that time, 59.5 percent of respondents reported having seen measurable results from big data initiatives to decrease expenses.

  4. Inform project training materials

    • cookislands-data.sprep.org
    • americansamoa-data.nocache.eightyoptions.com.au
    • +14more
    docx, pptx
    Updated Feb 20, 2025
    Cite
    Secretariat of the Pacific Regional Environment Programme (2025). Inform project training materials [Dataset]. https://cookislands-data.sprep.org/dataset/inform-project-training-materials
    Available download formats: pptx, docx (110501)
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Pacific Regional Environment Programme (https://www.sprep.org/)
    License

    Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Pacific Region
    Description

    A collection of Inform project training materials. You are free to download and use any of the training resources below. The PowerPoint presentations contain a complete set of slides, so please feel free to copy, delete or change slides, to fit the purpose of your country training.

  5. Meredith Giuliani - PhD project data for study 5

    • dataverse.nl
    • dataverse.harvard.edu
    Updated Nov 17, 2021
    Cite
    Meredith Giuliani; Meredith Giuliani (2021). Meredith Giuliani - PhD project data for study 5 [Dataset]. http://doi.org/10.34894/1ZCLCV
    Dataset updated
    Nov 17, 2021
    Dataset provided by
    DataverseNL
    Authors
    Meredith Giuliani; Meredith Giuliani
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Study 5: Down from the Ivory Tower: Exploring Implementation of the ESTRO Core Curriculum at the National Level. An anonymous, 37-item survey was designed and distributed to the Presidents of the National Societies that have endorsed the ESTRO Core Curriculum (n=29). The survey addressed perceptions about implementation factors related to context, process and curriculum change. The data were summarized using descriptive statistics.

  6. Most important skills for managing complex projects in organizations...

    • statista.com
    Cite
    Statista, Most important skills for managing complex projects in organizations worldwide 2013 [Dataset]. https://www.statista.com/statistics/293406/most-important-skills-for-managing-complex-projects-in-organizations-worldwide/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2013
    Area covered
    Worldwide
    Description

    This statistic shows the most important skills to successfully manage highly complex projects in organizations worldwide as of July 2013. During the survey, 81 percent of the respondents stated that leadership skills were the most important for successfully managing highly complex projects.

  7. CEQR Project Milestones

    • catalog.data.gov
    • data.cityofnewyork.us
    • +1more
    Updated Mar 22, 2025
    Cite
    data.cityofnewyork.us (2025). CEQR Project Milestones [Dataset]. https://catalog.data.gov/dataset/ceqr-project-milestones
    Dataset updated
    Mar 22, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    CEQR Open Data contains information on projects that are undergoing or have completed review through the City Environmental Quality Review (CEQR) process. Project information available at the Open Data Portal includes the CEQR Number, Project Name, the Project Description, the Lead Agency, project milestones, and geographical locations. CEQR Open Data contains information on CEQR projects, which were filed with the Mayor’s Office from January 1, 2005 to the present. For associated documents, please follow the links to the CEQR Access Database.

  8. Strategic Measures_Number of transportation projects, programs, and...

    • datahub.austintexas.gov
    • data.austintexas.gov
    • +1more
    application/rdfxml +5
    Updated May 12, 2023
    Cite
    City of Austin, Texas - data.austintexas.gov (2023). Strategic Measures_Number of transportation projects, programs, and initiatives that are coordinated with partner agencies [Dataset]. https://datahub.austintexas.gov/Transportation-and-Mobility/Strategic-Measures_Number-of-transportation-projec/fi2q-4nnb
    Available download formats: csv, application/rssxml, xml, json, application/rdfxml, tsv
    Dataset updated
    May 12, 2023
    Dataset authored and provided by
    City of Austin, Texas - data.austintexas.gov
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    This dataset supports measure M.A.10 of SD 2023. The source of the data is the Austin Transportation Department. Each row represents a project in which either the City of Austin was a sponsoring agency with a partner involved, or another agency was the lead and the City of Austin was a supporting partner. This dataset can be used to look at the transportation projects, programs and initiatives that the City of Austin is working on in coordination with other agencies.

    View more details and insights related to this measure on the story page: https://data.austintexas.gov/stories/s/yejj-ryqx

  9. Part A – Enterprise Zone Business Projects - 2022 Exemptions on Qualified...

    • catalog.data.gov
    • data.oregon.gov
    Updated Sep 16, 2022
    Cite
    data.oregon.gov (2022). Part A – Enterprise Zone Business Projects - 2022 Exemptions on Qualified Property* [Dataset]. https://catalog.data.gov/dataset/part-a-enterprise-zone-business-projects-2022-exemptions-on-qualified-property
    Dataset updated
    Sep 16, 2022
    Dataset provided by
    data.oregon.gov
    Description

    This report includes data from Enterprise Zone Business Projects - with exemptions on qualified property. This is Part A of a four (4) part report. A data dictionary and additional notes document are attached as resources. For more information, visit Business Oregon https://www.oregon.gov/biz/programs/enterprisezones

  10. CalVTP Approved and Completed Projects App

    • data.ca.gov
    • data.cnra.ca.gov
    • +3more
    Updated Aug 15, 2024
    Cite
    CAL FIRE (2024). CalVTP Approved and Completed Projects App [Dataset]. https://data.ca.gov/dataset/calvtp-approved-and-completed-projects-app
    Available download formats: arcgis geoservices rest api, html
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    California Department of Forestry and Fire Protection (http://calfire.ca.gov/)
    Authors
    CAL FIRE
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose:
    This viewer is for the general public to see fuels reduction projects approved and completed under the CalVTP.


    Background:
    The California Vegetation Treatment Program (CalVTP), developed by the Board of Forestry and Fire Protection (Board), is a critical component of the state’s multi-faceted strategy to address California’s wildfire crisis. The CalVTP defines the vegetation treatment activities and associated environmental protections to reduce the risk of loss of lives and property, reduce fire suppression costs, restore ecosystems, and protect natural resources as well as other assets at risk from wildfire. The CalVTP supports the use of prescribed burning, mechanical treatments, hand crews, herbicides, and prescribed herbivory as tools to reduce hazardous vegetation around communities in the Wildland-Urban Interface (WUI), to construct fuel breaks, and to restore healthy ecological fire regimes.

    The California Department of Forestry and Fire Protection (CAL FIRE) has the primary responsibility for implementing proposed CalVTP vegetation treatments, though many local, regional, and state agencies could also employ the CalVTP to implement vegetation treatments if their projects are within the scope of the CalVTP (see Final PEIR, Chapter 2, Program Description). The CalVTP will allow CAL FIRE, along with other agency partners, to expand their vegetation treatment activities to treat up to approximately 250,000 acres per year, contributing to the target of 500,000 annual acres of treatment on non-federal lands as expressed in Executive Order (EO) B-52-18.

    The Board has prepared a Final Program Environmental Impact Report (PEIR), which evaluates the environmental impacts of the CalVTP in accordance with the California Environmental Quality Act (CEQA). The Board certified the Final PEIR and approved the CalVTP on December 30, 2019.


    Lifespan:
    This viewer will be available publicly for the lifespan of the CalVTP.

  11. DHSC Government Major Projects Portfolio data, 2018

    • gov.uk
    Updated Jul 4, 2018
    Cite
    Department of Health and Social Care (2018). DHSC Government Major Projects Portfolio data, 2018 [Dataset]. https://www.gov.uk/government/publications/dhsc-government-major-projects-portfolio-data-2018
    Dataset updated
    Jul 4, 2018
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department of Health and Social Care
    Description

    Each government department has published detailed information about projects on the Government Major Projects Portfolio (GMPP). This includes a Delivery Confidence Assessment rating, financial information (whole life cost, annual budget and forecast spend), project schedule and project narrative.

    The data reflects the status of the GMPP at 30 September 2017 and supports the 2018 Infrastructure and Projects Authority (IPA) Annual Report.

  12. Total number of global open source projects adopted 2024

    • statista.com
    Updated Oct 10, 2024
    Cite
    Statista (2024). Total number of global open source projects adopted 2024 [Dataset]. https://www.statista.com/statistics/1419477/open-source-projects-adopted/
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    In 2024, the total number of open source projects taken up was about 3.9 million. Of these, the majority was through JavaScript with about 4.8 million projects, far more than those in any other language.

  13. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    Of total government spending, what percentage is spent on education?

  14. Data from: Economics of Resource and Environmental Project Management in the...

    • vanuatu-data.sprep.org
    • niue-data.sprep.org
    • +13more
    pdf
    Updated Feb 20, 2025
    Cite
    Secretariat of the Pacific Regional Environment Programme (2025). Economics of Resource and Environmental Project Management in the Pacific [Dataset]. https://vanuatu-data.sprep.org/dataset/economics-resource-and-environmental-project-management-pacific
    Available download formats: pdf (7189164)
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Pacific Regional Environment Programme (https://www.sprep.org/)
    License

    Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Pacific Region, POLYGON ((-218.3642578125 -0.82027732487935, -124.1455078125 1.9917933540374, -198.20803642273 -29.024146371439, -133.52053642273 -25.694344510612, -218.3642578125 -7.3624668655357))
    Description

    This report summarises key economic factors affecting the success of recent resource and environmental management projects in the Pacific.

  15. NYSBIP School Bus Projects: Beginning 2023

    • data.ny.gov
    • gimi9.com
    • +1more
    Updated Mar 25, 2025
    Cite
    NYS Energy Research and Development Authority (NYSERDA) (2025). NYSBIP School Bus Projects: Beginning 2023 [Dataset]. https://data.ny.gov/Energy-Environment/NYSBIP-School-Bus-Projects-Beginning-2023/7nqt-yksu
    Available download formats: csv, kml, application/geo+json, application/rssxml, application/rdfxml, tsv, xml, kmz
    Dataset updated
    Mar 25, 2025
    Dataset authored and provided by
    NYS Energy Research and Development Authority (NYSERDA)
    Description

    In the April 2022 budget passed by the New York State Legislature and signed by Governor Hochul, the State established a deadline for the transition to zero-emission buses. Specifically, all school buses in the State must be zero-emission buses by 2035. In 2022, voters across NYS overwhelmingly voted for the Clean Air, Clean Water and Green Jobs Environmental Bond Act (Bond Act), which includes $500M to support the transition to zero-emission school buses. NYSERDA has established the NY School Bus Incentive Program (NYSBIP) to achieve these State public purposes and assist school districts in meeting the zero-emission bus timelines. NYSBIP is a voucher incentive program that will accelerate the deployment of zero-emission school buses and charging infrastructure throughout New York State. Zero-emission school buses include both electric school buses and hydrogen fuel cell school buses (collectively referred to as ESBs). This dataset focuses on the school bus side of the program. The dataset is compiled from the information collected throughout the project application process.

    The New York State Energy Research and Development Authority (NYSERDA) offers objective information and analysis, innovative programs, technical expertise, and support to help New Yorkers increase energy efficiency, save money, use renewable energy, and reduce reliance on fossil fuels. To learn more about NYSERDA’s programs, visit nyserda.ny.gov or follow us on X, Facebook, YouTube, or Instagram.

  16. M.A.10_Number of transportation projects, programs, and initiatives that are...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Sep 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.austintexas.gov (2024). M.A.10_Number of transportation projects, programs, and initiatives that are coordinated with partner agencies [Dataset]. https://catalog.data.gov/dataset/m-a-10-number-of-transportation-projects-programs-and-initiatives-that-are-coordinated-wit
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    data.austintexas.gov
    Description

    Landing page for Number of transportation projects, programs, and initiatives that are coordinated with partner agencies (M.A.10)

  17. FAST-41 Projects Data

    • data.permits.performance.gov
    Updated Mar 22, 2025
    Cite
    Permitting Data Portal (2025). FAST-41 Projects Data [Dataset]. https://data.permits.performance.gov/widgets/fh3k-bqsc?mobile_redirect=true
    Available download formats: application/geo+json, xml, tsv, csv, kmz, application/rdfxml, kml, application/rssxml
    Dataset updated
    Mar 22, 2025
    Dataset authored and provided by
    Permitting Data Portal
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    This dataset contains all milestone information entered by Federal agencies into the Permitting Dashboard. Rows represent single milestones within an individual environmental review or authorization (action). For a full description of all fields in the dataset, see the Data Dictionary. Questions specific to the dataset can be directed to the email listed below. For questions on specific projects, please use the contact information listed on the project page on the Permitting Dashboard.

  18. Energy Efficiency Completed Projects: Beginning 1987

    • catalog.data.gov
    • datadiscoverystudio.org
    • +4more
    Updated Mar 22, 2025
    Cite
    State of New York (2025). Energy Efficiency Completed Projects: Beginning 1987 [Dataset]. https://catalog.data.gov/dataset/energy-efficiency-completed-projects-beginning-1987-72c81
    Dataset updated
    Mar 22, 2025
    Dataset provided by
    State of New York
    Description

    The Energy Efficiency programs of the New York Power Authority provide energy-efficiency improvements, with no up-front costs, to public schools and other government facilities. From start to finish, the Power Authority works with facility managers to identify, design and install new lighting and motors, as well as upgrades to heating, ventilation and air-conditioning systems. We try to address all energy efficiency improvements in a single, comprehensive effort. This data set contains information on energy efficiency projects completed since 1987. The data set is updated on a quarterly basis to reflect new data as projects are implemented. The information includes project location, customer name, project name, total cost, and energy efficiency benefits, including energy reduction (electric, natural gas, oil) and greenhouse gas emissions reductions.

  19. Global revenue of the IT project & portfolio management market (IT PPM)...

    • statista.com
    Updated Jul 7, 2023
    Cite
    Statista (2023). Global revenue of the IT project & portfolio management market (IT PPM) 2014-2024 [Dataset]. https://www.statista.com/statistics/397794/it-ppm-market-revenue-worldwide/
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The statistic shows the global size of the IT project and portfolio management (IT PPM) market from 2014 to 2019 and a forecast for 2024. In 2019, the total size of the global IT PPM market was 3.88 billion U.S. dollars.

  20. Data from: Insect Species Occurrence Data from Multiple Projects Worldwide...

    • catalog.data.gov
    • data.usgs.gov
    • +7more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Insect Species Occurrence Data from Multiple Projects Worldwide with Focus on Bees and Wasps in North America [Dataset]. https://catalog.data.gov/dataset/insect-species-occurrence-data-from-multiple-projects-worldwide-with-focus-on-bees-and-was-b3123
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    Species occurrence records for native and non-native bees, wasps and other insects collected using mainly pan, Malaise, and vane trapping and insect netting methods in Canada, Mexico, the non-contiguous United States, U.S. Territories (specifically U.S. Virgin Islands), U.S. Minor Outlying Islands and other global locations, with the bulk of the specimens coming from the Eastern United States, often from Federal lands such as USFWS, NPS, DOD, and USFS. Some records also contain notes regarding plants or substrates from which insects were collected or that were present and/or in flower at the time the insects were collected. Unless otherwise noted, taxonomic determinations (identifications) were completed by Sam Droege (USGS Eastern Ecological Science Center - EESC, Native Bee Laboratory) and Clare Maffei (USFWS, Inventory and Monitoring Branch). The EESC Native Bee Lab currently keeps only a small synoptic collection; rare and voucher specimens are deposited in the Smithsonian National Collection (NMNH) and widely distributed to other institutions for DNA, revisions, and augmentation of existing collections. Surplus specimens are also made available to students to learn their identifications. Corrections to any of our determinations are always welcomed. Common species that are not in demand for surplus are usually destroyed and the pins recycled. Recent revisions to Lasioglossum, Ceratina, and to a much lesser extent Triepeolus and Epeolus and other small groups have rendered determinations prior to those revisions out of date for species involved in name changes, and users should account for that during analyses. Current data (including information on specimen codes without identifications) are always available without charge directly from Sam Droege.
