59 datasets found

Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...
figshare.com
txt
Updated May 30, 2023
Cite
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21967265.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts to filter those projects and curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

GitHub page: https://github.com/soarsmu/NICHE
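
As a rough illustration of how the label split described above could be checked, here is a minimal sketch that tallies labels from a CSV laid out like NICHE.csv. The column names (`project`, `label`, `stars`, `commits`) and the inline sample rows are assumptions made for illustration only; the actual header names in NICHE.csv may differ and should be checked against the file.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for NICHE.csv: the real file's column names and
# rows are not shown in this listing, so these are illustrative only.
SAMPLE_CSV = """project,label,stars,commits
example-org/alpha,engineered,1200,340
example-org/beta,non-engineered,150,42
example-org/gamma,engineered,980,510
"""


def count_labels(csv_text, label_column="label"):
    """Return a Counter mapping each label value to its number of rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


def label_shares(counts):
    """Convert label counts to fractions of the total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


counts = count_labels(SAMPLE_CSV)
shares = label_shares(counts)
```

Applied to the counts reported in the description (441 engineered and 131 non-engineered out of 572), `label_shares` gives roughly 77% engineered and 23% non-engineered.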
Teaching undergraduates with quantitative data in the social sciences at...
search.dataone.org
dataone.org
+3 more
Updated Jun 14, 2024
+ more versions
Cite
Renata Gonçalves Curty; Rebecca Greer; Torin White (2024). Teaching undergraduates with quantitative data in the social sciences at University of California Santa Barbara [Dataset]. https://search.dataone.org/view/sha256%3A62b393a77343a0b237b65b163d9e5ce3a697794d16469015c1f0822dba227e1e
Dataset updated
Jun 14, 2024
Dataset provided by
Dryad Digital Repository
Authors
Renata Gonçalves Curty; Rebecca Greer; Torin White
Time period covered
Apr 15, 2022
Description
The interview data was gathered for a project that investigated the practices of instructors who use quantitative data to teach undergraduate courses within the Social Sciences. The study was undertaken by employees of the University of California, Santa Barbara (UCSB) Library, who participated in this research project with 19 other colleges and universities across the U.S. under the direction of Ithaka S+R. Ithaka S+R is a New York-based research organization, which, among other goals, seeks to develop strategies, services, and products to meet evolving academic trends to support faculty and students.

The field of Social Sciences has long been known for valuing the contextual component of data and is increasingly entertaining more quantitative and computational approaches to research in response to the prevalence of data literacy skills needed to navigate both personal and professional contexts. Thus, this study becomes particularly timely to identify current instructors’ practi...

The project followed a qualitative and exploratory approach to understand current practices of faculty teaching with data. The study was IRB approved and was exempted by UCSB’s Office of Research in July 2020 (Protocol 1-20-0491).

The identification and recruitment of potential participants took into account the selection criteria pre-established by Ithaka S+R: a) instructors of courses within the Social Sciences, considering the field as broadly defined, and making the best judgment in cases where the discipline intersects with other fields; b) instructors who teach undergraduate courses or courses where most of the students are at the undergraduate level; c) instructors of any rank, including adjuncts and graduate students, as long as they were listed as instructors of record of the selected courses; d) instructors who teach courses where students engage with quantitative/computational data.

The sampling process followed a combination of strategies to more easily identify instructo...

The data folder contains 10 pdf files with de-identified transcriptions of the interviews and the pdf files with the recruitment email and the interview guide.
Corporate big data initiative success rates U.S. and worldwide 2019
statista.com
Updated May 23, 2022
Cite
Statista (2022). Corporate big data initiative success rates U.S. and worldwide 2019 [Dataset]. https://www.statista.com/statistics/742935/worldwide-survey-corporate-big-data-initiatives-and-success-rate/
Dataset updated
May 23, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2019
Area covered
Worldwide, United States
Description
The statistic shows the success rate of various big data initiatives as of 2019, according to a survey of industry-leading firms, primarily in the United States. As of that time, 59.5 percent of respondents reported having seen measurable results from big data initiatives to decrease expenses.
Inform project training materials
cookislands-data.sprep.org
americansamoa-data.nocache.eightyoptions.com.au
+14 more
docx, pptx
Updated Feb 20, 2025
Cite
Secretariat of the Pacific Regional Environment Programme (2025). Inform project training materials [Dataset]. https://cookislands-data.sprep.org/dataset/inform-project-training-materials
Available download formats: pptx, docx (110501)
Dataset updated
Feb 20, 2025
Dataset provided by
Pacific Regional Environment Programme (https://www.sprep.org/)
License
Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Area covered
Pacific Region
Description
A collection of Inform project training materials. You are free to download and use any of the training resources below. The PowerPoint presentations contain a complete set of slides, so please feel free to copy, delete or change slides, to fit the purpose of your country training.
Meredith Giuliani - PhD project data for study 5
dataverse.nl
dataverse.harvard.edu
Updated Nov 17, 2021
+ more versions
Cite
Meredith Giuliani; Meredith Giuliani (2021). Meredith Giuliani - PhD project data for study 5 [Dataset]. http://doi.org/10.34894/1ZCLCV
Unique identifier
https://doi.org/10.34894/1ZCLCV
Dataset updated
Nov 17, 2021
Dataset provided by
DataverseNL
Authors
Meredith Giuliani; Meredith Giuliani
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Study 5: Down from the Ivory Tower: Exploring Implementation of the ESTRO Core Curriculum at the National Level. An anonymous, 37-item survey was designed and distributed to the Presidents of the National Societies that have endorsed the ESTRO Core Curriculum (n=29). The survey addressed perceptions about implementation factors related to context, process and curriculum change. The data was summarized using descriptive statistics.
Most important skills for managing complex projects in organizations...
statista.com
Cite
Statista, Most important skills for managing complex projects in organizations worldwide 2013 [Dataset]. https://www.statista.com/statistics/293406/most-important-skills-for-managing-complex-projects-in-organizations-worldwide/
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jul 2013
Area covered
Worldwide
Description
This statistic shows the most important skills to successfully manage highly complex projects in organizations worldwide as of July 2013. During the survey, 81 percent of the respondents stated that leadership skills were the most important for successfully managing highly complex projects.
CEQR Project Milestones
catalog.data.gov
data.cityofnewyork.us
+1 more
Updated Mar 22, 2025
+ more versions
Cite
data.cityofnewyork.us (2025). CEQR Project Milestones [Dataset]. https://catalog.data.gov/dataset/ceqr-project-milestones
Dataset updated
Mar 22, 2025
Dataset provided by
data.cityofnewyork.us
Description
CEQR Open Data contains information on projects that are undergoing or have completed review through the City Environmental Quality Review (CEQR) process. Project information available at the Open Data Portal includes the CEQR Number, Project Name, the Project Description, the Lead Agency, project milestones, and geographical locations. CEQR Open Data contains information on CEQR projects, which were filed with the Mayor’s Office from January 1, 2005 to the present. For associated documents, please follow the links to the CEQR Access Database.
Strategic Measures_Number of transportation projects, programs, and...
datahub.austintexas.gov
data.austintexas.gov
+1 more
application/rdfxml +5
Updated May 12, 2023
Cite
City of Austin, Texas - data.austintexas.gov (2023). Strategic Measures_Number of transportation projects, programs, and initiatives that are coordinated with partner agencies [Dataset]. https://datahub.austintexas.gov/Transportation-and-Mobility/Strategic-Measures_Number-of-transportation-projec/fi2q-4nnb
Available download formats: csv, application/rssxml, xml, json, application/rdfxml, tsv
Dataset updated
May 12, 2023
Dataset authored and provided by
City of Austin, Texas - data.austintexas.gov
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
This dataset supports measure M.A.10 of SD 2023. The source of the data is the Austin Transportation Department. Each row represents a project in which either the City of Austin was a sponsoring agency with a partner involved, or another agency was the lead and the City of Austin was a supporting partner. This dataset can be used to look at the transportation projects, programs and initiatives that the City of Austin is working on in coordination with other agencies.

View more details and insights related to this measure on the story page: https://data.austintexas.gov/stories/s/yejj-ryqx
Part A – Enterprise Zone Business Projects - 2022 Exemptions on Qualified...
catalog.data.gov
data.oregon.gov
Updated Sep 16, 2022
+ more versions
Cite
data.oregon.gov (2022). Part A – Enterprise Zone Business Projects - 2022 Exemptions on Qualified Property* [Dataset]. https://catalog.data.gov/dataset/part-a-enterprise-zone-business-projects-2022-exemptions-on-qualified-property
Dataset updated
Sep 16, 2022
Dataset provided by
data.oregon.gov
Description
This report includes data from Enterprise Zone Business Projects - with exemptions on qualified property. This is Part A of a four (4) part report. A data dictionary and additional notes document are attached as resources. For more information, visit Business Oregon https://www.oregon.gov/biz/programs/enterprisezones
CalVTP Approved and Completed Projects App
data.ca.gov
data.cnra.ca.gov
+3 more
Updated Aug 15, 2024
Cite
CAL FIRE (2024). CalVTP Approved and Completed Projects App [Dataset]. https://data.ca.gov/dataset/calvtp-approved-and-completed-projects-app
Available download formats: arcgis geoservices rest api, html
Dataset updated
Aug 15, 2024
Dataset provided by
California Department of Forestry and Fire Protection (http://calfire.ca.gov/)
Authors
CAL FIRE
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose:
This viewer is for the general public to see fuels reduction projects approved and completed under the CalVTP.

Background:
The California Vegetation Treatment Program (CalVTP), developed by the Board of Forestry and Fire Protection (Board), is a critical component of the state’s multi-faceted strategy to address California’s wildfire crisis. The CalVTP defines the vegetation treatment activities and associated environmental protections to reduce the risk of loss of lives and property, reduce fire suppression costs, restore ecosystems, and protect natural resources as well as other assets at risk from wildfire. The CalVTP supports the use of prescribed burning, mechanical treatments, hand crews, herbicides, and prescribed herbivory as tools to reduce hazardous vegetation around communities in the Wildland-Urban Interface (WUI), to construct fuel breaks, and to restore healthy ecological fire regimes.
The California Department of Forestry and Fire Protection (CAL FIRE) has the primary responsibility for implementing proposed CalVTP vegetation treatments, though many local, regional, and state agencies could also employ the CalVTP to implement vegetation treatments if their projects are within the scope of the CalVTP (see Final PEIR, Chapter 2, Program Description). The CalVTP will allow CAL FIRE, along with other agency partners, to expand their vegetation treatment activities to treat up to approximately 250,000 acres per year, contributing to the target of 500,000 annual acres of treatment on non-federal lands as expressed in Executive Order (EO) B-52-18.

The Board has prepared a Final Program Environmental Impact Report (PEIR), which evaluates the environmental impacts of the CalVTP in accordance with the California Environmental Quality Act (CEQA). The Board certified the Final PEIR and approved the CalVTP on December 30, 2019.
For more information, visit this link: https://bof.fire.ca.gov/projects-and-programs/calvtp-homepage/

Lifespan:
This viewer will be available publicly for the lifespan of the CalVTP.
DHSC Government Major Projects Portfolio data, 2018
gov.uk
Updated Jul 4, 2018
+ more versions
Cite
Department of Health and Social Care (2018). DHSC Government Major Projects Portfolio data, 2018 [Dataset]. https://www.gov.uk/government/publications/dhsc-government-major-projects-portfolio-data-2018
Dataset updated
Jul 4, 2018
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Department of Health and Social Care
Description
Each government department has published detailed information about projects on the Government Major Projects Portfolio (GMPP). This includes a Delivery Confidence Assessment rating, financial information (whole life cost, annual budget and forecast spend), project schedule and project narrative.

The data reflects the status of the GMPP at 30 September 2017 and supports the 2018 Infrastructure and Projects Authority (IPA) Annual Report.
Total number of global open source projects adopted 2024
statista.com
Updated Oct 10, 2024
Cite
Statista (2024). Total number of global open source projects adopted 2024 [Dataset]. https://www.statista.com/statistics/1419477/open-source-projects-adopted/
Dataset updated
Oct 10, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2024
Area covered
Worldwide
Description
In 2024, the total number of open source projects taken up was about 3.9 million. Of these, the majority was through JavaScript with about 4.8 million projects, far more than those in any other language.
World Bank: Education Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
World Bank (http://worldbank.org/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

Content

This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

For more information, see the World Bank website.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

http://data.worldbank.org/data-catalog/ed-stats

https://cloud.google.com/bigquery/public-data/world-bank-education

Citation: The World Bank: Education Statistics

Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @till_indeman from Unsplash.

Inspiration

Of total government spending, what percentage is spent on education?
Data from: Economics of Resource and Environmental Project Management in the...
vanuatu-data.sprep.org
niue-data.sprep.org
+13 more
pdf
Updated Feb 20, 2025
Cite
Secretariat of the Pacific Regional Environment Programme (2025). Economics of Resource and Environmental Project Management in the Pacific [Dataset]. https://vanuatu-data.sprep.org/dataset/economics-resource-and-environmental-project-management-pacific
Available download formats: pdf (7189164)
Dataset updated
Feb 20, 2025
Dataset provided by
Pacific Regional Environment Programme (https://www.sprep.org/)
License
Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Area covered
Pacific Region
Description
This report summarises key economic factors affecting the success of recent resource and environmental management projects in the Pacific.
NYSBIP School Bus Projects: Beginning 2023
data.ny.gov
gimi9.com
+1 more
Updated Mar 25, 2025
Cite
NYS Energy Research and Development Authority (NYSERDA) (2025). NYSBIP School Bus Projects: Beginning 2023 [Dataset]. https://data.ny.gov/Energy-Environment/NYSBIP-School-Bus-Projects-Beginning-2023/7nqt-yksu
Available download formats: csv, kml, application/geo+json, application/rssxml, application/rdfxml, tsv, xml, kmz
Dataset updated
Mar 25, 2025
Dataset authored and provided by
NYS Energy Research and Development Authority (NYSERDA)
Description
In the April 2022 budget passed by the New York State Legislature and signed by Governor Hochul, the State established a deadline for the transition to zero-emission buses. Specifically, all school buses in the State must be zero-emission buses by 2035. In 2022, voters across NYS overwhelmingly voted for the Clean Air, Clean Water and Green Jobs Environmental Bond Act (Bond Act), which includes $500M to support the transition to zero-emission school buses. NYSERDA has established the NY School Bus Incentive Program (NYSBIP) to achieve these State public purposes and assist school districts in meeting the zero-emission bus timelines. NYSBIP is a voucher incentive program which will accelerate the deployment of zero-emission school buses and charging infrastructure throughout New York State. Zero-emission school buses include both electric school buses and hydrogen fuel cell school buses (collectively referred to as ESBs). This dataset focuses on the school bus side of the program. The dataset is compiled from the information collected throughout the project application process. The New York State Energy Research and Development Authority (NYSERDA) offers objective information and analysis, innovative programs, technical expertise, and support to help New Yorkers increase energy efficiency, save money, use renewable energy, and reduce reliance on fossil fuels. To learn more about NYSERDA’s programs, visit nyserda.ny.gov or follow us on X, Facebook, YouTube, or Instagram.
M.A.10_Number of transportation projects, programs, and initiatives that are...
catalog.data.gov
s.cnmilf.com
Updated Sep 25, 2024
Cite
data.austintexas.gov (2024). M.A.10_Number of transportation projects, programs, and initiatives that are coordinated with partner agencies [Dataset]. https://catalog.data.gov/dataset/m-a-10-number-of-transportation-projects-programs-and-initiatives-that-are-coordinated-wit
Dataset updated
Sep 25, 2024
Dataset provided by
data.austintexas.gov
Description
Landing page for Number of transportation projects, programs, and initiatives that are coordinated with partner agencies (M.A.10)
FAST-41 Projects Data
data.permits.performance.gov
Updated Mar 22, 2025
+ more versions
Cite
Permitting Data Portal (2025). FAST-41 Projects Data [Dataset]. https://data.permits.performance.gov/widgets/fh3k-bqsc?mobile_redirect=true
Available download formats: application/geo+json, xml, tsv, csv, kmz, application/rdfxml, kml, application/rssxml
Dataset updated
Mar 22, 2025
Dataset authored and provided by
Permitting Data Portal
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
This dataset contains all milestone information entered by Federal agencies into the Permitting Dashboard. Rows represent single milestones within an individual environmental review or authorization (action). For a full description of all fields in the dataset, see the Data Dictionary. Questions specific to the dataset can be directed to the email listed below. For questions on specific projects, please use the contact information listed on the project page on the Permitting Dashboard.
Energy Efficiency Completed Projects: Beginning 1987
catalog.data.gov
datadiscoverystudio.org
+4 more
Updated Mar 22, 2025
Cite
State of New York (2025). Energy Efficiency Completed Projects: Beginning 1987 [Dataset]. https://catalog.data.gov/dataset/energy-efficiency-completed-projects-beginning-1987-72c81
Dataset updated
Mar 22, 2025
Dataset provided by
State of New York
Description
The Energy Efficiency programs of the New York Power Authority provide energy-efficiency improvements, with no up-front costs, to public schools and other government facilities. From start to finish, the Power Authority works with facility managers to identify, design and install new lighting and motors, as well as upgrades to heating, ventilation and air-conditioning systems. We try to address all energy efficiency improvements in a single, comprehensive effort. This data set contains information on energy efficiency projects completed since 1987. The data set is updated on a quarterly basis to reflect new data as projects are implemented. The information includes project location, customer name, project name, total cost, and energy efficiency benefits, including energy reduction (electric, natural gas, oil) and greenhouse gas emissions reductions.
Global revenue of the IT project & portfolio management market (IT PPM)...
statista.com
Updated Jul 7, 2023
Cite
Statista (2023). Global revenue of the IT project & portfolio management market (IT PPM) 2014-2024 [Dataset]. https://www.statista.com/statistics/397794/it-ppm-market-revenue-worldwide/
Dataset updated
Jul 7, 2023
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
The statistic shows the size of the global IT project and portfolio management (IT PPM) market from 2014 to 2019 and a forecast for 2024. In 2019, the total size of the global IT PPM market was 3.88 billion U.S. dollars.
Data from: Insect Species Occurrence Data from Multiple Projects Worldwide...
catalog.data.gov
data.usgs.gov
+7 more
Updated Jul 6, 2024
+ more versions
Cite
U.S. Geological Survey (2024). Insect Species Occurrence Data from Multiple Projects Worldwide with Focus on Bees and Wasps in North America [Dataset]. https://catalog.data.gov/dataset/insect-species-occurrence-data-from-multiple-projects-worldwide-with-focus-on-bees-and-was-b3123
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
Species occurrence records for native and non-native bees, wasps and other insects collected mainly using pan, malaise, and vane trapping and insect netting methods in Canada, Mexico, the non-contiguous United States, U.S. Territories (specifically U.S. Virgin Islands), U.S. Minor Outlying Islands and other global locations, with the bulk of the specimens coming from the Eastern United States, often from Federal lands such as USFWS, NPS, DOD, USFS. Some records also contain notes regarding plants or substrates from which insects were collected or that were present and/or in flower at the time the insects were collected. Unless otherwise noted, taxonomic determinations (identifications) were completed by Sam Droege (USGS Eastern Ecological Science Center - EESC, Native Bee Laboratory) and Clare Maffei (USFWS, Inventory and Monitoring Branch). The EESC Native Bee Lab currently keeps only a small synoptic collection; rare and voucher specimens are deposited in the Smithsonian National Collection (NMNH) and widely distributed to other institutions for DNA, revisions, and augmentation of existing collections. Surplus specimens are also made available to students to learn their identifications. Corrections to any of our determinations are always welcomed. Common species that are not in demand for surplus are usually destroyed and the pins recycled. Recent revisions to Lasioglossum, Ceratina, and to a much lesser extent Triepeolus and Epeolus and other small groups have rendered determinations made prior to those revisions out of date for species involved in name changes, and users should account for that during analyses. Current data (including information on specimen codes without identifications) are always available without charge directly from Sam Droege.