100+ datasets found
  1. Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Figshare http://figshare.com/
    Authors
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts at filtering those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

    GitHub page: https://github.com/soarsmu/NICHE

  2. SEPAL

    • data.amerigeoss.org
    png, wms
    Updated Oct 31, 2023
    Cite
    Food and Agriculture Organization (2023). SEPAL [Dataset]. https://data.amerigeoss.org/dataset/sepal
    Explore at:
    Available download formats: png (884051), png (409262), wms
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Food and Agriculture Organization http://fao.org/
    License

    Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    What is SEPAL?

    SEPAL (https://sepal.io/) is a free and open source cloud computing platform for geo-spatial data access and processing. It empowers users to quickly process large amounts of data on their computer or mobile device. Users can create custom analysis ready data using freely available satellite imagery, generate and improve land use maps, analyze time series, run change detection and perform accuracy assessment and area estimation, among many other functionalities in the platform. Data can be created and analyzed for any place on Earth using SEPAL.

    Figure 1: Best pixel mosaic of Landsat 8 data for 2020 over Cambodia (image: https://data.apps.fao.org/catalog/dataset/9c4d7c45-7620-44c4-b653-fbe13eb34b65/resource/63a3efa0-08ab-4ad6-9d4a-96af7b6a99ec/download/cambodia_mosaic_2020.png)

    SEPAL reaches over 5000 users in 180 countries for the creation of custom data products from freely available satellite data. SEPAL was developed as a part of the Open Foris suite, a set of free and open source software platforms and tools that facilitate flexible and efficient data collection, analysis and reporting. SEPAL combines and integrates modern geospatial data infrastructures and supercomputing power available through Google Earth Engine and Amazon Web Services with powerful open-source data processing software, such as R, ORFEO, GDAL, Python and Jupyter Notebooks. Users can easily access the archive of satellite imagery from NASA and the European Space Agency (ESA), as well as high spatial and temporal resolution data from Planet Labs, and turn such images into data that can be used for reporting and better decision making.

    National Forest Monitoring Systems in many countries have been strengthened by SEPAL, which provides technical government staff with computing resources and cutting edge technology to accurately map and monitor their forests. The platform was originally developed for monitoring forest carbon stock and stock changes for reducing emissions from deforestation and forest degradation (REDD+). The application of the tools on the platform now reaches far beyond forest monitoring by providing different stakeholders access to cloud based image processing tools, remote sensing and machine learning for any application. Presently, users work on SEPAL for various applications related to land monitoring, land cover/use, land productivity, ecological zoning, ecosystem restoration monitoring, forest monitoring, near real time alerts for forest disturbances and fire, flood mapping, mapping impact of disasters, peatland rewetting status, and many others.

    The Hand-in-Hand initiative enables countries that generate data through SEPAL to disseminate their data widely through the platform and to combine their data with the numerous other datasets available through Hand-in-Hand.

    Figure 2: Image classification module for land monitoring and mapping. Probability classification over Zambia (image: https://data.apps.fao.org/catalog/dataset/9c4d7c45-7620-44c4-b653-fbe13eb34b65/resource/868e59da-47b9-4736-93a9-f8d83f5731aa/download/probability_classification_over_zambia.png)
  3. Dataset of news links about Best books

    • workwithdata.com
    Updated May 16, 2025
    + more versions
    Cite
    Work With Data (2025). Dataset of news links about Best books [Dataset]. https://www.workwithdata.com/datasets/news?col=news_link%2Crss&f=1&fcol0=page_name&fop0=%3D&fval0=Best+books
    Explore at:
    Dataset updated
    May 16, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about news. It has 1 row and is filtered to entries whose keywords include Best books. It features 2 columns, including news link.

  4. Top 10 Sources of Tax Revenue (Other than Sales and Income taxes)

    • data.ok.gov
    • datadiscoverystudio.org
    • +2 more
    csv
    Updated Oct 31, 2019
    Cite
    Office of Management and Enterprise Services (2019). Top 10 Sources of Tax Revenue (Other than Sales and Income taxes) [Dataset]. https://data.ok.gov/dataset/top-10-sources-of-tax-revenue-other-than-sales-and-income-taxes
    Explore at:
    Available download formats: csv
    Dataset updated
    Oct 31, 2019
    Dataset provided by
    Oklahoma Office of Management and Enterprise Services http://www.omes.ok.gov/
    Authors
    Office of Management and Enterprise Services
    Description

    Breakdown of the Top 10 sources of tax revenue in the State of Oklahoma by broad category, other than sales and income taxes.

  5. Global Open-Source Database Software Market Size By Product, By Application,...

    • verifiedmarketresearch.com
    Updated Mar 21, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Open-Source Database Software Market Size By Product, By Application, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/open-source-database-software-market/
    Explore at:
    Dataset updated
    Mar 21, 2024
    Dataset provided by
    Verified Market Research https://www.verifiedmarketresearch.com/
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2030
    Area covered
    Global
    Description

    Open-Source Database Software Market size was valued at USD 10.00 Billion in 2024 and is projected to reach USD 35.83 Billion by 2032, growing at a CAGR of 20% during the forecast period 2026-2032.
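    The figures in the description can be checked with a short compound-growth calculation. The sketch below is only illustrative: the function name `project_value` is hypothetical, and the choice of seven compounding periods is an assumption, since the description does not state which year the growth is counted from.

```python
def project_value(start_value: float, annual_growth: float, periods: int) -> float:
    """Compound start_value by annual_growth once per yearly period."""
    return start_value * (1.0 + annual_growth) ** periods


# Figures quoted in the description: USD 10.00 billion and a 20% CAGR.
# ASSUMPTION: seven compounding periods; the base year is not stated in the source.
projected = project_value(10.00, 0.20, 7)
print(f"Projected market size: USD {projected:.2f} billion")
```

    Under that assumption, the result matches the USD 35.83 billion quoted for 2032.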

    Global Open-Source Database Software Market Drivers

    The market drivers for the Open-Source Database Software Market can be influenced by various factors. These may include:

    • Cost-Effectiveness: Compared to proprietary systems, open-source databases frequently have lower initial expenses, which attracts organizations, especially startups and small to medium-sized enterprises (SMEs) with tight budgets.
    • Flexibility and Customisation: Open-source databases provide more possibilities for customization and flexibility, enabling businesses to modify the database to suit their unique needs and grow as necessary.
    • Collaboration and Community Support: Active developer communities that share best practices, support, and contribute to the continued development of open-source databases are beneficial. This cooperative setting can promote quicker problem solving and innovation.
    • Performance and Scalability: A lot of open-source databases are made to scale horizontally across several nodes, which helps businesses manage expanding data volumes and keep up performance levels as their requirements change.
    • Data Security and Sovereignty: Open-source databases provide businesses more control over their data and allow them to decide where to store and use it, which helps to allay worries about compliance and data sovereignty. Furthermore, open-source code openness can improve security by making it simpler to find and fix problems.
    • Compatibility with Contemporary Technologies: Open-source databases are well-suited for contemporary application development and deployment techniques like microservices, containers, and cloud-native architectures since they frequently support a broad range of programming languages, frameworks, and platforms.
    • Growing Cloud Computing Adoption: Open-source databases offer a flexible and affordable solution for managing data in cloud environments, whether through self-managed deployments or via managed database services provided by cloud providers. This is because more and more organizations are moving their workloads to the cloud.
    • Escalating Need for Real-Time Insights and Analytics: Organizations are increasingly adopting open-source databases with integrated analytics capabilities, like NoSQL and NewSQL databases, as a means of instantly obtaining actionable insights from their data.

  6. Global - Roads Open Access Data Set - Dataset - ENERGYDATA.INFO

    • energydata.info
    Updated Jul 25, 2018
    + more versions
    Cite
    (2018). Global - Roads Open Access Data Set - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/global-roads-open-access-data-set-2010
    Explore at:
    Dataset updated
    Jul 25, 2018
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Global Roads Open Access Data Set, Version 1 (gROADSv1) was developed under the auspices of the CODATA Global Roads Data Development Task Group. The data set combines the best available roads data by country into a global roads coverage, using the UN Spatial Data Infrastructure Transport (UNSDI-T) version 2 as a common data model. All country road networks have been joined topologically at the borders, and many countries have been edited for internal topology. Source data for each country are provided in the documentation, and users are encouraged to refer to the readme file for use constraints that apply to a small number of countries. Because the data are compiled from multiple sources, the date range for road network representations ranges from the 1980s to 2010 depending on the country (most countries have no confirmed date), and spatial accuracy varies. The baseline global data set was compiled by the Information Technology Outreach Services (ITOS) of the University of Georgia. Updated data for 27 countries and 6 smaller geographic entities were assembled by Columbia University's Center for International Earth Science Information Network (CIESIN), with a focus largely on developing countries with the poorest data coverage.

  7. Louisville Metro KY - Annual Open Data Report 2021

    • data.lojic.org
    • datasets.ai
    • +4 more
    Updated Jun 6, 2022
    + more versions
    Cite
    Louisville/Jefferson County Information Consortium (2022). Louisville Metro KY - Annual Open Data Report 2021 [Dataset]. https://data.lojic.org/documents/01bd70e4ee9b4b3abf4ba0cae940ff40
    Explore at:
    Dataset updated
    Jun 6, 2022
    Dataset authored and provided by
    Louisville/Jefferson County Information Consortium
    License

    https://louisville-metro-opendata-lojic.hub.arcgis.com/pages/terms-of-use-and-license

    Area covered
    Louisville
    Description

    On October 15, 2013, Louisville Mayor Greg Fischer announced the signing of an open data policy executive order in conjunction with his compelling talk at the 2013 Code for America Summit. In nonchalant cadence, the mayor announced his support for complete information disclosure by declaring, "It's data, man."

    Sunlight Foundation - New Louisville Open Data Policy Insists Open By Default is the Future

    Open Data Annual Reports

    Section 5.A. Within one year of the effective date of this Executive Order, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor an annual Open Data Report.

    The Open Data Management Team (also known as the Data Governance Team) is currently led by the city's Data Officer Andrew McKinney in the Office of Civic Innovation and Technology. Previously (2014-16) it was led by the Director of IT.

    Full Executive Order

    EXECUTIVE ORDER NO. 1, SERIES 2013
    AN EXECUTIVE ORDER CREATING AN OPEN DATA PLAN.

    WHEREAS, Metro Government is the catalyst for creating a world-class city that provides its citizens with safe and vibrant neighborhoods, great jobs, a strong system of education and innovation, and a high quality of life; and

    WHEREAS, it should be easy to do business with Metro Government. Online government interactions mean more convenient services for citizens and businesses and online government interactions improve the cost effectiveness and accuracy of government operations; and

    WHEREAS, an open government also makes certain that every aspect of the built environment also has reliable digital descriptions available to citizens and entrepreneurs for deep engagement mediated by smart devices; and

    WHEREAS, every citizen has the right to prompt, efficient service from Metro Government; and

    WHEREAS, the adoption of open standards improves transparency, access to public information and improved coordination and efficiencies among Departments and partner organizations across the public, nonprofit and private sectors; and

    WHEREAS, by publishing structured standardized data in machine readable formats the Louisville Metro Government seeks to encourage the local software community to develop software applications and tools to collect, organize, and share public record data in new and innovative ways; and

    WHEREAS, in commitment to the spirit of Open Government, Louisville Metro Government will consider public information to be open by default and will proactively publish data and data containing information, consistent with the Kentucky Open Meetings and Open Records Act; and

    NOW, THEREFORE, BE IT PROMULGATED BY EXECUTIVE ORDER OF THE HONORABLE GREG FISCHER, MAYOR OF LOUISVILLE/JEFFERSON COUNTY METRO GOVERNMENT AS FOLLOWS:

    Section 1. Definitions. As used in this Executive Order, the terms below shall have the following definitions:

    (A) “Open Data” means any public record as defined by the Kentucky Open Records Act, which could be made available online using Open Format data, as well as best practice Open Data structures and formats when possible. Open Data is not information that is treated exempt under KRS 61.878 by Metro Government.

    (B) “Open Data Report” is the annual report of the Open Data Management Team, which shall (i) summarize and comment on the state of Open Data availability in Metro Government Departments from the previous year; (ii) provide a plan for the next year to improve online public access to Open Data and maintain data quality. The Open Data Management Team shall present an initial Open Data Report to the Mayor within 180 days of this Executive Order.

    (C) “Open Format” is any widely accepted, nonproprietary, platform-independent, machine-readable method for formatting data, which permits automated processing of such data and is accessible to external search capabilities.

    (D) “Open Data Portal” means the Internet site established and maintained by or on behalf of Metro Government, located at portal.louisvilleky.gov/service/data or its successor website.

    (E) “Open Data Management Team” means a group consisting of representatives from each Department within Metro Government and chaired by the Chief Information Officer (CIO) that is responsible for coordinating implementation of an Open Data Policy and creating the Open Data Report.

    (F) “Department” means any Metro Government department, office, administrative unit, commission, board, advisory committee, or other division of Metro Government within the official jurisdiction of the executive branch.

    Section 2. Open Data Portal.

    (A) The Open Data Portal shall serve as the authoritative source for Open Data provided by Metro Government.

    (B) Any Open Data made accessible on Metro Government’s Open Data Portal shall use an Open Format.

    Section 3. Open Data Management Team.

    (A) The Chief Information Officer (CIO) of Louisville Metro Government will work with the head of each Department to identify a Data Coordinator in each Department. Data Coordinators will serve as members of an Open Data Management Team facilitated by the CIO and Metro Technology Services. The Open Data Management Team will work to establish a robust, nationally recognized, platform that addresses digital infrastructure and Open Data.

    (B) The Open Data Management Team will develop an Open Data management policy that will adopt prevailing Open Format standards for Open Data, and develop agreements with regional partners to publish and maintain Open Data that is open and freely available while respecting exemptions allowed by the Kentucky Open Records Act or other federal or state law.

    Section 4. Department Open Data Catalogue.

    (A) Each Department shall be responsible for creating an Open Data catalogue, which will include comprehensive inventories of information possessed and/or managed by the Department.

    (B) Each Department’s Open Data catalogue will classify information holdings as currently “public” or “not yet public”; Departments will work with Metro Technology Services to develop strategies and timelines for publishing open data containing information in a way that is complete, reliable, and has a high level of detail.

    Section 5. Open Data Report and Policy Review.

    (A) Within one year of the effective date of this Executive Order, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor an annual Open Data Report.

    (B) In acknowledgment that technology changes rapidly, in the future, the Open Data Policy should be reviewed and considered for revisions or additions that will continue to position Metro Government as a leader on issues of openness, efficiency, and technical best practices.

    Section 6. This Executive Order shall take effect as of October 11, 2013.

    Signed this 11th day of October, 2013, by Greg Fischer, Mayor of Louisville/Jefferson County Metro Government.

    GREG FISCHER, MAYOR

  8. State Health IT Policy Levers

    • kaggle.com
    Updated Jan 29, 2023
    Cite
    The Devastator (2023). State Health IT Policy Levers [Dataset]. https://www.kaggle.com/datasets/thedevastator/state-health-it-policy-levers
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 29, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    State Health IT Policy Levers

    300+ Examples of Advancing Interoperability and Promoting Health IT

    By US Open Data Portal, data.gov [source]

    About this dataset

    This dataset contains over 300 examples of health IT policy levers used by states to advance interoperability, promote health IT and support delivery system reform. The U.S. Government's Office of the National Coordinator for Health Information Technology (ONC) has curated this catalog as part of its Health IT State Policy Levers Compendium. It provides a directory of the policy levers being utilized, along with information on the state enacting them and their official sources. This collection is intended as a guide for government officials and healthcare providers who are interested in state-based initiatives for optimizing health information technology.

    How to use the dataset

    This dataset provides information on policy levers used by various states in the United States to promote health IT and advance interoperability. The comprehensive list includes over 300 documented examples of health IT policy levers used by these states. This catalog can be used to identify which specific policy levers are being used, as well as what activities they are associated with.

    If you're interested in learning more about how states use health IT policy levers, this dataset is a great resource. It contains detailed information on each entry, including the state where it's being used, the status of that activity, a description of the activity and its purpose, and an official source for additional information about that particular entry.

    To use this dataset, search for specific states or look up which kinds of activities each state is using its health IT policy levers for. You can also look up any specific application or implementation detail for a record by opening its corresponding source URL. With this information at hand, you can better understand how states use their health IT tools to advance interoperability within healthcare systems.

    Research Ideas

    • It can be used to provide states with potential models of successful health IT policy levers, allowing them to learn from the experiences of other states in developing and implementing health IT legislation.
    • The dataset can also be used by researchers looking to study the effectiveness of existing health care policy levers, as well as to identify any gaps that need to be filled in order for certain policies to have a greater overall impact.
    • Additionally, it could be used by industry stakeholders such as hospitals or other healthcare organizations for benchmarking their own efforts related to IT implementation, such as understanding what activities are being undertaken and which sources are being used for best practices or additional resources when making decisions about introducing new technology into an organization's operations and services.

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: policy-levers-activities-catalog-csv-1.csv

    | Column name          | Description                                                                          |
    |:---------------------|:-------------------------------------------------------------------------------------|
    | state                | The state in which the policy lever is being used. (String)                          |
    | policy_lever         | Type of policy lever being used. (String)                                            |
    | activity_status      | Status of activity (e.g., active or inactive). (String)                              |
    | activity_description | Description of activity. (String)                                                    |
    | source               | Source from which the data is gathered. (String)                                     |
    | source_url           | A link that points directly back to the original source with additional information. (String) |
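
    As a rough illustration of searching the catalog by state, the sketch below filters rows using the column names listed above. The sample rows, the state names and the helper `rows_for_state` are all hypothetical placeholders, not values from the real file; with the actual download you would read the CSV from disk instead of from the inline string.

```python
import csv
import io

# Hypothetical sample rows that only mimic the documented column layout.
SAMPLE_CSV = """state,policy_lever,activity_status,activity_description,source,source_url
Exampleland,Hypothetical lever A,active,Placeholder description,Placeholder source,https://example.org/a
Otherstate,Hypothetical lever B,inactive,Placeholder description,Placeholder source,https://example.org/b
Exampleland,Hypothetical lever C,inactive,Placeholder description,Placeholder source,https://example.org/c
"""


def rows_for_state(csv_text: str, state: str) -> list[dict[str, str]]:
    """Return every record whose `state` column equals the requested state."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["state"] == state]


matches = rows_for_state(SAMPLE_CSV, "Exampleland")
for row in matches:
    # Show the lever, its status, and the link back to the original source.
    print(row["policy_lever"], row["activity_status"], row["source_url"])
```

    The same pattern extends to filtering on `activity_status` or `policy_lever`.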

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and US Open Data Portal, data.gov.

  9. Data from: Workflow for Evaluating Normalization Tools for Omics Data Using...

    • acs.figshare.com
    txt
    Updated Oct 28, 2023
    Cite
    Aleesa E. Chua; Leah D. Pfeifer; Emily R. Sekera; Amanda B. Hummon; Heather Desaire (2023). Workflow for Evaluating Normalization Tools for Omics Data Using Supervised and Unsupervised Machine Learning [Dataset]. http://doi.org/10.1021/jasms.3c00295.s001
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 28, 2023
    Dataset provided by
    ACS Publications
    Authors
    Aleesa E. Chua; Leah D. Pfeifer; Emily R. Sekera; Amanda B. Hummon; Heather Desaire
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    To achieve high quality omics results, systematic variability in mass spectrometry (MS) data must be adequately addressed. Effective data normalization is essential for minimizing this variability. The abundance of approaches and the data-dependent nature of normalization have led some researchers to develop open-source academic software for choosing the best approach. While these tools are certainly beneficial to the community, none of them meet all of the needs of all users, particularly users who want to test new strategies that are not available in these products. Herein, we present a simple and straightforward workflow that facilitates the identification of optimal normalization strategies using straightforward evaluation metrics, employing both supervised and unsupervised machine learning. The workflow offers a “DIY” aspect, where the performance of any normalization strategy can be evaluated for any type of MS data. As a demonstration of its utility, we apply this workflow on two distinct datasets, an ESI-MS dataset of extracted lipids from latent fingerprints and a cancer spheroid dataset of metabolites ionized by MALDI-MSI, for which we identified the best-performing normalization strategies.

  10. Quick Stats Agricultural Database

    • catalog.data.gov
    • datadiscoverystudio.org
    • +3 more
    Updated Apr 21, 2025
    + more versions
    Cite
    National Agricultural Statistics Service, Department of Agriculture (2025). Quick Stats Agricultural Database [Dataset]. https://catalog.data.gov/dataset/quick-stats-agricultural-database
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    National Agricultural Statistics Service http://www.nass.usda.gov/
    Description

    Quick Stats is the National Agricultural Statistics Service's (NASS) online, self-service tool to access complete results from the 1997, 2002, 2007, and 2012 Censuses of Agriculture as well as the best source of NASS survey published estimates. The census collects data on all commodities produced on U.S. farms and ranches, as well as detailed information on expenses, income, and operator characteristics. The surveys that NASS conducts collect information on virtually every facet of U.S. agricultural production.

  11. FOI 30978 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Feb 6, 2023
    Cite
    nhsbsa.net (2023). FOI 30978 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-30978
    Explore at:
    Dataset updated
    Feb 6, 2023
    Dataset provided by
    NHS Business Services Authority
    Description

    Under the Freedom of Information Act 2000, I was wondering if you would be able to develop on top of the FOI Request FOI 24442 and FOI 27689. https://opendata.nhsbsa.net/dataset/foi-24442 https://opendata.nhsbsa.net/dataset/foi-27689 The data in this request relates to April 2020 to March 2022 and April 2022 to June 2022 from the data source ‘NHSBSA Information Services Data Warehouse’ with the Columns YEAR_MONTH, PRACTICE_CODE, DISPENSER_CODE, BNF_CODE, PRODUCT_ORDER_NUMBER, PACK_ORDER_NUMBER and NIC_GBP. Would it be possible to have the data in the same format from July 2022 to December 2022 or from July 2022 to the latest possible month please?

  12. Great Barrington, Massachusetts Population Breakdown by Gender and Age...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Great Barrington, Massachusetts Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e1e3a77b-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Great Barrington, Massachusetts
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Great Barrington town by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Great Barrington town. The dataset can be used to understand the population distribution of Great Barrington town by gender and age. For example, it identifies the largest age group for both men and women in Great Barrington town. It also shows how the gender ratio changes from the youngest to the oldest age group.

    Key observations

    Largest age group (population): Male # 40-44 years (385) | Female # 55-59 years (424). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

    Variables / Data Columns

    • Age Group: This column displays the age group for the Great Barrington town population analysis. There are 18 expected values, defined above in the age groups section.
    • Population (Male): This column shows the male population in Great Barrington town.
    • Population (Female): This column shows the female population in Great Barrington town.
    • Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Great Barrington town for each age group.
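
    The Gender Ratio column is defined above as the number of males per 100 females. A minimal sketch of that arithmetic follows; the function name and the sample counts are hypothetical and are not taken from the dataset.

```python
def gender_ratio(male, female):
    """Return males per 100 females, or None when the female count is zero."""
    if female == 0:
        return None  # the ratio is undefined without a female population
    return 100.0 * male / female


# Hypothetical counts for two age groups (illustration only, not dataset values).
sample = {"Under 5 years": (120, 100), "85 years and over": (40, 80)}

for age_group, (male, female) in sample.items():
    print(age_group, gender_ratio(male, female))
```

    A value above 100 means more males than females in that age group, and a value below 100 means the reverse.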

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Great Barrington town Population by Gender.

  13. Data from: Modelling of ready biodegradability based on combined public and...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Van Miert, Erik (2020). Modelling of ready biodegradability based on combined public and industrial data sources [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3466618
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Horvath, Dragos
    Varnek, Alexandre
    Marcou, Gilles
    Van Miert, Erik
    Azam, Philippe
    Gantzer, Philippe
    Lunghini, Filippo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The European REACH (Registration, Evaluation, Authorization and restriction of Chemicals) Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB). In-silico prediction is a valid alternative to expensive and time-consuming experimental testing. However, currently available models may not be relevant for predicting compounds of industrial interest, due to accuracy and applicability domain restrictions.

    In this work we present a new and extended RB dataset (2830 compounds), obtained by merging several public data sources. It was used to train classification models, which were externally validated and benchmarked against existing tools on a set of 316 compounds from the industrial context. The new models showed good performance in terms of predictive power (BA = 0.74 – 0.79) and data coverage (83 – 91 %).

    The Generative Topographic Mapping approach was employed to compare the chemical space of the various data sources: several chemotypes and structural motifs unique to the industrial dataset were identified, highlighting the chemical classes for which currently available models may give less reliable predictions.

    Finally, public and industrial data were merged into a Global dataset containing 3146 compounds and including a significant subset of compounds from the industrial context. This is the biggest dataset reported in the literature so far, and it covers some chemotypes absent from the public data. Thus, the predictive model developed on the Global dataset has a much larger applicability domain than related models built on publicly available data. The developed model is available on the Laboratory of Chemoinformatics website.

    This dataset is only the "All-Public" set, since the industrial compounds cannot be disclosed.

    This update contains additional entries from [J. Chem. Inf. Model. 52 (2012), pp. 655–669] and [J. Chem. Inf. Model. 53 (2013), pp. 867–878]

  14. Great Bend, ND Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    Cite
    Neilsberg Research (2025). Great Bend, ND Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b235eaca-f25d-11ef-8c1b-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    North Dakota, Great Bend
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Great Bend by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Great Bend across both sexes and to determine which sex constitutes the majority.

    Key observations

    The population is majority male, with 64.29% of the total population being male. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: This column shows the population of each gender in Great Bend.
    • % of Total Population: This column displays each gender's share of the total Great Bend population as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding of values.
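
    The percentage column and its rounding caveat can be illustrated with a short sketch. The function name is hypothetical, and so are the counts: 9 and 5 were chosen only because they reproduce the 64.29% male share quoted in the key observations, and the dataset's actual counts are not shown on this page.

```python
def percent_shares(counts, digits=2):
    """Return each category's share of the total as a rounded percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total population is zero; shares are undefined")
    return {key: round(100.0 * value / total, digits) for key, value in counts.items()}


# Hypothetical counts (assumption, not dataset values).
counts = {"Male": 9, "Female": 5}
shares = percent_shares(counts)
print(shares)

# Each share is rounded independently, so the rounded values can drift
# slightly from 100 when summed; printing the total makes that visible.
print(round(sum(shares.values()), 2))
```

    Rounding each share on its own is what allows the published percentages to miss 100 by a small amount.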

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Great Bend Population by Race & Ethnicity.

  15. Great Lakes Basin Integrated Nutrient Dataset (2000-2019)

    • open.canada.ca
    csv, html, txt
    Updated Mar 17, 2022
    Cite
    Environment and Climate Change Canada (2022). Great Lakes Basin Integrated Nutrient Dataset (2000-2019) [Dataset]. https://open.canada.ca/data/en/dataset/8eecfdf5-4fbc-43ec-a504-7e4ee41572eb
    Available download formats: txt, csv, html
    Dataset updated
    Mar 17, 2022
    Dataset provided by
    Environment and Climate Change Canada
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Time period covered
    Oct 1, 1999 - Oct 1, 2019
    Area covered
    The Great Lakes
    Description

    The Great Lakes Basin Integrated Nutrient Dataset compiles and standardizes phosphorus, nitrogen, and suspended solids data collected during the 2000-2019 water years from multiple Canadian and American sources around the Great Lakes. Ultimately, the goal is to enable regional nutrient data analysis within the Great Lakes Basin. This data is not directly used in the Water Quality Monitoring and Surveillance Division tributary load calculations. Data processing steps include standardizing data column and nutrient names, converting date-times to Coordinated Universal Time (UTC), normalizing concentration units to milligrams per liter, and reporting all phosphorus and nitrogen compounds 'as phosphorus' or 'as nitrogen'. Data sources include the Environment and Climate Change Canada National Long-term Water Quality Monitoring Data (WQMS), the Provincial (Stream) Water Quality Monitoring Network (PWQMN) of the Ontario Ministry of the Environment, the Grand River Conservation Authority (GRCA) water quality data, and Heidelberg University’s National Center for Water Quality Research (NCWQR) Tributary Loading Program.
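
    The processing steps named above (conversion to UTC, normalization to milligrams per liter, and reporting compounds 'as phosphorus' or 'as nitrogen') can be sketched as follows. This is an illustration under stated assumptions, not the publisher's pipeline: the function names, the unit labels, the species keys, and the sample values are all hypothetical, and the atomic masses are approximate.

```python
from datetime import datetime, timedelta, timezone

# Approximate atomic masses in g/mol.
P_MASS, N_MASS, O_MASS = 30.974, 14.007, 15.999

# Mass fraction of the element within each reported species (hypothetical keys).
AS_ELEMENT = {
    "PO4": P_MASS / (P_MASS + 4 * O_MASS),  # phosphate reported as phosphorus
    "NO3": N_MASS / (N_MASS + 3 * O_MASS),  # nitrate reported as nitrogen
    "P": 1.0,
    "N": 1.0,
}

# Multipliers that convert a concentration unit to milligrams per liter.
TO_MG_PER_L = {"mg/L": 1.0, "ug/L": 1e-3, "g/L": 1e3}


def normalize(value, unit, species):
    """Convert a concentration to mg/L expressed as the element (P or N)."""
    return value * TO_MG_PER_L[unit] * AS_ELEMENT[species]


def to_utc(local_dt):
    """Convert a timezone-aware datetime to UTC; naive datetimes are rejected."""
    if local_dt.tzinfo is None:
        raise ValueError("timestamp needs a known UTC offset")
    return local_dt.astimezone(timezone.utc)


# Hypothetical sample: 100 ug/L of phosphate measured at 08:00 with a UTC-5 offset.
sample_time = datetime(2010, 5, 1, 8, 0, tzinfo=timezone(timedelta(hours=-5)))
print(to_utc(sample_time).isoformat())
print(normalize(100.0, "ug/L", "PO4"))
```

    Keeping the unit factor and the element fraction in separate lookup tables means a new unit or species is a one-line addition.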

  16. Data from: Classification of Mars Terrain Using Multiple Data Sources

    • data.nasa.gov
    • datasets.ai
    • +3more
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Classification of Mars Terrain Using Multiple Data Sources [Dataset]. https://data.nasa.gov/dataset/classification-of-mars-terrain-using-multiple-data-sources
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Classification of Mars Terrain Using Multiple Data Sources. Alan Kraut, David Wettergreen. Abstract: Images of Mars are being collected faster than they can be analyzed by planetary scientists. Automatic analysis of images would enable more rapid and more consistent image interpretation and could draft geologic maps where none yet exist. In this work we develop a method for incorporating images from multiple instruments to classify Martian terrain into multiple types. Each image is segmented into contiguous groups of similar pixels, called superpixels, with an associated vector of discriminative features. We have developed and tested several classification algorithms to associate a best class to each superpixel. These classifiers are trained using three different manual classifications with between 2 and 6 classes. Automatic classification accuracies of 50 to 80% are achieved in leave-one-out cross-validation across 20 scenes using a multi-class boosting classifier.
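
    One plausible reading of "leave-one-out cross-validation across 20 scenes" is that each scene is held out in turn while the classifier trains on superpixels from the remaining scenes. The sketch below shows only that splitting step, under that assumption; the function name and the toy scene labels are hypothetical, and the abstract does not confirm that this is how the authors organized their folds.

```python
def leave_one_scene_out(scene_ids):
    """Yield (held_out_scene, train_indices, test_indices) once per distinct scene."""
    for held_out in sorted(set(scene_ids)):
        train = [i for i, scene in enumerate(scene_ids) if scene != held_out]
        test = [i for i, scene in enumerate(scene_ids) if scene == held_out]
        yield held_out, train, test


# Hypothetical toy data: six superpixels drawn from three scenes.
scene_ids = [0, 0, 1, 1, 2, 2]
folds = list(leave_one_scene_out(scene_ids))

for held_out, train, test in folds:
    print(held_out, train, test)
```

    Splitting by scene rather than by superpixel keeps superpixels from the same image out of both the training and test sets of a fold.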

  17. California Vegetation - WHRTYPE

    • catalog.data.gov
    • data.ca.gov
    • +5more
    Updated Nov 27, 2024
    Cite
    CAL FIRE (2024). California Vegetation - WHRTYPE [Dataset]. https://catalog.data.gov/dataset/california-vegetation-whrtype-ae8dc
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    CAL FIRE
    Area covered
    California
    Description

    An accurate depiction of the spatial distribution of habitat types within California is required for a variety of legislatively mandated government functions. The California Department of Forestry and Fire Protection's CALFIRE Fire and Resource Assessment Program (FRAP), in cooperation with the California Department of Fish and Wildlife VegCamp program and with extensive use of USDA Forest Service Region 5 Remote Sensing Laboratory (RSL) data, has compiled the "best available" land cover data for California into a single comprehensive statewide data set. The data span a period from approximately 1990 onward. Typically the most current, detailed and consistent data were collected for various regions of the state. Decision rules were developed that controlled which layers were given priority in areas of overlap. Cross-walks were used to compile the various sources into the common classification scheme, the California Wildlife Habitat Relationships (CWHR) system. This service depicts the WHRTYPE description from the fveg dataset (Wildlife Habitat Relationship classes). The full dataset can be downloaded in raster format here: GIS Mapping and Data Analytics | CAL FIRE. The service represents the latest release of the data and is updated when a new version is released. Currently it represents fveg15_1.

  18. Table F Annual Budget 2024 SDCC - Dataset - data.smartdublin.ie

    • data.smartdublin.ie
    Updated Jan 5, 2024
    Cite
    (2024). Table F Annual Budget 2024 SDCC - Dataset - data.smartdublin.ie [Dataset]. https://data.smartdublin.ie/dataset/table-f-annual-budget-2024-sdcc1
    Dataset updated
    Jan 5, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Table F is the Expenditure and Income for the Budget Year and Estimated Outturn for the previous Year. It contains:

    • ‘Expenditure’ and ‘Income’ Adopted by the Council for the Budget Year
    • ‘Expenditure’ and ‘Income’ Estimated by the Chief Executive for the Budget Year
    • ‘Expenditure’ and ‘Income’ Adopted by the Council for the previous Year
    • ‘Expenditure’ and ‘Income’ Estimated Outturn for the previous Year

    Table F provides a breakdown of the Expenditure to Sub-Service level and Income to Income Source per Council Division contained in Table A. In the published Annual Budget document, Table F is published as a separate table for each Division. Section 1 of Table F contains Expenditure broken down by ‘Division’, ‘Service’ and ‘Sub-Service’. Section 2 of Table F contains Income broken down by ‘Division’, ‘Income Type’ and ‘Income Source’. The data in this dataset is best interpreted by comparison with Table F in the published Annual Budget document, which can be found at https://www.sdcc.ie/en/services/our-council/policies-and-plans/budgets-and-spending/annual-budget/

    Data fields for Table F are as follows:

    • Doc: Table Reference
    • Heading: Indicates sections in the Table. Table F is comprised of two sections: Income and Expenditure. Heading = 1 for all Expenditure records; Heading = 2 for all Income records.
    • Ref: Division Reference
    • Ref_Desc: Division Description
    • Ref1: Service Reference for all Expenditure records (i.e. Heading = 1) or Income Type for all Income records (i.e. Heading = 2)
    • Ref1_Desc: Service Description for all Expenditure records (i.e. Heading = 1) or Income Type for all Income records (i.e. Heading = 2)
    • Ref2: Sub-Service Reference for all Expenditure records (i.e. Heading = 1) or Income Source for all Income records (i.e. Heading = 2)
    • Ref2_Desc: Sub-Service Description for all Expenditure records (i.e. Heading = 1) or Income Source for all Income records (i.e. Heading = 2)
    • Adop: Amount Adopted by Council for Budget Year
    • EstCE: Amount Estimated by Chief Executive for Budget Year
    • PY_Adop: Amount Adopted by Council for previous Financial Year
    • PY_Outturn: Amount Estimated Outturn for previous Financial Year
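
    Because Heading distinguishes Expenditure records (Heading = 1) from Income records (Heading = 2), a reader of the file would typically split on that field before summing any amount column. The sketch below assumes the data is delivered as CSV with the field names listed above as its header; the sample rows, reference codes, and amounts are hypothetical placeholders, not values from the published budget.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the field list.
SAMPLE = """Doc,Heading,Ref,Ref_Desc,Ref1,Ref1_Desc,Ref2,Ref2_Desc,Adop,EstCE,PY_Adop,PY_Outturn
Table F,1,A,Division A,A01,Service one,A0101,Sub-service one,1000,1100,900,950
Table F,1,A,Division A,A02,Service two,A0201,Sub-service two,500,500,450,480
Table F,2,A,Division A,I01,Income type one,I0101,Income source one,300,320,280,290
"""

SECTION_BY_HEADING = {"1": "Expenditure", "2": "Income"}


def totals_by_section(rows, amount_field="Adop"):
    """Sum one amount column separately for Expenditure and Income records."""
    totals = {"Expenditure": 0.0, "Income": 0.0}
    for row in rows:
        section = SECTION_BY_HEADING[row["Heading"].strip()]
        totals[section] += float(row[amount_field])
    return totals


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(totals_by_section(rows))                # adopted amounts for the budget year
print(totals_by_section(rows, "PY_Outturn"))  # estimated outturn for the previous year
```

    The same function works for any of the four amount columns (Adop, EstCE, PY_Adop, PY_Outturn) by changing the amount_field argument.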

  19. Landcover Raster Data (2010) – 3ft Resolution

    • data.cityofnewyork.us
    • catalog.data.gov
    • +2more
    application/rdfxml +5
    Updated Jun 28, 2012
    Cite
    Department of Parks and Recreation (DPR) (2012). Landcover Raster Data (2010) – 3ft Resolution [Dataset]. https://data.cityofnewyork.us/Environment/Landcover-Raster-Data-2010-3ft-Resolution/9auy-76zt
    Available download formats: csv, tsv, json, application/rdfxml, application/rssxml, xml
    Dataset updated
    Jun 28, 2012
    Dataset authored and provided by
    Department of Parks and Recreation (DPR)
    Description

    High resolution land cover data set for New York City. This is the 3ft version of the high-resolution land cover dataset for New York City. Seven land cover classes were mapped: (1) tree canopy, (2) grass/shrub, (3) bare earth, (4) water, (5) buildings, (6) roads, and (7) other paved surfaces. The minimum mapping unit for the delineation of features was set at 3 square feet. The primary sources used to derive this land cover layer were the 2010 LiDAR and the 2008 4-band orthoimagery. Ancillary data sources included GIS data (city boundary, building footprints, water, parking lots, roads, railroads, railroad structures, ballfields) provided by New York City (all ancillary datasets except railroads); UVM Spatial Analysis Laboratory manually created railroad polygons from manual interpretation of 2008 4-band orthoimagery. The tree canopy class was considered current as of 2010; the remaining land-cover classes were considered current as of 2008. Object-Based Image Analysis (OBIA) techniques were employed to extract land cover information using the best available remotely sensed and vector GIS datasets. OBIA systems work by grouping pixels into meaningful objects based on their spectral and spatial properties, while taking into account boundaries imposed by existing vector datasets. Within the OBIA environment a rule-based expert system was designed to effectively mimic the process of manual image analysis by incorporating the elements of image interpretation (color/tone, texture, pattern, location, size, and shape) into the classification process. A series of morphological procedures were employed to ensure that the end product is both accurate and cartographically pleasing. More than 35,000 corrections were made to the classification. Overall accuracy was 96%. This dataset was developed as part of the Urban Tree Canopy (UTC) Assessment for New York City. As such, it represents a 'top down' mapping perspective in which tree canopy overhanging other features is assigned to the tree canopy class. At the time of its creation this dataset represented the most detailed and accurate land cover dataset for the area. This project was funded by the National Urban and Community Forestry Advisory Council (NUCFAC) and the National Science Foundation (NSF), although it is not specifically endorsed by either agency. The methods used were developed by the University of Vermont Spatial Analysis Laboratory, in collaboration with the New York City Urban Field Station, with funding from the USDA Forest Service.

  20. Job Offers Web Scraping Search

    • kaggle.com
    Updated Feb 11, 2023
    Cite
    The Devastator (2023). Job Offers Web Scraping Search [Dataset]. https://www.kaggle.com/datasets/thedevastator/job-offers-web-scraping-search
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Job Offers Web Scraping Search

    Targeted Results to Find the Optimal Work Solution

    By [source]

    About this dataset

    This dataset collects job offers obtained by web scraping and filtered by specific keywords, locations and times. The data lets users search for the working arrangement that suits them best. With the information collected, users can explore options that match their personal situation, skillset and preferences in terms of location and schedule. The columns provide job titles, employer names, locations, time frames and schedules, so you can make an informed choice for your next career opportunity.

    How to use the dataset

    This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:

    • Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by allowing you to search for job postings that contain the word or phrase you are looking for.

    • Next, consider where the job is located – the Location column tells you where in the world each posting is from so make sure it’s somewhere that suits your needs!

    • Finally, consider when the position is available. The Time frame column indicates when each posting was made and whether the role is full-time, part-time, or casual/temporary, so check that it meets your requirements before applying.

    • Additionally, if hours per week or further schedule details are important criteria, that information is provided in the Horari and Temps Oferta columns. Once keywords, location and time frame have been checked, look at the Empresa (Company Name) and Nom_Oferta (Post Name) columns to get an idea of who would be employing you should you land the job.

      Together, these pieces of data should give any motivated individual what they need to seek out an optimal work solution. Keep hunting, and good luck!

    Research Ideas

    • Machine learning can be used to group job offers in order to identify similarities and differences between them. This could allow users to target their search for a work solution more precisely.
    • The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better informed decisions about their career options and goals.
    • It may also provide insight into the local job market, enabling companies and employers to identify potential new opportunities or trends that may previously have gone unnoticed.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: web_scraping_information_offers.csv

    | Column name  | Description                         |
    |:-------------|:------------------------------------|
    | Nom_Oferta   | Name of the job offer. (String)     |
    | Empresa      | Company offering the job. (String)  |
    | Ubicació     | Location of the job offer. (String) |
    | Temps_Oferta | Time of the job offer. (String)     |
    | Horari       | Schedule of the job offer. (String) |


Cite
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1

Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python

Related Article
Available download formats: txt
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts to filter those projects and curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

GitHub page: https://github.com/soarsmu/NICHE
