Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts at filtering those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.
GitHub page: https://github.com/soarsmu/NICHE
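For quick exploration, a minimal sketch of loading NICHE.csv with pandas and splitting projects by label; the column name and label values used here are assumptions to be checked against the actual file header:

```python
# Minimal sketch: load NICHE.csv and split projects by label.
# The column name "label" and its values are assumptions; verify
# them against the actual header of NICHE.csv in the repository.
import pandas as pd

df = pd.read_csv("NICHE.csv")
engineered = df[df["label"] == "engineered"]
non_engineered = df[df["label"] == "non-engineered"]
print(len(engineered), "engineered;", len(non_engineered), "non-engineered")
```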
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
SEPAL (https://sepal.io/) is a free and open source cloud computing platform for geo-spatial data access and processing. It empowers users to quickly process large amounts of data on their computer or mobile device. Users can create custom analysis ready data using freely available satellite imagery, generate and improve land use maps, analyze time series, run change detection and perform accuracy assessment and area estimation, among many other functionalities in the platform. Data can be created and analyzed for any place on Earth using SEPAL.
Figure 1: Best pixel mosaic of Landsat 8 data for 2020 over Cambodia (image: https://data.apps.fao.org/catalog/dataset/9c4d7c45-7620-44c4-b653-fbe13eb34b65/resource/63a3efa0-08ab-4ad6-9d4a-96af7b6a99ec/download/cambodia_mosaic_2020.png)
SEPAL reaches over 5000 users in 180 countries for the creation of custom data products from freely available satellite data. SEPAL was developed as part of the Open Foris suite, a set of free and open source software platforms and tools that facilitate flexible and efficient data collection, analysis and reporting. SEPAL combines and integrates modern geospatial data infrastructures and supercomputing power, available through Google Earth Engine and Amazon Web Services, with powerful open-source data processing software such as R, ORFEO, GDAL, Python and Jupyter Notebooks. Users can easily access the archive of satellite imagery from NASA and the European Space Agency (ESA), as well as high spatial and temporal resolution data from Planet Labs, and turn such images into data that can be used for reporting and better decision making.
National Forest Monitoring Systems in many countries have been strengthened by SEPAL, which provides technical government staff with computing resources and cutting-edge technology to accurately map and monitor their forests. The platform was originally developed for monitoring forest carbon stock and stock changes for reducing emissions from deforestation and forest degradation (REDD+). The tools on the platform now reach far beyond forest monitoring, providing different stakeholders with access to cloud-based image processing tools, remote sensing and machine learning for any application. Presently, users work on SEPAL for applications related to land monitoring, land cover/use, land productivity, ecological zoning, ecosystem restoration monitoring, forest monitoring, near-real-time alerts for forest disturbances and fire, flood mapping, mapping the impact of disasters, peatland rewetting status, and many others.
The Hand-in-Hand initiative enables countries that generate data through SEPAL to disseminate their data widely through the platform and to combine their data with the numerous other datasets available through Hand-in-Hand.
Figure 2: Image classification module for land monitoring and mapping; probability classification over Zambia (image: https://data.apps.fao.org/catalog/dataset/9c4d7c45-7620-44c4-b653-fbe13eb34b65/resource/868e59da-47b9-4736-93a9-f8d83f5731aa/download/probability_classification_over_zambia.png)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about news. It contains 1 row, filtered to entries whose keywords include "Best books", and 2 columns, including the news link.
Breakdown of the Top 10 sources of tax revenue in the State of Oklahoma by broad category, other than sales and income taxes.
https://www.verifiedmarketresearch.com/privacy-policy/
Open-Source Database Software Market size was valued at USD 10.00 Billion in 2024 and is projected to reach USD 35.83 Billion by 2032, growing at a CAGR of 20% during the forecast period 2026-2032.
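As a quick arithmetic check (not part of the source report), the stated 20% CAGR applies to the 2026-2032 forecast window, implying a 2026 base of roughly USD 12 billion, which is consistent with growth from the USD 10.00 billion valuation in 2024:

```python
# Back out the implied 2026 market size from the projected 2032 value
# and the stated 20% CAGR over the 2026-2032 forecast period.
end_2032 = 35.83                      # USD billion, projected
cagr = 0.20
years = 2032 - 2026                   # 6-year forecast window
implied_2026 = end_2032 / (1 + cagr) ** years
print(f"Implied 2026 base: USD {implied_2026:.2f} Billion")  # ~12.00
```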
Global Open-Source Database Software Market Drivers
The Open-Source Database Software Market is shaped by a variety of drivers. These may include:
- Cost-Effectiveness: Compared to proprietary systems, open-source databases frequently have lower initial expenses, which attracts organizations, especially startups and small to medium-sized enterprises (SMEs) with tight budgets.
- Flexibility and Customization: Open-source databases provide more possibilities for customization and flexibility, enabling businesses to modify the database to suit their unique needs and grow as necessary.
- Collaboration and Community Support: Open-source databases benefit from active developer communities that share best practices, provide support, and contribute to their continued development. This cooperative setting can promote quicker problem solving and innovation.
- Performance and Scalability: Many open-source databases are designed to scale horizontally across several nodes, which helps businesses manage expanding data volumes and maintain performance levels as their requirements change.
- Data Security and Sovereignty: Open-source databases give businesses more control over their data and allow them to decide where to store and use it, which helps to allay worries about compliance and data sovereignty. Furthermore, the openness of open-source code can improve security by making it simpler to find and fix problems.
- Compatibility with Contemporary Technologies: Open-source databases are well-suited for contemporary application development and deployment techniques like microservices, containers, and cloud-native architectures, since they frequently support a broad range of programming languages, frameworks, and platforms.
- Growing Cloud Computing Adoption: As more organizations move their workloads to the cloud, open-source databases offer a flexible and affordable solution for managing data in cloud environments, whether through self-managed deployments or via managed database services provided by cloud providers.
- Escalating Need for Real-Time Insights and Analytics: Organizations are increasingly adopting open-source databases with integrated analytics capabilities, like NoSQL and NewSQL databases, as a means of instantly obtaining actionable insights from their data.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Global Roads Open Access Data Set, Version 1 (gROADSv1) was developed under the auspices of the CODATA Global Roads Data Development Task Group. The data set combines the best available roads data by country into a global roads coverage, using the UN Spatial Data Infrastructure Transport (UNSDI-T) version 2 as a common data model. All country road networks have been joined topologically at the borders, and many countries have been edited for internal topology. Source data for each country are provided in the documentation, and users are encouraged to refer to the readme file for use constraints that apply to a small number of countries. Because the data are compiled from multiple sources, the date range for road network representations ranges from the 1980s to 2010 depending on the country (most countries have no confirmed date), and spatial accuracy varies. The baseline global data set was compiled by the Information Technology Outreach Services (ITOS) of the University of Georgia. Updated data for 27 countries and 6 smaller geographic entities were assembled by Columbia University's Center for International Earth Science Information Network (CIESIN), with a focus largely on developing countries with the poorest data coverage.
https://louisville-metro-opendata-lojic.hub.arcgis.com/pages/terms-of-use-and-license
On October 15, 2013, Louisville Mayor Greg Fischer announced the signing of an open data policy executive order in conjunction with his compelling talk at the 2013 Code for America Summit. In nonchalant cadence, the mayor announced his support for complete information disclosure by declaring, "It's data, man."

Sunlight Foundation - New Louisville Open Data Policy Insists Open By Default is the Future

Open Data Annual Reports
Section 5.A. Within one year of the effective date of this Executive Order, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor an annual Open Data Report.

The Open Data Management Team (also known as the Data Governance Team) is currently led by the city's Data Officer, Andrew McKinney, in the Office of Civic Innovation and Technology. Previously (2014-16) it was led by the Director of IT.

Full Executive Order

EXECUTIVE ORDER NO. 1, SERIES 2013
AN EXECUTIVE ORDER CREATING AN OPEN DATA PLAN.

WHEREAS, Metro Government is the catalyst for creating a world-class city that provides its citizens with safe and vibrant neighborhoods, great jobs, a strong system of education and innovation, and a high quality of life; and

WHEREAS, it should be easy to do business with Metro Government. Online government interactions mean more convenient services for citizens and businesses, and online government interactions improve the cost effectiveness and accuracy of government operations; and

WHEREAS, an open government also makes certain that every aspect of the built environment also has reliable digital descriptions available to citizens and entrepreneurs for deep engagement mediated by smart devices; and

WHEREAS, every citizen has the right to prompt, efficient service from Metro Government; and

WHEREAS, the adoption of open standards improves transparency, access to public information, and coordination and efficiencies among Departments and partner organizations across the public, nonprofit and private sectors; and

WHEREAS, by publishing structured standardized data in machine readable formats the Louisville Metro Government seeks to encourage the local software community to develop software applications and tools to collect, organize, and share public record data in new and innovative ways; and

WHEREAS, in commitment to the spirit of Open Government, Louisville Metro Government will consider public information to be open by default and will proactively publish data and data containing information, consistent with the Kentucky Open Meetings and Open Records Act;

NOW, THEREFORE, BE IT PROMULGATED BY EXECUTIVE ORDER OF THE HONORABLE GREG FISCHER, MAYOR OF LOUISVILLE/JEFFERSON COUNTY METRO GOVERNMENT AS FOLLOWS:

Section 1. Definitions. As used in this Executive Order, the terms below shall have the following definitions:

(A) "Open Data" means any public record as defined by the Kentucky Open Records Act, which could be made available online using Open Format data, as well as best practice Open Data structures and formats when possible. Open Data is not information that is treated as exempt under KRS 61.878 by Metro Government.

(B) "Open Data Report" is the annual report of the Open Data Management Team, which shall (i) summarize and comment on the state of Open Data availability in Metro Government Departments from the previous year; and (ii) provide a plan for the next year to improve online public access to Open Data and maintain data quality. The Open Data Management Team shall present an initial Open Data Report to the Mayor within 180 days of this Executive Order.

(C) "Open Format" is any widely accepted, nonproprietary, platform-independent, machine-readable method for formatting data, which permits automated processing of such data and is accessible to external search capabilities.

(D) "Open Data Portal" means the Internet site established and maintained by or on behalf of Metro Government, located at portal.louisvilleky.gov/service/data or its successor website.

(E) "Open Data Management Team" means a group consisting of representatives from each Department within Metro Government and chaired by the Chief Information Officer (CIO) that is responsible for coordinating implementation of an Open Data Policy and creating the Open Data Report.

(F) "Department" means any Metro Government department, office, administrative unit, commission, board, advisory committee, or other division of Metro Government within the official jurisdiction of the executive branch.

Section 2. Open Data Portal.

(A) The Open Data Portal shall serve as the authoritative source for Open Data provided by Metro Government.

(B) Any Open Data made accessible on Metro Government's Open Data Portal shall use an Open Format.

Section 3. Open Data Management Team.

(A) The Chief Information Officer (CIO) of Louisville Metro Government will work with the head of each Department to identify a Data Coordinator in each Department. Data Coordinators will serve as members of an Open Data Management Team facilitated by the CIO and Metro Technology Services. The Open Data Management Team will work to establish a robust, nationally recognized platform that addresses digital infrastructure and Open Data.

(B) The Open Data Management Team will develop an Open Data management policy that will adopt prevailing Open Format standards for Open Data, and develop agreements with regional partners to publish and maintain Open Data that is open and freely available while respecting exemptions allowed by the Kentucky Open Records Act or other federal or state law.

Section 4. Department Open Data Catalogue.

(A) Each Department shall be responsible for creating an Open Data catalogue, which will include comprehensive inventories of information possessed and/or managed by the Department.

(B) Each Department's Open Data catalogue will classify information holdings as currently "public" or "not yet public"; Departments will work with Metro Technology Services to develop strategies and timelines for publishing open data containing information in a way that is complete, reliable, and has a high level of detail.

Section 5. Open Data Report and Policy Review.

(A) Within one year of the effective date of this Executive Order, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor an annual Open Data Report.

(B) In acknowledgment that technology changes rapidly, in the future, the Open Data Policy should be reviewed and considered for revisions or additions that will continue to position Metro Government as a leader on issues of openness, efficiency, and technical best practices.

Section 6. This Executive Order shall take effect as of October 11, 2013.

Signed this 11th day of October, 2013, by Greg Fischer, Mayor of Louisville/Jefferson County Metro Government.

GREG FISCHER, MAYOR
By US Open Data Portal, data.gov [source]
This dataset contains over 300 examples of health IT policy levers used by states to advance interoperability, promote health IT, and support delivery system reform. The U.S. Government's Office of the National Coordinator for Health Information Technology (ONC) has curated this catalog as part of its Health IT State Policy Levers Compendium. It provides an exhaustive directory of the policy levers being utilized, along with information on the states enacting them and their official sources. This collection seeks to act as a comprehensive guide for government officials and healthcare providers who are interested in state-based initiatives for optimizing health information technology. Explore the strategies your own state might be using to unlock improved patient outcomes!
This dataset provides information on policy levers used by various states in the United States to promote health IT and advance interoperability. The comprehensive list includes over 300 documented examples of health IT policy levers used by these states. This catalog can be used to identify which specific policy levers are being used, as well as what activities they are associated with.
If you're interested in learning more about how states use health IT policy levers, this dataset is a great resource. It contains detailed information on each entry, including the state where it's being used, the status of that activity, a description of the activity and its purpose, and an official source for additional information about that particular entry.
Using this dataset is easy: simply search for specific states, or find out which kinds of activities each state is using its health IT policy levers for. You can also look up the specific application or implementation details of each record by opening its corresponding source URL. With all this information at hand, you can better understand how states use their health IT tools to make a difference in advancing interoperability within healthcare systems today!
- It can be used to provide states with potential models of successful health IT policy levers, allowing them to learn from the experiences of other states in developing and implementing health IT legislation.
- The dataset can also be used by researchers looking to study the effectiveness of existing health care policy levers, as well as to identify any gaps that need to be filled in order for certain policies to have a greater overall impact.
- Additionally, it could be used by industry stakeholders such as hospitals or other healthcare organizations for benchmarking their own IT implementation efforts, for example by understanding what activities are being undertaken and which sources are being used for best practices or additional resources when making decisions about introducing new technology into an organization's operations and services.
If you use this dataset in your research, please credit the original authors.

Data Source
Unknown License - Please check the dataset description for more information.
File: policy-levers-activities-catalog-csv-1.csv

| Column name | Description |
|:---------------------|:------------------------------------------------------------------------------|
| state | The state in which the policy lever is being used. (String) |
| policy_lever | Type of policy lever being used. (String) |
| activity_status | Status of the activity (e.g., active or inactive). (String) |
| activity_description | Description of the activity. (String) |
| source | Source from which the data was gathered. (String) |
| source_url | A link that points directly back to an original source with additional information. (String) |
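As an illustration, a short pandas sketch that uses the documented columns to count active policy-lever activities per state; the lowercase status value "active" is an assumption based on the example values given above:

```python
# Count active policy-lever activities per state and lever type,
# using the columns documented above. The status string "active"
# is an assumption drawn from the example values in the description.
import pandas as pd

df = pd.read_csv("policy-levers-activities-catalog-csv-1.csv")
active = df[df["activity_status"].str.lower() == "active"]
counts = active.groupby(["state", "policy_lever"]).size()
print(counts.sort_values(ascending=False).head(10))
```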
If you use this dataset in your research, please credit US Open Data Portal, data.gov.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
To achieve high quality omics results, systematic variability in mass spectrometry (MS) data must be adequately addressed. Effective data normalization is essential for minimizing this variability. The abundance of approaches and the data-dependent nature of normalization have led some researchers to develop open-source academic software for choosing the best approach. While these tools are certainly beneficial to the community, none of them meet all of the needs of all users, particularly users who want to test new strategies that are not available in these products. Herein, we present a simple workflow that facilitates the identification of optimal normalization strategies using straightforward evaluation metrics, employing both supervised and unsupervised machine learning. The workflow offers a "DIY" aspect, where the performance of any normalization strategy can be evaluated for any type of MS data. As a demonstration of its utility, we apply this workflow to two distinct datasets: an ESI-MS dataset of extracted lipids from latent fingerprints and a cancer spheroid dataset of metabolites ionized by MALDI-MSI, for which we identified the best-performing normalization strategies.
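The published workflow is not reproduced here, but the general idea of scoring candidate normalization strategies can be sketched as follows; the candidate normalizers and the silhouette-score criterion are illustrative assumptions, not the authors' exact metrics:

```python
# Illustrative sketch (not the authors' exact workflow): score candidate
# normalization strategies for a feature matrix X (samples x features)
# with known sample groups, using silhouette score as one possible
# unsupervised evaluation metric.
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.lognormal(size=(40, 200))      # stand-in for MS intensities
groups = np.repeat([0, 1], 20)         # e.g., two biological conditions

strategies = {
    "none": lambda x: x,
    "TIC": lambda x: x / x.sum(axis=1, keepdims=True),
    "median": lambda x: x / np.median(x, axis=1, keepdims=True),
    "log+TIC": lambda x: np.log1p(x / x.sum(axis=1, keepdims=True)),
}

for name, normalize in strategies.items():
    score = silhouette_score(normalize(X), groups)
    print(f"{name:8s} silhouette = {score:.3f}")
```

A higher score would indicate that a normalization better separates the known groups; any supervised or unsupervised metric of interest can be swapped in.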
Quick Stats is the National Agricultural Statistics Service's (NASS) online, self-service tool to access complete results from the 1997, 2002, 2007, and 2012 Censuses of Agriculture as well as the best source of NASS survey published estimates. The census collects data on all commodities produced on U.S. farms and ranches, as well as detailed information on expenses, income, and operator characteristics. The surveys that NASS conducts collect information on virtually every facet of U.S. agricultural production.
Under the Freedom of Information Act 2000, I was wondering if you would be able to develop on top of the FOI Request FOI 24442 and FOI 27689. https://opendata.nhsbsa.net/dataset/foi-24442 https://opendata.nhsbsa.net/dataset/foi-27689 The data in this request relates to April 2020 to March 2022 and April 2022 to June 2022 from the data source ‘NHSBSA Information Services Data Warehouse’ with the Columns YEAR_MONTH, PRACTICE_CODE, DISPENSER_CODE, BNF_CODE, PRODUCT_ORDER_NUMBER, PACK_ORDER_NUMBER and NIC_GBP. Would it be possible to have the data in the same format from July 2022 to December 2022 or from July 2022 to the latest possible month please?
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Great Barrington town by gender across 18 age groups. It lists the male and female population in each age group, along with the gender ratio, for Great Barrington town. The dataset can be utilized to understand the population distribution of Great Barrington town by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in Great Barrington town. Additionally, it can be used to see how the gender ratio changes from birth to the oldest age group, and how the male-to-female ratio varies across age groups, for Great Barrington town.
Key observations
Largest age group (population): Male # 40-44 years (385) | Female # 55-59 years (424). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
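For reference, gender ratios of this kind are conventionally reported as males per 100 females within the same age group; a one-line helper follows (the per-100-females convention is an assumption to verify against the dataset's own definition, and the counts used are illustrative):

```python
# Gender ratio as males per 100 females in the same age group.
# The per-100-females convention is assumed; counts are illustrative.
def gender_ratio(males: int, females: int) -> float:
    return 100 * males / females

print(f"{gender_ratio(385, 400):.1f} males per 100 females")
```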
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data on biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Great Barrington town Population by Gender. You can refer to it here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The European REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals) Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB). In-silico prediction is a valid alternative to expensive and time-consuming experimental testing. However, currently available models may not be relevant for predicting compounds of industrial interest, due to accuracy and applicability domain restriction issues.
In this work we present a new and extended RB dataset (2830 compounds), built by merging several public data sources. It was used to train classification models, which were externally validated and benchmarked against already-existing tools on a set of 316 compounds coming from the industrial context. The new models showed good performance in terms of predictive power (BA = 0.74-0.79) and data coverage (83-91%).
The Generative Topographic Mapping approach was employed to compare the chemical space of the various data sources: several chemotypes and structural motifs unique to the industrial dataset were identified, highlighting for which chemical classes currently available models may have less reliable predictions.
Finally, public and industrial data were merged into a Global dataset containing 3146 compounds, including a significant subset of compounds coming from the industrial context. This is the biggest dataset reported in the literature so far, and it covers some chemotypes absent from the public data. Thus, the predictive model developed on the Global dataset has a much larger applicability domain than related models built on publicly available data. The developed model is available to users on the Laboratory of Chemoinformatics website.
This dataset is only the "All-Public" set, since the industrial compounds cannot be disclosed.
This update contains additional entries from [J. Chem. Inf. Model. 52 (2012), pp. 655–669] and [J. Chem. Inf. Model. 53 (2013), pp. 867–878].
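For reference, the balanced accuracy (BA) figure quoted above is the mean of sensitivity and specificity for a binary classifier; a minimal computation with toy labels:

```python
# Balanced accuracy = (sensitivity + specificity) / 2, the metric
# reported as BA above. Labels here are toy values for illustration.
from sklearn.metrics import balanced_accuracy_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # 1 = ready biodegradable
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(balanced_accuracy_score(y_true, y_pred))  # (2/3 + 4/5) / 2 ≈ 0.733
```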
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Great Bend by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Great Bend across both sexes and to determine which sex constitutes the majority.
Key observations
The population is majority male: 64.29% of the total population is male. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data on biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Great Bend Population by Race & Ethnicity. You can refer to it here.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The Great Lakes Basin Integrated Nutrient Dataset compiles and standardizes phosphorus, nitrogen, and suspended solids data collected during the 2000-2019 water years from multiple Canadian and American sources around the Great Lakes. Ultimately, the goal is to enable regional nutrient data analysis within the Great Lakes Basin. This data is not directly used in the Water Quality Monitoring and Surveillance Division tributary load calculations. Data processing steps include standardizing data column and nutrient names, converting date-times to Coordinated Universal Time (UTC), normalizing concentration units to milligrams per liter, and reporting all phosphorus and nitrogen compounds 'as phosphorus' or 'as nitrogen'. Data sources include the Environment and Climate Change Canada National Long-term Water Quality Monitoring Data (WQMS), the Provincial (Stream) Water Quality Monitoring Network (PWQMN) of the Ontario Ministry of the Environment, the Grand River Conservation Authority (GRCA) water quality data, and Heidelberg University's National Center for Water Quality Research (NCWQR) Tributary Loading Program.
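The standardization steps described above (column renaming, UTC conversion, unit normalization to mg/L) can be sketched as follows; the column names, source time zone, and µg/L source unit are hypothetical:

```python
# Illustrative standardization of the kind described above. The column
# names, the Eastern source time zone, and the µg/L source unit are
# hypothetical assumptions, not taken from the actual dataset.
import pandas as pd

raw = pd.DataFrame({
    "SampleDate": ["2019-06-01 12:00", "2019-06-02 08:30"],
    "TP_ug_L": [42.0, 55.5],                 # total phosphorus, µg/L
})

std = raw.rename(columns={"SampleDate": "datetime_utc", "TP_ug_L": "tp_mg_l"})
std["datetime_utc"] = (
    pd.to_datetime(std["datetime_utc"])
      .dt.tz_localize("America/Toronto")     # assumed local time zone
      .dt.tz_convert("UTC")
)
std["tp_mg_l"] = std["tp_mg_l"] / 1000.0     # µg/L -> mg/L
print(std)
```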
Classification of Mars Terrain Using Multiple Data Sources
Alan Kraut, David Wettergreen

ABSTRACT. Images of Mars are being collected faster than they can be analyzed by planetary scientists. Automatic analysis of images would enable more rapid and more consistent image interpretation and could draft geologic maps where none yet exist. In this work we develop a method for incorporating images from multiple instruments to classify Martian terrain into multiple types. Each image is segmented into contiguous groups of similar pixels, called superpixels, with an associated vector of discriminative features. We have developed and tested several classification algorithms to associate a best class with each superpixel. These classifiers are trained using three different manual classifications with between 2 and 6 classes. Automatic classification accuracies of 50 to 80% are achieved in leave-one-out cross-validation across 20 scenes using a multi-class boosting classifier.
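A sketch of the evaluation protocol the abstract describes: per-superpixel feature vectors classified with multi-class boosting and scored by leave-one-scene-out cross-validation. The synthetic features, labels, and the specific boosting implementation are stand-ins, not the authors' code:

```python
# Sketch of leave-one-scene-out evaluation of a multi-class boosting
# classifier over per-superpixel feature vectors. All data are synthetic
# stand-ins for the features and labels described in the abstract.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 16))          # 400 superpixels x 16 features
y = rng.integers(0, 4, size=400)        # 4 terrain classes
scenes = np.repeat(np.arange(20), 20)   # 20 scenes, 20 superpixels each

scores = cross_val_score(GradientBoostingClassifier(), X, y,
                         groups=scenes, cv=LeaveOneGroupOut())
print(f"Mean accuracy over held-out scenes: {scores.mean():.2f}")
```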
An accurate depiction of the spatial distribution of habitat types within California is required for a variety of legislatively-mandated government functions. The California Department of Forestry and Fire Protection's CAL FIRE Fire and Resource Assessment Program (FRAP), in cooperation with the California Department of Fish and Wildlife VegCamp program and extensive use of USDA Forest Service Region 5 Remote Sensing Laboratory (RSL) data, has compiled the best available land cover data for California into a single comprehensive statewide data set. The data span a period from approximately 1990 onward; typically the most current, detailed and consistent data were collected for various regions of the state. Decision rules were developed that controlled which layers were given priority in areas of overlap. Cross-walks were used to compile the various sources into a common classification scheme, the California Wildlife Habitat Relationships (CWHR) system. This service depicts the WHRTYPE description from the fveg dataset (Wildlife Habitat Relationship classes). The full dataset can be downloaded in raster format here: GIS Mapping and Data Analytics | CAL FIRE. The service represents the latest release of the data and is updated when a new version is released; currently it represents fveg15_1.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Table F is the Expenditure and Income for the Budget Year and Estimated Outturn for the previous Year. It contains 'Expenditure' and 'Income' Adopted by the Council for the Budget Year; 'Expenditure' and 'Income' Estimated by the Chief Executive for the Budget Year; 'Expenditure' and 'Income' Adopted by the Council for the previous Year; and 'Expenditure' and 'Income' Estimated Outturn for the previous Year. Table F provides a breakdown of the Expenditure to Sub-Service level and Income to Income Source per Council Division contained in Table A. In the published Annual Budget document, Table F is published as a separate table for each Division. Section 1 of Table F contains Expenditure broken down by 'Division', 'Service' and 'Sub-Service'. Section 2 of Table F contains Income broken down by 'Division', 'Income Type' and 'Income Source'. The data in this dataset is best interpreted by comparison with Table F in the published Annual Budget document, which can be found at https://www.sdcc.ie/en/services/our-council/policies-and-plans/budgets-and-spending/annual-budget/

Data fields for Table F are as follows:
- Doc: Table Reference
- Heading: Indicates the section in the Table. Table F comprises two sections, Expenditure and Income: Heading = 1 for all Expenditure records; Heading = 2 for all Income records.
- Ref: Division Reference
- Ref_Desc: Division Description
- Ref1: Service Reference for all Expenditure records (i.e. Heading = 1) or Income Type for all Income records (i.e. Heading = 2)
- Ref1_Desc: Service Description for all Expenditure records (i.e. Heading = 1) or Income Type Description for all Income records (i.e. Heading = 2)
- Ref2: Sub-Service Reference for all Expenditure records (i.e. Heading = 1) or Income Source for all Income records (i.e. Heading = 2)
- Ref2_Desc: Sub-Service Description for all Expenditure records (i.e. Heading = 1) or Income Source Description for all Income records (i.e. Heading = 2)
- Adop: Amount Adopted by the Council for the Budget Year
- EstCE: Amount Estimated by the Chief Executive for the Budget Year
- PY_Adop: Amount Adopted by the Council for the previous Financial Year
- PY_Outturn: Estimated Outturn Amount for the previous Financial Year
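Given the field definitions above, a short sketch of splitting a Table F extract into its Expenditure and Income sections via the Heading flag; the CSV filename is hypothetical:

```python
# Split a Table F extract into Expenditure (Heading == 1) and Income
# (Heading == 2) records per the field definitions above. The filename
# "table_f.csv" is a hypothetical export of this dataset.
import pandas as pd

df = pd.read_csv("table_f.csv")
expenditure = df[df["Heading"] == 1]    # Ref1/Ref2 = Service / Sub-Service
income = df[df["Heading"] == 2]         # Ref1/Ref2 = Income Type / Source
print(expenditure.groupby("Ref_Desc")["Adop"].sum())  # adopted spend by Division
```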
High resolution land cover data set for New York City. This is the 3 ft version of the high-resolution land cover dataset for New York City. Seven land cover classes were mapped: (1) tree canopy, (2) grass/shrub, (3) bare earth, (4) water, (5) buildings, (6) roads, and (7) other paved surfaces. The minimum mapping unit for the delineation of features was set at 3 square feet. The primary sources used to derive this land cover layer were the 2010 LiDAR and the 2008 4-band orthoimagery. Ancillary data sources included GIS data (city boundary, building footprints, water, parking lots, roads, railroads, railroad structures, ballfields) provided by New York City (all ancillary datasets except railroads); the UVM Spatial Analysis Laboratory created the railroad polygons by manual interpretation of the 2008 4-band orthoimagery. The tree canopy class was considered current as of 2010; the remaining land-cover classes were considered current as of 2008. Object-Based Image Analysis (OBIA) techniques were employed to extract land cover information using the best available remotely sensed and vector GIS datasets. OBIA systems work by grouping pixels into meaningful objects based on their spectral and spatial properties, while taking into account boundaries imposed by existing vector datasets. Within the OBIA environment, a rule-based expert system was designed to effectively mimic the process of manual image analysis by incorporating the elements of image interpretation (color/tone, texture, pattern, location, size, and shape) into the classification process. A series of morphological procedures were employed to ensure that the end product is both accurate and cartographically pleasing. More than 35,000 corrections were made to the classification. Overall accuracy was 96%. This dataset was developed as part of the Urban Tree Canopy (UTC) Assessment for New York City. As such, it represents a 'top down' mapping perspective in which tree canopy overhanging other features is assigned to the tree canopy class. At the time of its creation this dataset represented the most detailed and accurate land cover dataset for the area. This project was funded by the National Urban and Community Forestry Advisory Council (NUCFAC) and the National Science Foundation (NSF), although it is not specifically endorsed by either agency. The methods used were developed by the University of Vermont Spatial Analysis Laboratory, in collaboration with the New York City Urban Field Station, with funding from the USDA Forest Service.
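The pixel-grouping step that OBIA systems start from can be illustrated with an off-the-shelf superpixel segmentation; this is a generic sketch on random data, not the rule-based expert system used for the NYC dataset:

```python
# Generic illustration of the OBIA starting point: group pixels into
# spectrally similar objects (superpixels). This is not the NYC
# rule-based expert system; the image here is random stand-in data.
import numpy as np
from skimage.segmentation import slic

image = np.random.rand(100, 100, 3)           # stand-in for orthoimagery
segments = slic(image, n_segments=50, compactness=10.0)
n_objects = len(np.unique(segments))
print(f"{n_objects} candidate objects from {image.size // 3} pixels")
```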
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset collects job offers from web scraping, filtered according to specific keywords, locations and times. This data gives users rich and precise search capabilities to uncover the best working solution for them. With the information collected, users can explore options that match their personal situation, skill set and preferences in terms of location and schedule. The columns provide detailed information on job titles, employer names, locations and time frames, as well as other necessary parameters, so you can make a smart choice for your next career opportunity.
This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:
Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by allowing you to search for job postings that contain the word or phrase you are looking for.
Next, consider where the job is located – the Location column tells you where in the world each posting is from so make sure it’s somewhere that suits your needs!
Finally, consider when the position is available: the Time frame column indicates when each posting was made, as well as whether it is a full-time/part-time role or even a casual/temporary position, so make sure it meets your requirements before applying!
Additionally, if details such as hours per week or further schedule information are important criteria, there is also information provided in the Horari and Temps_Oferta columns. Once all three criteria have been ticked off (keywords, location and time frame), take a look at the Empresa (company name) and Nom_Oferta (offer name) columns to get an idea of who will be employing you should you land the gig!
All these pieces of data put together should give any motivated individual everything they need to seek out an optimal work solution. Keep hunting, and good luck!
- Machine learning can be used to group job offers in order to facilitate the identification of similarities and differences between them. This could allow users to target their search for a work solution more specifically.
- The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better informed decisions in terms of their career options and goals.
- It may also provide insight into the local job market, enabling companies and employers to identify where there is potential for new opportunities or possible trends that may previously have gone unnoticed.
If you use this dataset in your research, please credit the original authors.

Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: web_scraping_information_offers.csv

| Column name | Description |
|:-------------|:-------------------------------------|
| Nom_Oferta | Name of the job offer. (String) |
| Empresa | Company offering the job. (String) |
| Ubicació | Location of the job offer. (String) |
| Temps_Oferta | Time of the job offer. (String) |
| Horari | Schedule of the job offer. (String) |
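With the columns documented above (note the Catalan names), a minimal filtering sketch for the keyword/location search the description walks through; the search terms are illustrative:

```python
# Minimal keyword/location filter over the documented (Catalan-named)
# columns. The search terms "data" and "Barcelona" are illustrative.
import pandas as pd

df = pd.read_csv("web_scraping_information_offers.csv")
mask = (
    df["Nom_Oferta"].str.contains("data", case=False, na=False)
    & df["Ubicació"].str.contains("Barcelona", case=False, na=False)
)
print(df.loc[mask, ["Nom_Oferta", "Empresa", "Horari"]])
```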
If you use this dataset in your research, please credit the original authors.