Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It is common in modern prediction problems for many predictor variables to be counts of rarely occurring events. This leads to design matrices in which many columns are highly sparse. The challenge posed by such “rare features” has received little attention despite its prevalence in diverse areas, ranging from natural language processing (e.g., rare words) to biology (e.g., rare species). We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. Our strategy leverages side information in the form of a tree that encodes feature similarity. We apply our method to data from TripAdvisor, in which we predict the numerical rating of a hotel based on the text of the associated review. Our method achieves high accuracy by making effective use of rare words; by contrast, the lasso is unable to identify highly predictive words if they are too rare. A companion R package, called rare, implements our new estimator, using the alternating direction method of multipliers. Supplementary materials for this article are available online.
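To make the aggregation idea concrete, here is a minimal Python sketch that sums sparse count columns sharing a parent node in a feature tree. The toy counts, the word names, the one-level tree, and the helper `aggregate_by_parent` are all hypothetical. This is not the estimator implemented in the rare package; it only shows what collapsing rare leaves into one denser feature looks like when the grouping is fixed in advance.

```python
from collections import defaultdict


def aggregate_by_parent(rows, feature_names, parent_of):
    """Sum count columns whose features share the same parent node.

    rows          -- list of count vectors, one per observation
    feature_names -- column names, aligned with each row
    parent_of     -- hypothetical mapping: leaf feature -> parent node
    """
    # Group column indices by parent; a feature with no parent keeps its own name.
    groups = defaultdict(list)
    for j, name in enumerate(feature_names):
        groups[parent_of.get(name, name)].append(j)

    new_names = list(groups)  # insertion order: first appearance of each group
    new_rows = [[sum(row[j] for j in groups[g]) for g in new_names] for row in rows]
    return new_rows, new_names


# Toy data (made up): three rare words and one common word.
names = ["superb", "stellar", "splendid", "room"]
X = [
    [1, 0, 0, 2],
    [0, 1, 0, 1],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
]
tree = {"superb": "positive", "stellar": "positive", "splendid": "positive"}

X_agg, names_agg = aggregate_by_parent(X, names, tree)
print(names_agg)
print(X_agg)
```

In this toy case each rare word appears in only one of four rows, while the aggregated "positive" column is nonzero in three of them, which is the kind of denser feature the abstract refers to.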
Multiple research and management partners collaboratively developed a multiscale approach for assessing the geomorphic sensitivity of streams and ecological resilience of riparian and meadow ecosystems in upland watersheds of the Great Basin to disturbances and management actions. The approach builds on long-term work by the partners on the responses of these systems to disturbances and management actions. At the core of the assessments is information on past and present watershed and stream channel characteristics, geomorphic and hydrologic processes, and riparian and meadow vegetation. In this report, we describe the approach used to delineate Great Basin mountain ranges and the watersheds within them, and the data that are available for the individual watersheds. We also describe the resulting database and the data sources. Furthermore, we summarize information on the characteristics of the regions and watersheds within the regions and the implications of the assessments for geomorphic sensitivity and ecological resilience. The target audience for this multiscale approach is managers and stakeholders interested in assessing and adaptively managing Great Basin stream systems and riparian and meadow ecosystems. Anyone interested in delineating the mountain ranges and watersheds within the Great Basin or quantifying the characteristics of the watersheds will be interested in this report. For more information, visit: https://www.fs.usda.gov/research/treesearch/61573
A single layer combining all park features, events, recreation, and facilities, including Activenet information.
http://schema.org/PublicDomain
Comparison matrix showing feature availability across different AI visibility platforms
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The hydrographic features of the CanVec series include watercourses, water linear flow segments, hydrographic obstacles (falls, rapids, etc.), waterbodies (lakes, watercourses, etc.), permanent snow and ice features, water wells and springs. The Hydrographic features theme provides quality vector geospatial data (current, accurate, and consistent) of Canadian hydrographic phenomena. It aims to offer a geometric description and a set of basic attributes on hydrographic features that comply with international geomatics standards, seamlessly across Canada. The CanVec multiscale series is available as prepackaged downloadable files and by user-defined extent via a Geospatial data extraction tool. Related Products: Topographic Data of Canada - CanVec Series
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive customer data suitable for segmentation analysis. It includes anonymized demographic, transactional, and behavioral attributes, allowing for detailed exploration of customer segments. Leveraging this dataset, marketers, data scientists, and business analysts can uncover valuable insights to optimize targeted marketing strategies and enhance customer engagement. Whether you're looking to understand customer behavior or improve campaign effectiveness, this dataset offers a rich resource for actionable insights and informed decision-making.
Anonymized demographic, transactional, and behavioral data. Suitable for customer segmentation analysis. Opportunities to optimize targeted marketing strategies. Valuable insights for improving campaign effectiveness. Ideal for marketers, data scientists, and business analysts.
Segmenting customers based on demographic attributes. Analyzing purchase behavior to identify high-value customer segments. Optimizing marketing campaigns for targeted engagement. Understanding customer preferences and tailoring product offerings accordingly. Evaluating the effectiveness of marketing strategies and iterating for improvement. Explore this dataset to unlock actionable insights and drive success in your marketing initiatives!
https://borealisdata.ca/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=doi:10.5683/SP3/UPABVH
Data collected from major Canadian and international research data repositories cover data storage, preservation, metadata, interchange, data file types, and other standard features used in the retention and sharing of research data. The outputs of this project primarily aim to assist in the establishment of recommended minimum requirements for a Canadian research data infrastructure. The committee also aims to further develop guidelines and criteria for the assessment and selection of repositories for deposit of Canadian research data by researchers, data managers, librarians, archivists, etc.
The Geospatial Fabric provides a consistent, documented, and topologically connected set of spatial features that create an abstracted stream/basin network of features useful for hydrologic modeling. The GIS vector features contained in this Geospatial Fabric (GF) data set cover the lower 48 U.S. states, Hawaii, and Puerto Rico. Four GIS feature classes are provided for each Region: 1) the Region outline ("one"), 2) Points of Interest ("POIs"), 3) a routing network ("nsegment"), and 4) Hydrologic Response Units ("nhru"). A graphic showing the boundaries for all Regions is provided at http://dx.doi.org/doi:10.5066/F7542KMD. These Regions are identical to those used to organize the NHDPlus v.1 dataset (US EPA and US Geological Survey, 2005). Although the GF Feature data set has been derived from NHDPlus v.1, it is an entirely new data set that has been designed to generically support regional and national scale applications of hydrologic models. Definition of each type of feature class and its derivation is provided within the
This study compared three versions of a Go/No-Go (GNG) task, each with different gamelike features (non-game, points, theme), across two testing sites (laboratory and online). We used a between-subjects design, with reaction times (RT) on Go trials, Go trial accuracy, No-Go trial accuracy, and subjective ratings as the dependent variables of interest.
24K Hydro File Geodatabase, including bank lines, flow lines, junction points, hydro lines, water bodies, hydro points, and a network. Access the user guide, data dictionaries, and metadata below.
The DNR Hydrography database was developed statewide using several 1:24,000-scale sources. This data layer includes information about surface water features represented on the USGS 1:24,000-scale topographic map series such as perennial and intermittent streams, lakes, etc. Because the sources of the Hydrography data span many years and originate from several sources, the data may reflect areas of transition from one source to another. As a result, the water features as represented in the Hydrography data may not always match what you see on a particular USGS quad or Digital Raster Graphic (DRG). General source information is presented on this map: Wisconsin Hydrography Source Information. Note: Wetlands delineations are not included in the DNR Hydrography data layer. For information about DNR Wetlands data, see the Wisconsin Wetland Inventory web page.
Report errors in this data to Dennis Wiese (dennis.wiese@wisconsin.gov) with the following information: the HYDROID of the feature in question; OR, if the feature is missing, a location coordinate or description (e.g. latitude/longitude, Public Land Survey System Township, Range, and Section identifier) that identifies the area in question. Optional but very helpful: a screen capture of the area in question, or the Water Body Identification Code (WBIC) of the feature in question.
DNR staff can access the hydrography database in the agency's central GIS data repository. The hydrography feature classes are stored in the feature dataset "W23324.WD_HYDRO_DATA_24K".
USER GUIDES AND DOCUMENTATION: WDNR_HYDRO_24k_GETTING STARTED; WDNR HYDRO 24K UPDATES DOCUMENT; 24K HYDRO DECISION RULES
Data Dictionaries and Metadata: WDNR_HYDRO_24k_waterbody_data_dict; WDNR_HYDRO_24k_waterbody_metadata; WDNR_HYDRO_24k_flowline_data_dict; WDNR_HYDRO_24k_flowline_metadata; WDNR_HYDRO_24k_bank_data_dict; WDNR_HYDRO_24k_bank_metadata; WDNR_HYDRO_24k_junction_data_dict; WDNR_HYDRO_24k_junction_metadata; WDNR_HYDRO_24k_line_data_dict; WDNR_HYDRO_24k_line_metadata; WDNR_HYDRO_24k_flowline_wbic_data_dict; WDNR_HYDRO_24k_flowline_wbic_metadata; WDNR_HYDRO_24k_waterbody_wbic_data_dict; WDNR_HYDRO_24k_waterbody_wbic_metadata
ArcMap Layer (.lyr) Files: 24k Hydro Flowline Duration; 24k Hydro Bank Lines; 24k Hydro Flowline Streams; 24k Hydro Waterbody Open Water
This data set consists of 6 classes of zoning features: zoning districts, special purpose districts, special purpose district subdistricts, limited height districts, commercial overlay districts, and zoning map amendments.
All previously released versions of this data are available at BYTES of the BIG APPLE - Archive.
The location of Landmark Features within Nottingham City Centre. Landmark Features are points of local interest and significance within the townscape.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This dataset contains a list of features for each park PMAID.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle and enriched with additional variables based on Financial Risk for Loan Approval data. SMOTENC was used to simulate new data points and increase the number of instances. The dataset contains both categorical and continuous features.
The dataset contains 45,000 records and 14 variables, each described below:
Column | Description | Type |
---|---|---|
person_age | Age of the person | Float |
person_gender | Gender of the person | Categorical |
person_education | Highest education level | Categorical |
person_income | Annual income | Float |
person_emp_exp | Years of employment experience | Integer |
person_home_ownership | Home ownership status (e.g., rent, own, mortgage) | Categorical |
loan_amnt | Loan amount requested | Float |
loan_intent | Purpose of the loan | Categorical |
loan_int_rate | Loan interest rate | Float |
loan_percent_income | Loan amount as a percentage of annual income | Float |
cb_person_cred_hist_length | Length of credit history in years | Float |
credit_score | Credit score of the person | Integer |
previous_loan_defaults_on_file | Indicator of previous loan defaults | Categorical |
loan_status (target variable) | Loan approval status: 1 = approved; 0 = rejected | Integer |
The dataset can be used for multiple purposes:
- Predicting the loan_status variable (approved/not approved) for potential applicants.
- Predicting the credit_score variable based on individual and loan-related attributes.
Note the data issues carried over from the original data, such as instances with a recorded age of more than 100 years.
This dataset provides a rich basis for understanding financial risk factors and simulating predictive modeling processes for loan approval and credit scoring.
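As a rough illustration of the age issue mentioned in the description, the sketch below separates records whose person_age exceeds 100 from the rest before any modeling. The sample rows are invented, the helper `split_by_age` is hypothetical, and the 100-year cutoff is an assumption taken from that note; only the column names come from the table above.

```python
import csv
import io

# Assumption: cutoff taken from the note about records older than 100 years.
MAX_PLAUSIBLE_AGE = 100.0


def split_by_age(records, max_age=MAX_PLAUSIBLE_AGE):
    """Return (plausible, suspicious) lists of records based on person_age."""
    plausible, suspicious = [], []
    for rec in records:
        age = float(rec["person_age"])
        (plausible if age <= max_age else suspicious).append(rec)
    return plausible, suspicious


# Invented sample using a subset of the documented columns.
sample_csv = """person_age,person_income,credit_score,loan_status
22.0,71948.0,561,1
144.0,300616.0,789,0
35.5,54000.0,640,0
"""

records = list(csv.DictReader(io.StringIO(sample_csv)))
kept, flagged = split_by_age(records)
print(len(kept), "kept;", len(flagged), "flagged")
```

Whether flagged rows should be dropped, capped, or kept depends on the analysis; the sketch only makes them visible.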
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
CanVec contains more than 60 topographic feature classes organized into 8 themes: Transport Features, Administrative Features, Hydro Features, Land Features, Manmade Features, Elevation Features, Resource Management Features and Toponymic Features. This multiscale product originates from the best available geospatial data sources covering Canadian territory. It offers quality topographic information in vector format complying with international geomatics standards. CanVec can be used in Web Map Services (WMS) and geographic information systems (GIS) applications and used to produce thematic maps. Because of its many attributes, CanVec allows for extensive spatial analysis. Related Products: Constructions and Land Use in Canada - CanVec Series - Manmade Features; Lakes, Rivers and Glaciers in Canada - CanVec Series - Hydrographic Features; Administrative Boundaries in Canada - CanVec Series - Administrative Features; Mines, Energy and Communication Networks in Canada - CanVec Series - Resources Management Features; Wooded Areas, Saturated Soils and Landscape in Canada - CanVec Series - Land Features; Transport Networks in Canada - CanVec Series - Transport Features; Elevation in Canada - CanVec Series - Elevation Features; Map Labels - CanVec Series - Toponymic Features
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data set includes polygons for ecological sections within Subregions within the conterminous United States. This data set contains regional geographic delineations for analysis of ecological relationships across ecological units.
This data set includes 16 zipped archives of shapefiles of cities, rivers and streams, roads, and study area boundaries of several Amazonian study sites: Altamira, Santarem, Bragantina, and Ponta de Pedras, in the state of Para, and 1 site at Machadinho D'Oeste, in the state of Rondonia. Data from Brazil were digitized from Instituto Nacional de Colonizacao e Reforma Agraria (INCRA) maps and other data from Instituto Brasileiro de Geografia e Estatistica (IBGE). These products were prepared in the 2000-2004 time period. The date of creation for the source material is unknown.
The Product Line Architecture (PLA) is one of the most important artifacts of a Software Product Line (SPL). PLA design can be formulated as an interactive optimization problem with many conflicting factors. Incorporating Decision Makers’ (DM) preferences during the search process may help the algorithms find solutions better suited to their profiles. Interactive approaches allow the DM to evaluate solutions, guiding the optimization according to their preferences. However, this brings up human fatigue problems caused by the excessive number of interactions and solutions to evaluate. A common strategy to prevent this problem is limiting the number of interactions and solutions evaluated by the DM. Machine Learning (ML) models have also been used to learn how to evaluate solutions according to the DM profile and to replace the DM after some interactions. Feature selection plays an essential role, as non-relevant and/or redundant features used to train the ML model can reduce the accuracy and comprehensibility of the hypotheses induced by ML algorithms. This work aims to select features of an ML model used to prevent human fatigue in an interactive search-based PLA design approach. We applied four selectors, and the results show that we were able to reduce the number of features by 30% while obtaining an accuracy of 99%.
The Geological Atlas of the Western Canada Sedimentary Basin was designed primarily as a reference volume documenting the subsurface geology of the Western Canada Sedimentary Basin. This GIS dataset is one of a collection of shapefiles representing part of Chapter 29 of the Atlas, In-situ Stress in the Western Canada Sedimentary Basin, Figure 10, Stress Trajectories Determined from Breakouts. Shapefiles were produced from archived digital files created by the Alberta Geological Survey in the mid-1990s, and edited in 2005-06 to correct, attribute and consolidate the data into single files by feature type and by figure.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset ds180_Chinook_pnts is a product of the CalFish Adult Salmonid Abundance Database. Data in this shapefile are collected from point features, such as dams and hatcheries. Some escapement monitoring locations, such as spawning stock surveys, are logically represented by linear features. See the companion linear feature shapefile ds181_Chinook_ln for information collected from stream reaches.
The CalFish Abundance Database contains a comprehensive collection of anadromous fisheries abundance information. Beginning in 1998, the Pacific States Marine Fisheries Commission, the California Department of Fish and Game, and the National Marine Fisheries Service began a cooperative project aimed at collecting, archiving, and entering into standardized electronic formats the wealth of information generated by fisheries resource management agencies and tribes throughout California.
The data format provides sufficient detail to convey the relative accuracy of each population trend index record, yet is simple and straightforward enough to be suited for public use. For those interested in more detail, the database offers hyperlinks to digital copies of the original documents used to compile the information. In this way the database serves as an information hub directing the user to additional supporting information. This offers utility to field biologists and others interested in obtaining information for more in-depth analysis. Hyperlinks, built into the spatial data attribute tables used in the BIOS and CalFish I-map viewers, open the detailed index data archived in the on-line CalFish database application. The information can also be queried directly from the database via the CalFish Tabular Data Query. Once the detailed annual trend data are in view, another hyperlink opens a digital copy of the document used to compile each record.
During 2010, as a part of the Central Valley Chinook Comprehensive Monitoring Plan, the CalFish Salmonid Abundance Database was reorganized and updated. CalFish provides a central location for sharing Central Valley Chinook salmon escapement estimates and annual monitoring reports to all stakeholders, including the public. Annual Chinook salmon in-river escapement indices that were, in many cases, eight to ten years behind are now current through 2009. In some cases, multiple datasets were consolidated into a single, more comprehensive dataset to more closely reflect how data are reported in the California Department of Fish and Game standard index, Grandtab.
Extensive data are currently available in the CalFish Abundance Database for California Chinook, coho, and steelhead. Major data categories include adult abundance population estimates, actual fish and/or carcass counts, counts of fish collected at dams, weirs, or traps, and redd counts. Harvest data has also been compiled for many streams.
This CalFish Abundance Database shapefile was generated from fully routed 1:100,000 hydrography. In a few cases streams had to be added to the hydrography dataset in order to provide a means to create shapefiles to represent abundance data associated with them. Streams added were digitized at no more than 1:24,000 scale based on stream line images portrayed in 1:24,000 Digital Raster Graphics (DRG).
The features in this layer represent the location for which abundance data records apply. In many cases there are multiple datasets associated with the same location, and so features may overlap. Please view the associated datasets for detail regarding specific features. In CalFish these are accessed through the "link" field that is visible when performing an identify or query operation. A URL string is provided with each feature in the downloadable data, which can also be used to access the underlying datasets.
The Chinook data that is available from the CalFish website is actually mirrored from the StreamNet website, where the CalFish Abundance Database's tabular data is currently stored. Additional information about StreamNet may be downloaded at http://www.streamnet.org. Complete documentation for the StreamNet database may be accessed at http://www.streamnet.org/def.html