100+ datasets found

Data from: Rare Feature Selection in High Dimensions
tandf.figshare.com
pdf
Updated May 30, 2023
Cite
Xiaohan Yan; Jacob Bien (2023). Rare Feature Selection in High Dimensions [Dataset]. http://doi.org/10.6084/m9.figshare.12851331.v2
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.12851331.v2
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Xiaohan Yan; Jacob Bien
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
It is common in modern prediction problems for many predictor variables to be counts of rarely occurring events. This leads to design matrices in which many columns are highly sparse. The challenge posed by such “rare features” has received little attention despite its prevalence in diverse areas, ranging from natural language processing (e.g., rare words) to biology (e.g., rare species). We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. Our strategy leverages side information in the form of a tree that encodes feature similarity. We apply our method to data from TripAdvisor, in which we predict the numerical rating of a hotel based on the text of the associated review. Our method achieves high accuracy by making effective use of rare words; by contrast, the lasso is unable to identify highly predictive words if they are too rare. A companion R package, called rare, implements our new estimator, using the alternating direction method of multipliers. Supplementary materials for this article are available online.
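To make the idea of aggregating rare features into denser ones concrete, here is a toy sketch. It is not the authors' estimator (the description says that is implemented in the R package rare using the alternating direction method of multipliers). The word list, the counts, and the one-level grouping that stands in for the tree are all hypothetical; the sketch only shows that summing the columns in a group raises the fraction of nonzero entries.

```python
def aggregate_by_group(rows, feature_names, group_of):
    """Sum the count columns that share a group; return (group names, new rows)."""
    groups = sorted({group_of[name] for name in feature_names})
    index = {g: i for i, g in enumerate(groups)}
    out = []
    for row in rows:
        agg = [0] * len(groups)
        for name, count in zip(feature_names, row):
            agg[index[group_of[name]]] += count
        out.append(agg)
    return groups, out


def density(rows):
    """Fraction of nonzero entries in each column."""
    n = len(rows)
    return [sum(1 for r in rows if r[j] != 0) / n for j in range(len(rows[0]))]


# Hypothetical toy data: four rare words, each appearing in one review only.
words = ["superb", "stellar", "dingy", "grimy"]
group_of = {
    "superb": "positive",
    "stellar": "positive",
    "dingy": "negative",
    "grimy": "negative",
}
counts = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 2],
]

groups, dense = aggregate_by_group(counts, words, group_of)
print(groups, dense)
print(density(counts), density(dense))
```

In this toy case each original column is nonzero in a quarter of the rows, while each aggregated column is nonzero in half of them.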
Great Basin Montane Watersheds - Streams (Feature Layer)
catalog.data.gov
agdatacommons.nal.usda.gov
+5 more
Updated Apr 21, 2025
+ more versions
Cite
U.S. Forest Service (2025). Great Basin Montane Watersheds - Streams (Feature Layer) [Dataset]. https://catalog.data.gov/dataset/great-basin-montane-watersheds-streams-feature-layer
Dataset updated
Apr 21, 2025
Dataset provided by
U.S. Forest Service
Area covered
Great Basin
Description
Multiple research and management partners collaboratively developed a multiscale approach for assessing the geomorphic sensitivity of streams and ecological resilience of riparian and meadow ecosystems in upland watersheds of the Great Basin to disturbances and management actions. The approach builds on long-term work by the partners on the responses of these systems to disturbances and management actions. At the core of the assessments is information on past and present watershed and stream channel characteristics, geomorphic and hydrologic processes, and riparian and meadow vegetation. In this report, we describe the approach used to delineate Great Basin mountain ranges and the watersheds within them, and the data that are available for the individual watersheds. We also describe the resulting database and the data sources. Furthermore, we summarize information on the characteristics of the regions and watersheds within the regions and the implications of the assessments for geomorphic sensitivity and ecological resilience. The target audience for this multiscale approach is managers and stakeholders interested in assessing and adaptively managing Great Basin stream systems and riparian and meadow ecosystems. Anyone interested in delineating the mountain ranges and watersheds within the Great Basin or quantifying the characteristics of the watersheds will be interested in this report. For more information, visit: https://www.fs.usda.gov/research/treesearch/61573
Allegheny County Park Features
catalog.data.gov
data.wprdc.org
+2 more
Updated May 14, 2023
+ more versions
Cite
Allegheny County (2023). Allegheny County Park Features [Dataset]. https://catalog.data.gov/dataset/allegheny-county-park-features
Dataset updated
May 14, 2023
Dataset provided by
Allegheny County
Area covered
Allegheny County
Description
A combination of all park features, events, recreation, and facilities in one layer, including Activenet information.
Features Comparison Data
landing-fe-dev.mentionnetwork.xyz
mention.network
Updated 2024
Cite
Mention Network (2024). Features Comparison Data [Dataset]. https://landing-fe-dev.mentionnetwork.xyz/
Dataset updated
2024
Dataset authored and provided by
Mention Network
License
http://schema.org/PublicDomain
Description
Comparison matrix showing feature availability across different AI visibility platforms
Lakes, Rivers and Glaciers in Canada - CanVec Series - Hydrographic Features...
open.canada.ca
catalogue.arctic-sdi.org
+3 more
fgdb/gdb, html, kmz +2
Updated May 19, 2023
+ more versions
Cite
Natural Resources Canada (2023). Lakes, Rivers and Glaciers in Canada - CanVec Series - Hydrographic Features [Dataset]. https://open.canada.ca/data/en/dataset/9d96e8c9-22fe-4ad2-b5e8-94a6991b744b
Explore at:
Available download formats: html, fgdb/gdb, kmz, wms, shp
Dataset updated
May 19, 2023
Dataset provided by
Ministry of Natural Resources of Canada: https://www.nrcan.gc.ca/
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Area covered
Canada
Description
The hydrographic features of the CanVec series include watercourses, water linear flow segments, hydrographic obstacles (falls, rapids, etc.), waterbodies (lakes, watercourses, etc.), permanent snow and ice features, water wells and springs. The Hydrographic features theme provides quality vector geospatial data (current, accurate, and consistent) of Canadian hydrographic phenomena. It aims to offer a geometric description and a set of basic attributes on hydrographic features that comply with international geomatics standards, seamlessly across Canada. The CanVec multiscale series is available as prepackaged downloadable files and by user-defined extent via a Geospatial data extraction tool. Related Products: Topographic Data of Canada - CanVec Series
Customer Segmentation Data
kaggle.com
Updated Mar 11, 2024
Cite
Raval Smit (2024). Customer Segmentation Data [Dataset]. https://www.kaggle.com/datasets/ravalsmit/customer-segmentation-data
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 11, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Raval Smit
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides comprehensive customer data suitable for segmentation analysis. It includes anonymized demographic, transactional, and behavioral attributes, allowing for detailed exploration of customer segments. Leveraging this dataset, marketers, data scientists, and business analysts can uncover valuable insights to optimize targeted marketing strategies and enhance customer engagement. Whether you're looking to understand customer behavior or improve campaign effectiveness, this dataset offers a rich resource for actionable insights and informed decision-making.

Key Features:

Anonymized demographic, transactional, and behavioral data.
Suitable for customer segmentation analysis.
Opportunities to optimize targeted marketing strategies.
Valuable insights for improving campaign effectiveness.
Ideal for marketers, data scientists, and business analysts.

Usage Examples:

Segmenting customers based on demographic attributes.
Analyzing purchase behavior to identify high-value customer segments.
Optimizing marketing campaigns for targeted engagement.
Understanding customer preferences and tailoring product offerings accordingly.
Evaluating the effectiveness of marketing strategies and iterating for improvement.

Explore this dataset to unlock actionable insights and drive success in your marketing initiatives!
Research Data Repository Requirements and Features Review
borealisdata.ca
dataone.org
Updated Aug 24, 2015
Cite
Amber Leahey; Peter Webster; Claire Austin; Nancy Fong; Julie Friddell; Chuck Humphrey; Susan Brown; Walter Stewart (2015). Research Data Repository Requirements and Features Review [Dataset]. http://doi.org/10.5683/SP3/UPABVH
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/UPABVH
Dataset updated
Aug 24, 2015
Dataset provided by
Borealis
Authors
Amber Leahey; Peter Webster; Claire Austin; Nancy Fong; Julie Friddell; Chuck Humphrey; Susan Brown; Walter Stewart
License
https://borealisdata.ca/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=doi:10.5683/SP3/UPABVH
Time period covered
Sep 2014 - Feb 2015
Area covered
Canada, United States, Europe, United Kingdom, International
Description
Data collected from major Canadian and international research data repositories cover data storage, preservation, metadata, interchange, data file types, and other standard features used in the retention and sharing of research data. The outputs of this project primarily aim to assist in the establishment of recommended minimum requirements for a Canadian research data infrastructure. The committee also aims to further develop guidelines and criteria for the assessment and selection of repositories for deposit of Canadian research data by researchers, data managers, librarians, archivists, etc.
GIS Features of the Geospatial Fabric for National Hydrologic Modeling
catalog.data.gov
data.usgs.gov
+5 more
Updated Jul 6, 2024
+ more versions
Cite
U.S. Geological Survey (2024). GIS Features of the Geospatial Fabric for National Hydrologic Modeling [Dataset]. https://catalog.data.gov/dataset/gis-features-of-the-geospatial-fabric-for-national-hydrologic-modeling
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey: http://www.usgs.gov/
Description
The Geospatial Fabric provides a consistent, documented, and topologically connected set of spatial features that create an abstracted stream/basin network of features useful for hydrologic modeling. The GIS vector features contained in this Geospatial Fabric (GF) data set cover the lower 48 U.S. states, Hawaii, and Puerto Rico. Four GIS feature classes are provided for each Region: 1) the Region outline ("one"), 2) Points of Interest ("POIs"), 3) a routing network ("nsegment"), and 4) Hydrologic Response Units ("nhru"). A graphic showing the boundaries for all Regions is provided at http://dx.doi.org/doi:10.5066/F7542KMD. These Regions are identical to those used to organize the NHDPlus v.1 dataset (US EPA and US Geological Survey, 2005). Although the GF Feature data set has been derived from NHDPlus v.1, it is an entirely new data set that has been designed to generically support regional and national scale applications of hydrologic models. Definition of each type of feature class and its derivation is provided within the
Effect of of gamelike features on cognitive test performance - Datasets -...
data.bris.ac.uk
Updated Apr 4, 2016
+ more versions
Cite
(2016). Effect of of gamelike features on cognitive test performance - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/1hjvqlpbtrk961ua9ml40bauie
Dataset updated
Apr 4, 2016
Description
This study compared three versions of a Go/No-Go (GNG) task, each with different gamelike features (non-game, points, theme), across two different testing sites (laboratory and online). We used a between-subjects design, with reaction times (RT) on Go trials, Go trial accuracy, No-Go trial accuracy and subjective ratings as the dependent variables of interest.
24k Hydro Full File Geodatabase
data-wi-dnr.opendata.arcgis.com
arc-gis-hub-home-arcgishub.hub.arcgis.com
Updated Aug 1, 2017
Cite
Wisconsin Department of Natural Resources (2017). 24k Hydro Full File Geodatabase [Dataset]. https://data-wi-dnr.opendata.arcgis.com/datasets/cb1c7f75d14f42ee819a46894fd2e771
Dataset updated
Aug 1, 2017
Dataset authored and provided by
Wisconsin Department of Natural Resources: http://dnr.wi.gov/
Description
24K Hydro File Geodatabase, including bank lines, flow lines, junction points, hydro lines, water bodies, hydro points, and a network. Access the user guide, data dictionaries, and metadata below.

The DNR Hydrography database was developed statewide using several 1:24,000-scale sources. This data layer includes information about surface water features represented on the USGS 1:24,000-scale topographic map series such as perennial and intermittent streams, lakes, etc. Because the sources of the Hydrography data span many years and originate from several sources, the data may reflect areas of transition from one source to another. As a result, the water features as represented in the Hydrography data may not always match what you see on a particular USGS quad or Digital Raster Graphic (DRG). General source information is presented on this map: Wisconsin Hydrography Source Information. Note: Wetlands delineations are not included in the DNR Hydrography data layer. For information about DNR Wetlands data, see the Wisconsin Wetland Inventory web page.

Report errors in this data to Dennis Wiese (dennis.wiese@wisconsin.gov) with the following information:
HYDROID of the feature in question; OR if the feature is missing, a location coordinate or description (e.g. latitude/longitude, Public Land Survey System Township, Range, and Section identifier) that identifies the area in question.
Optional but very helpful: a screen capture of the area in question, or the Water Body Identification Code (WBIC) of the feature in question.

DNR staff can access the hydrography database in the agency's central GIS data repository. The hydrography feature classes are stored in the feature dataset "W23324.WD_HYDRO_DATA_24K".

USER GUIDES AND DOCUMENTATION: WDNR_HYDRO_24k_GETTING STARTED, WDNR HYDRO 24K UPDATES DOCUMENT, 24K HYDRO DECISION RULES
Data Dictionaries and Metadata: WDNR_HYDRO_24k_waterbody_data_dict, WDNR_HYDRO_24k_waterbody_metadata, WDNR_HYDRO_24k_flowline_data_dict, WDNR_HYDRO_24k_flowline_metadata, WDNR_HYDRO_24k_bank_data_dict, WDNR_HYDRO_24k_bank_metadata, WDNR_HYDRO_24k_junction_data_dict, WDNR_HYDRO_24k_junction_metadata, WDNR_HYDRO_24k_line_data_dict, WDNR_HYDRO_24k_line_metadata, WDNR_HYDRO_24k_flowline_wbic_data_dict, WDNR_HYDRO_24k_flowline_wbic_metadata, WDNR_HYDRO_24k_waterbody_wbic_data_dict, WDNR_HYDRO_24k_waterbody_wbic_metadata
ArcMap Layer (.lyr) Files: 24k Hydro Flowline Duration, 24k Hydro Bank Lines, 24k Hydro Flowline Streams, 24k Hydro Waterbody Open Water
Zoning GIS Data: Geodatabase
data.cityofnewyork.us
data.ny.gov
+1 more
application/rdfxml +5
Updated Jan 29, 2013
+ more versions
Cite
Department of City Planning (DCP) (2013). Zoning GIS Data: Geodatabase [Dataset]. https://data.cityofnewyork.us/City-Government/Zoning-GIS-Data-Geodatabase/mm69-vrje
Explore at:
Available download formats: csv, application/rssxml, xml, application/rdfxml, json, tsv
Dataset updated
Jan 29, 2013
Dataset authored and provided by
Department of City Planning (DCP)
Description
This data set consists of 6 classes of zoning features: zoning districts, special purpose districts, special purpose district subdistricts, limited height districts, commercial overlay districts, and zoning map amendments.

All previously released versions of this data are available at BYTES of the BIG APPLE - Archive.
Landmark Features - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Mar 29, 2023
Cite
ckan.publishing.service.gov.uk (2023). Landmark Features - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/landmark-features1
Dataset updated
Mar 29, 2023
Dataset provided by
CKAN: https://ckan.org/
Description
The location of Landmark Features within Nottingham City Centre. Landmark Features are points of local interest and significance within the townscape.
Park Features By PMAID
cos-data.seattle.gov
data.seattle.gov
+2 more
application/rdfxml +5
Updated Oct 4, 2024
+ more versions
Cite
City of Seattle (2024). Park Features By PMAID [Dataset]. https://cos-data.seattle.gov/Community-and-Culture/Park-Features-By-PMAID/xrnu-8eiq
Explore at:
Available download formats: csv, json, xml, application/rssxml, tsv, application/rdfxml
Dataset updated
Oct 4, 2024
Dataset authored and provided by
City of Seattle
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
This dataset contains a list of features for each park PMAID.

Loan Approval Classification Dataset

kaggle.com

Updated Oct 29, 2024

Cite

Ta-wei Lo (2024). Loan Approval Classification Dataset [Dataset]. https://www.kaggle.com/datasets/taweilo/loan-approval-classification-data

Explore at:

Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Oct 29, 2024

Dataset provided by

Kaggle: http://kaggle.com/

Authors

Ta-wei Lo

License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

1. Data Source

This dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle and enriched with additional variables based on Financial Risk for Loan Approval data. SMOTENC was used to simulate new data points to enlarge the instances. The dataset is structured for both categorical and continuous features.

2. Metadata

The dataset contains 45,000 records and 14 variables, each described below:

Column	Description	Type
`person_age`	Age of the person	Float
`person_gender`	Gender of the person	Categorical
`person_education`	Highest education level	Categorical
`person_income`	Annual income	Float
`person_emp_exp`	Years of employment experience	Integer
`person_home_ownership`	Home ownership status (e.g., rent, own, mortgage)	Categorical
`loan_amnt`	Loan amount requested	Float
`loan_intent`	Purpose of the loan	Categorical
`loan_int_rate`	Loan interest rate	Float
`loan_percent_income`	Loan amount as a percentage of annual income	Float
`cb_person_cred_hist_length`	Length of credit history in years	Float
`credit_score`	Credit score of the person	Integer
`previous_loan_defaults_on_file`	Indicator of previous loan defaults	Categorical
`loan_status` (target variable)	Loan approval status: 1 = approved; 0 = rejected	Integer

3. Data Usage

The dataset can be used for multiple purposes:

Exploratory Data Analysis (EDA): Analyze key features, distribution patterns, and relationships to understand credit risk factors.
Classification: Build predictive models to classify the loan_status variable (approved/not approved) for potential applicants.
Regression: Develop regression models to predict the credit_score variable based on individual and loan-related attributes.

Be aware of data issues inherited from the original data, such as records listing an age above 100 years.

This dataset provides a rich basis for understanding financial risk factors and simulating predictive modeling processes for loan approval and credit scoring.
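The exploratory use case and the age caveat can be sketched with the Python standard library alone. The sample rows below are invented for illustration; only the column names come from the metadata table, and the 100-year cutoff and the uppercase category labels are assumptions, not documented properties of the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; a real run would read the downloaded CSV file instead.
SAMPLE = """person_age,person_home_ownership,loan_status
22.0,RENT,1
35.0,OWN,0
144.0,RENT,0
41.0,RENT,0
29.0,MORTGAGE,1
"""


def load_valid_rows(text, max_age=100.0):
    """Parse CSV text and drop records whose person_age exceeds max_age."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if float(row["person_age"]) <= max_age]


def approval_rate_by(rows, column):
    """Share of approved loans (loan_status == 1) for each value of `column`."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[column]
        total[key] += 1
        approved[key] += int(row["loan_status"])
    return {key: approved[key] / total[key] for key in total}


rows = load_valid_rows(SAMPLE)
rates = approval_rate_by(rows, "person_home_ownership")
print(len(rows), rates)
```

Filtering before aggregating keeps implausible ages from skewing per-group rates; the same cleaned rows could then feed a classifier for loan_status.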

Feel free to leave comments on the discussion. I'd appreciate your upvote if you find my dataset useful! 😀

Topographic Data of Canada - CanVec Series
open.canada.ca
catalogue.arctic-sdi.org
+3 more
fgdb/gdb, html, kmz +3
Updated May 19, 2023
+ more versions
Cite
Natural Resources Canada (2023). Topographic Data of Canada - CanVec Series [Dataset]. https://open.canada.ca/data/en/dataset/8ba2aa2a-7bb9-4448-b4d7-f164409fe056
Explore at:
Available download formats: html, fgdb/gdb, wms, shp, kmz, pdf
Dataset updated
May 19, 2023
Dataset provided by
Ministry of Natural Resources of Canada: https://www.nrcan.gc.ca/
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Area covered
Canada
Description
CanVec contains more than 60 topographic feature classes organized into 8 themes: Transport Features, Administrative Features, Hydro Features, Land Features, Manmade Features, Elevation Features, Resource Management Features and Toponymic Features. This multiscale product originates from the best available geospatial data sources covering Canadian territory. It offers quality topographic information in vector format complying with international geomatics standards. CanVec can be used in Web Map Services (WMS) and geographic information systems (GIS) applications and used to produce thematic maps. Because of its many attributes, CanVec allows for extensive spatial analysis.

Related Products:
Constructions and Land Use in Canada - CanVec Series - Manmade Features
Lakes, Rivers and Glaciers in Canada - CanVec Series - Hydrographic Features
Administrative Boundaries in Canada - CanVec Series - Administrative Features
Mines, Energy and Communication Networks in Canada - CanVec Series - Resources Management Features
Wooded Areas, Saturated Soils and Landscape in Canada - CanVec Series - Land Features
Transport Networks in Canada - CanVec Series - Transport Features
Elevation in Canada - CanVec Series - Elevation Features
Map Labels - CanVec Series - Toponymic Features
Ecological Sections (Feature Layer)
sal-urichmond.hub.arcgis.com
datasets.ai
+6 more
Updated Jan 1, 2007
Cite
U.S. Forest Service (2007). Ecological Sections (Feature Layer) [Dataset]. https://sal-urichmond.hub.arcgis.com/datasets/usfs::ecological-sections-feature-layer
Dataset updated
Jan 1, 2007
Dataset authored and provided by
U.S. Forest Service
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This data set includes polygons for ecological sections within Subregions within the conterminous United States. This data set contains regional geographic delineations for analysis of ecological relationships across ecological units.
Data from: LBA-ECO LC-09 Natural, Infrastructure, and Boundary Features,...
data.nasa.gov
datasets.ai
+6 more
Updated Apr 1, 2025
+ more versions
Cite
nasa.gov (2025). LBA-ECO LC-09 Natural, Infrastructure, and Boundary Features, Amazonian Sites, Brazil [Dataset]. https://data.nasa.gov/dataset/lba-eco-lc-09-natural-infrastructure-and-boundary-features-amazonian-sites-brazil-804ed
Dataset updated
Apr 1, 2025
Dataset provided by
NASA: http://nasa.gov/
Area covered
Brazil
Description
This data set includes 16 zipped archives of shapefiles of cities, rivers and streams, roads, and study area boundaries of several Amazonian study sites: Altamira, Santarem, Bragantina, and Ponta de Pedras, in the state of Para, and 1 site at Machadinho D'Oeste, in the state of Rondonia. Data from Brazil were digitized from Instituto Nacional de Colonizacao e Reforma Agraria (INCRA) maps and other data from Instituto Brasileiro de Geografia e Estatistica (IBGE). These products were prepared in the 2000-2004 time period. The date of creation for the source material is unknown.
Data from: Feature selection in an interactive search-based PLA design...
zenodo.org
Updated May 18, 2023
Cite
Anonymous; Anonymous (2023). Feature selection in an interactive search-based PLA design approach [Dataset]. http://doi.org/10.5281/zenodo.7942374
Unique identifier
https://doi.org/10.5281/zenodo.7942374
Dataset updated
May 18, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Anonymous; Anonymous
Description
The Product Line Architecture (PLA) is one of the most important artifacts of a Software Product Line (SPL). PLA design can be formulated as an interactive optimization problem with many conflicting factors. Incorporating Decision Makers’ (DM) preferences during the search process may help the algorithms find solutions better suited to their profiles. Interactive approaches allow the DM to evaluate solutions, guiding the optimization according to their preferences. However, this brings up human fatigue problems caused by the excessive number of interactions and solutions to evaluate. A common strategy to prevent this problem is limiting the number of interactions and solutions evaluated by the DM. Machine Learning (ML) models have also been used to learn how to evaluate solutions according to the DM profile and to replace the DM after some interactions. Feature selection plays an essential role, as non-relevant and/or redundant features used to train the ML model can reduce the accuracy and comprehensibility of the hypotheses induced by ML algorithms. This work aims to select features of an ML model used to prevent human fatigue in an interactive search-based PLA design approach. We applied four selectors and, according to the results, were able to reduce the number of features by 30% while obtaining an accuracy of 99%.
Stress Trajectories Determined from Breakouts (GIS data, line features) -...
data.urbandatacentre.ca
beta.data.urbandatacentre.ca
Updated Jun 24, 2025
+ more versions
Cite
(2025). Stress Trajectories Determined from Breakouts (GIS data, line features) - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/ab-gda-dig_2008_0316
Dataset updated
Jun 24, 2025
Description
The Geological Atlas of the Western Canada Sedimentary Basin was designed primarily as a reference volume documenting the subsurface geology of the Western Canada Sedimentary Basin. This GIS dataset is one of a collection of shapefiles representing part of Chapter 29 of the Atlas, In-situ Stress in the Western Canada Sedimentary Basin, Figure 10, Stress Trajectories Determined from Breakouts. Shapefiles were produced from archived digital files created by the Alberta Geological Survey in the mid-1990s, and edited in 2005-06 to correct, attribute and consolidate the data into single files by feature type and by figure.
Chinook Abundance - Point Features [ds180]
gis-california.opendata.arcgis.com
data.cnra.ca.gov
+8 more
Updated Jan 31, 2020
Cite
California Department of Fish and Wildlife (2020). Chinook Abundance - Point Features [ds180] [Dataset]. https://gis-california.opendata.arcgis.com/datasets/CDFW::chinook-abundance-point-features-ds180
Dataset updated
Jan 31, 2020
Dataset authored and provided by
California Department of Fish and Wildlife: https://wildlife.ca.gov/
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset ds180_Chinook_pnts is a product of the CalFish Adult Salmonid Abundance Database. Data in this shapefile are collected from point features, such as dams and hatcheries. Some escapement monitoring locations, such as spawning stock surveys, are logically represented by linear features. See the companion linear feature shapefile ds181_Chinook_ln for information collected from stream reaches.

The CalFish Abundance Database contains a comprehensive collection of anadromous fisheries abundance information. Beginning in 1998, the Pacific States Marine Fisheries Commission, the California Department of Fish and Game, and the National Marine Fisheries Service began a cooperative project aimed at collecting, archiving, and entering into standardized electronic formats the wealth of information generated by fisheries resource management agencies and tribes throughout California.

The data format provides for sufficient detail to convey the relative accuracy of each population trend index record yet is simple and straightforward enough to be suited for public use. For those interested in more detail, the database offers hyperlinks to digital copies of the original documents used to compile the information. In this way the database serves as an information hub directing the user to additional supporting information. This offers utility to field biologists and others interested in obtaining information for more in-depth analysis. Hyperlinks, built into the spatial data attribute tables used in the BIOS and CalFish I-map viewers, open the detailed index data archived in the on-line CalFish database application. The information can also be queried directly from the database via the CalFish Tabular Data Query. Once the detailed annual trend data are in view, another hyperlink opens a digital copy of the document used to compile each record.

During 2010, as a part of the Central Valley Chinook Comprehensive Monitoring Plan, the CalFish Salmonid Abundance Database was reorganized and updated. CalFish provides a central location for sharing Central Valley Chinook salmon escapement estimates and annual monitoring reports to all stakeholders, including the public. Annual Chinook salmon in-river escapement indices that were, in many cases, eight to ten years behind are now current through 2009. In some cases, multiple datasets were consolidated into a single, more comprehensive, dataset to more closely reflect how data are reported in the California Department of Fish and Game standard index, Grandtab.

Extensive data are currently available in the CalFish Abundance Database for California Chinook, coho, and steelhead. Major data categories include adult abundance population estimates, actual fish and/or carcass counts, counts of fish collected at dams, weirs, or traps, and redd counts. Harvest data has also been compiled for many streams.

This CalFish Abundance Database shapefile was generated from fully routed 1:100,000 hydrography. In a few cases streams had to be added to the hydrography dataset in order to provide a means to create shapefiles to represent abundance data associated with them. Streams added were digitized at no more than 1:24,000 scale based on stream line images portrayed in 1:24,000 Digital Raster Graphics (DRG).

The features in this layer represent the location for which abundance data records apply. In many cases there are multiple datasets associated with the same location, and so, features may overlap. Please view the associated datasets for detail regarding specific features. In CalFish these are accessed through the "link" field that is visible when performing an identify or query operation. A URL string is provided with each feature in the downloadable data which can also be used to access the underlying datasets.

The Chinook data that is available from the CalFish website is actually mirrored from the StreamNet website where the CalFish Abundance Database's tabular data is currently stored. Additional information about StreamNet may be downloaded at http://www.streamnet.org. Complete documentation for the StreamNet database may be accessed at http://www.streamnet.org/def.html
