Classification of Mars Terrain Using Multiple Data Sources
Alan Kraut, David Wettergreen

ABSTRACT. Images of Mars are being collected faster than they can be analyzed by planetary scientists. Automatic analysis of images would enable more rapid and more consistent image interpretation and could draft geologic maps where none yet exist. In this work we develop a method for incorporating images from multiple instruments to classify Martian terrain into multiple types. Each image is segmented into contiguous groups of similar pixels, called superpixels, with an associated vector of discriminative features. We have developed and tested several classification algorithms to associate a best class to each superpixel. These classifiers are trained using three different manual classifications with between 2 and 6 classes. Automatic classification accuracies of 50 to 80% are achieved in leave-one-out cross-validation across 20 scenes using a multi-class boosting classifier.
Our Geospatial Dataset connects people's movements to over 200M physical locations globally. These are aggregated and anonymized data that are only used to offer context for the volume and patterns of visits to certain locations. This data feed is compiled from different data sources around the world.
It includes information such as the name, address, coordinates, and category of these locations, which can range from restaurants and hotels to parks and tourist attractions.
Location Intelligence Data Reach: Location Intelligence data brings POI/Place/OOH-level insights calculated from Factori’s Mobility & People Graph data, aggregated from multiple data sources globally. To achieve foot-traffic attribution, specific attributes are combined to produce the desired reach data. For instance, to calculate the foot traffic for a specific location, a combination of location ID, day of the week, and part of the day can be used. There can be a maximum of 56 data records for one POI based on the combination of these attributes.
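The attribute combination described above can be sketched directly. Assuming (hypothetically, since the source does not define them) eight parts of day, 7 days × 8 day-parts yields the stated maximum of 56 records per POI:

```python
from itertools import product

DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
# Hypothetical eight-way split of the day; the source does not name the parts.
DAY_PARTS = ["early_morning", "morning", "late_morning", "noon",
             "afternoon", "evening", "night", "late_night"]

def reach_keys(location_id):
    """All (location, day-of-week, part-of-day) combinations for one POI."""
    return [(location_id, d, p) for d, p in product(DAYS, DAY_PARTS)]

keys = reach_keys("poi_123")
print(len(keys))  # 7 days x 8 day-parts = 56 records
```

Each key would then index one foot-traffic measurement, which is how the per-POI record count is bounded.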
Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method at a suitable interval (daily/weekly/monthly).
Use Cases:
Credit Scoring: Financial services can use alternative data to score an underbanked or unbanked customer by validating locations and persona.
Retail Analytics: Analyze footfall trends in various locations and gain an understanding of customer personas.
Market Intelligence: Study various market areas, the proximity of points of interest, and the competitive landscape.
Urban Planning: Build cases for urban development, public infrastructure needs, and transit planning based on fresh population data.
Marketing Campaign Strategy: By analyzing visitor demographics and behavior patterns around POIs, businesses can tailor their marketing strategies to effectively reach their target audience.
OOH/DOOH Campaign Planning: Identify high-traffic locations and understand consumer behavior in specific areas to execute targeted advertising strategies effectively.
Geofencing: Geofencing involves creating virtual boundaries around physical locations, enabling businesses to trigger actions when users enter or exit these areas.
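The geofencing use case reduces to a point-in-radius test against a POI's coordinates. A minimal sketch with the haversine formula; the POI coordinates and 200 m radius are illustrative assumptions, not values from the dataset:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points."""
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def inside_geofence(user, center, radius_m):
    """True if the user's (lat, lng) falls within the circular fence."""
    return haversine_m(user[0], user[1], center[0], center[1]) <= radius_m

fence = (40.7580, -73.9855)  # hypothetical POI center (Times Square)
print(inside_geofence((40.7585, -73.9850), fence, 200))  # nearby point
```

In production the fence would typically come from the POI's lat/lng attributes, with polygon fences used where a circular approximation is too coarse.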
Data Attributes Included:
LocationID
name
website
BrandID
Phone
streetAddress
city
state
country_code
zip
lat
lng
poi_status
geoHash8
poi_id
category
category_id
full_address
address
additional_categories
url
domain
rating
price_level
rating_distribution
is_claimed
photo_url
attributes
brand_name
brand_id
status
total_photos
popular_times
places_topics
people_also_search
work_hours
local_business_links
contact_info
reviews_count
naics_code
naics_code_description
sic_code
sic_code_description
shape_polygon
building_id
building_type
building_name
geometry_location_type
geometry_viewport_northeast_lat
geometry_viewport_northeast_lng
geometry_viewport_southwest_lat
geometry_viewport_southwest_lng
geometry_location_lat
geometry_location_lng
calculated_geo_hash_8
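The geoHash8 and calculated_geo_hash_8 attributes are 8-character geohashes of the POI coordinates. A pure-Python sketch of standard geohash encoding, assuming the provider uses the conventional algorithm (the exact derivation isn't documented here):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash8(lat, lng, precision=8):
    """Encode (lat, lng) as a geohash by interleaved bisection, longitude first."""
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    bits = []
    even = True  # even bit positions encode longitude
    while len(bits) < precision * 5:
        rng = lng_range if even else lat_range
        val = lng if even else lat
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        even = not even
    # group the bit stream into 5-bit chunks and map each to a base32 character
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, precision * 5, 5)
    )

print(geohash8(57.64911, 10.40744))  # u4pruydq
```

An 8-character geohash identifies a cell roughly 38 m × 19 m, which is why it is a convenient fixed-precision key for joining POI records.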
Jurisdictional Unit, 2022-05-21. For use with WFDSS, IFTDSS, IRWIN, and InFORM. This is a feature service which provides Identify and Copy Feature capabilities. If fast drawing at coarse zoom levels is a requirement, consider using the tile (map) service layer located at https://nifc.maps.arcgis.com/home/item.html?id=3b2c5daad00742cd9f9b676c09d03d13.

Overview
The Jurisdictional Agencies dataset is developed as a national land management geospatial layer, focused on representing wildland fire jurisdictional responsibility, for interagency wildland fire applications, including WFDSS (Wildland Fire Decision Support System), IFTDSS (Interagency Fuels Treatment Decision Support System), IRWIN (Interagency Reporting of Wildland Fire Information), and InFORM (Interagency Fire Occurrence Reporting Modules). It is intended to provide federal wildland fire jurisdictional boundaries on a national scale. The agency and unit names are an indication of the primary manager name and unit name, respectively, recognizing that:
- There may be multiple owner names.
- Jurisdiction may be held jointly by agencies at different levels of government (i.e. State and Local), especially on private lands.
- Some owner names may be blocked for security reasons.
- Some jurisdictions may not allow the distribution of owner names.
Private ownerships are shown in this layer with JurisdictionalUnitIdentifier=null, JurisdictionalUnitAgency=null, JurisdictionalUnitKind=null, and LandownerKind="Private", LandownerCategory="Private". All land inside the US country boundary is covered by a polygon. Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and is not available in a national dataset in most cases. For publicly held lands the agency name is the surface managing agency, such as Bureau of Land Management, United States Forest Service, etc. The unit name refers to the descriptive name of the polygon (i.e. Northern California District, Boise National Forest, etc.). These data are used to automatically populate fields on the WFDSS Incident Information page. This data layer implements the NWCG Jurisdictional Unit Polygon Geospatial Data Layer Standard.

Relevant NWCG Definitions and Standards
Unit: A generic term that represents an organizational entity that only has meaning when it is contextualized by a descriptor, e.g. jurisdictional. Definition Extension: When referring to an organizational entity, a unit refers to the smallest area or lowest level. Higher levels of an organization (region, agency, department, etc.) can be derived from a unit based on organization hierarchy.
Unit, Jurisdictional: The governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law. Definition Extension: 1) Ultimately responsible for the fire report to account for statistical fire occurrence; 2) Responsible for setting fire management objectives; 3) Jurisdiction cannot be re-assigned by agreement; 4) The nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) Responsible for signing a Delegation of Authority to the Incident Commander. See also: Unit, Protecting; Landowner.
Unit Identifier: This data standard specifies the standard format and rules for Unit Identifier, a code used within the wildland fire community to uniquely identify a particular government organizational unit.
Landowner Kind & Category: This data standard provides a two-tier classification (kind and category) of landownership.

Attribute Fields
JurisdictionalAgencyKind: Describes the type of unit jurisdiction using the NWCG Landowner Kind data standard. There are two valid values: Federal and Other. A value may not be populated for all polygons.
JurisdictionalAgencyCategory: Describes the type of unit jurisdiction using the NWCG Landowner Category data standard. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State. A value may not be populated for all polygons.
JurisdictionalUnitName: The name of the Jurisdictional Unit. Where an NWCG Unit ID exists for a polygon, this is the name used in the Name field from the NWCG Unit ID database. Where no NWCG Unit ID exists, this is the “Unit Name” or other specific, descriptive unit name field from the source dataset. A value is populated for all polygons.
JurisdictionalUnitID: Where it could be determined, this is the NWCG Standard Unit Identifier (Unit ID). Where it is unknown, the value is ‘Null’. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available from the Unit ID standard, linked above.
LandownerKind: The landowner kind value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. There are three valid values: Federal, Private, or Other.
LandownerCategory: The landowner category value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State, Private.
DataSource: The database from which the polygon originated. Be as specific as possible; identify the geodatabase name and feature class in which the polygon originated.
SecondaryDataSource: If the Data Source is an aggregation from other sources, use this field to specify the source that supplied data to the aggregation. For example, if Data Source is "PAD-US 2.1", then for a USDA Forest Service polygon, the Secondary Data Source would be "USDA FS Automated Lands Program (ALP)". For a BLM polygon in the same dataset, the Secondary Data Source would be "Surface Management Agency (SMA)".
SourceUniqueID: Identifier (GUID or ObjectID) in the data source. Used to trace the polygon back to its authoritative source.
MapMethod: Controlled vocabulary to define how the geospatial feature was derived. Map method may help define data quality. MapMethod will be Mixed Methods by default for this layer as the data are from mixed sources. Valid values include: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; Digitized-Topo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; Other.
DateCurrent: The date of the last edit or update of this GIS record. Dates should follow the assigned NWCG Date Time data standard, using the 24-hour clock, YYYY-MM-DDhh.mm.ssZ, ISO 8601 standard.
Comments: Additional information describing the feature.
GeometryID: Primary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature.
JurisdictionalUnitID_sansUS: NWCG Unit ID with the "US" characters removed from the beginning. Provided for backwards compatibility.
JoinMethod: Additional information on how the polygon was matched to information in the NWCG Unit ID database.
LocalName: Local name for the polygon provided from PAD-US or other source.
LegendJurisdictionalAgency: Jurisdictional Agency, but smaller landholding agencies, or agencies of indeterminate status, are grouped for more intuitive use in a map legend or summary table.
LegendLandownerAgency: Landowner Agency, but smaller landholding agencies, or agencies of indeterminate status, are grouped for more intuitive use in a map legend or summary table.
DataSourceYear: Year that the source data for the polygon were acquired.

Data Input
This dataset is based on an aggregation of four spatial data sources: the Protected Areas Database of the United States (PAD-US 2.1), data from Bureau of Indian Affairs regional offices, the BLM Alaska Fire Service/State of Alaska, and Census Block-Group geometry. NWCG Unit ID and Agency Kind/Category data are tabular and sourced from UnitIDActive.txt, in the WFMI Unit ID application (https://wfmi.nifc.gov/unit_id/Publish.html). Areas with unknown Landowner Kind/Category and Jurisdictional Agency Kind/Category are assigned LandownerKind and LandownerCategory values of "Private" by use of the non-water polygons from the Census Block-Group geometry.
PAD-US 2.1: This dataset is based in large part on the USGS Protected Areas Database of the United States (PAD-US 2.1). PAD-US is a compilation of authoritative protected areas data between agencies and organizations that ultimately results in a comprehensive and accurate inventory of protected areas for the United States to meet a variety of needs (e.g. conservation, recreation, public health, transportation, energy siting, ecological, or watershed assessments and planning). Extensive documentation on PAD-US processes and data sources is available.
How these data were aggregated: Boundaries, and their descriptors, available in spatial databases (i.e. shapefiles or geodatabase feature classes) from land management agencies are the desired and primary data sources in PAD-US. If these authoritative sources are unavailable, or the agency recommends another source, data may be incorporated by other aggregators such as non-governmental organizations. Data sources are tracked for each record in the PAD-US geodatabase (see below).
BIA and Tribal Data: BIA and Tribal land management data are not available in PAD-US. As such, data were aggregated from BIA regional offices. These data date from 2012 and were substantially updated in 2022. Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: these data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data.
Our Location Intelligence Data connects people's movements to over 14M physical locations globally. These are aggregated and anonymized data that are only used to offer context for the volume and patterns of visits to certain locations. This data feed is compiled from different data sources around the world.
Location Intelligence Data Reach: Location Intelligence data brings the POI/Place/OOH level insights calculated based on Factori’s Mobility & People Graph data aggregated from multiple data sources globally. To achieve the desired foot-traffic attribution, specific attributes are combined to bring forward the desired reach data. For instance, to calculate the foot traffic for a specific location, a combination of location ID, day of the week, and part of the day can be combined to give specific location intelligence data. There can be a maximum of 40 data records possible for one POI based on the combination of these attributes.
Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method at a suitable interval (daily/weekly/monthly).
Use Cases:
Consumer Insights: Gain a complete 360-degree view of the customer to detect behavioral changes, assess patterns, and forecast business effects.
Data Enrichment: Leverage O2O consumer profiles to build holistic audience segments to improve campaign targeting using user data enrichment.
Sales Forecasting: Analyze consumer behavior to predict sales and monitor performance of investments.
Retail Analytics: Analyze footfall trends in various locations and gain an understanding of customer personas.
Geofencing: Geofencing involves creating virtual boundaries around physical locations, enabling businesses to trigger actions when users enter or exit these areas.
Geo-Targeted Advertising: Utilizing location-based insights, businesses can deliver highly personalized advertisements to consumers based on their proximity to relevant POIs.
Marketing Campaign Strategy: Analyzing visitor demographics and behavior patterns around POIs, businesses can tailor their marketing strategies to effectively reach their target audience.
Site Selection: By assessing the proximity to relevant POIs such as competitors, customer demographics, and economic indicators, organizations can make informed decisions about opening new locations.
OOH/DOOH Campaign Planning: Identify high-traffic locations and understand consumer behavior in specific areas, to execute targeted advertising strategies effectively.
Data Attributes Included:
Anonymous id
poi_id
name
description
category
category_id
full_address
address
city
state
zip
country_code
phone
url
domain
rating
price_level
rating_distribution
is_claimed
photo_url
attributes
brand_name
brand_id
status
total_photos
popular_times
places_topics
people_also_search
work_hours
local_business_links
contact_info
reviews_count
naics_code
naics_code_description
sic_code
sic_code_description
shape_type
shape_polygon
geometry_location_type
geometry_viewport_northeast_lat
geometry_viewport_northeast_lng
geometry_viewport_southwest_lat
geometry_viewport_southwest_lng
geometry_location_lat
geometry_location_lng
calculated_geo_hash_8
building_id
building_name
building_type
id_type
gender
age
carrier
make
model
os
os_version
home_country
home_geohash
work_geohash
affluence
brands_visited
places_categories
geo_behaviour
interests
device_age
device_price
travelled_countries
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a compilation of processed data on citations and references for research papers, including their author, institution, and open access info, for a selected sample of academics analysed using Microsoft Academic Graph (MAG) data and CORE. The data for this dataset was collected during December 2019 to January 2020. Six countries (Austria, Brazil, Germany, India, Portugal, United Kingdom and United States) were the focus of the six questions which make up this dataset. There is one csv file per country and per question (36 files in total). More details about the creation of this dataset are available in the public ON-MERRIT D3.1 deliverable report.

The dataset is a combination of two different data sources: one part is a dataset created by analysing promotion policies across the target countries, while the second part is a set of data points available to understand publishing behaviour. To facilitate the analysis, the dataset is organised in the following seven folders:

PRT
The dataset with the file name "PRT_policies.csv" contains the related information as this was extracted from promotion, review and tenure (PRT) policies.

Q1: What % of papers coming from a university are Open Access?
- Dataset name format: oa_status_countryname_papers.csv
- Dataset contents: Open Access (OA) status of all papers of all the universities listed in Times Higher Education World University Rankings (THEWUR) for the given country. A paper is marked OA if there is at least one OA link available. OA links are collected using the CORE Discovery API.
- Important considerations about this dataset:
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.
  - The service we used to recognise if a paper is OA, CORE Discovery, does not contain entries for all paperids in MAG. This implies that some of the records in the dataset extracted will not have either a true or false value for the is_OA field.
  - Only those records marked as true for the is_OA field can be said to be OA. Others with a false or no value for the is_OA field have unknown status (i.e. not necessarily closed access).

Q2: How are papers, published by the selected universities, distributed across the three scientific disciplines of our choice?
- Dataset name format: fsid_countryname_papers.csv
- Dataset contents: For the given country, all papers for all the universities listed in THEWUR with the information of the fieldofstudy they belong to.
- Important considerations about this dataset:
  - MAG can associate a paper to multiple fieldofstudyids. If a paper belongs to more than one of our fieldofstudyids, separate records were created for the paper with each of those fieldofstudyids.
  - MAG assigns a fieldofstudyid to every paper with a score. We preserve only those records whose score is more than 0.5 for any fieldofstudyid it belongs to.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.

Q3: What is the gender distribution in authorship of papers published by the universities?
- Dataset name format: author_gender_countryname_papers.csv
- Dataset contents: All papers with their author names for all the universities listed in THEWUR.
- Important considerations about this dataset:
  - When there are multiple collaborators (authors) for the same paper, this dataset makes sure that only the records for collaborators from within selected universities are preserved.
  - An external script was executed to determine the gender of the authors. The script is available here.

Q4: Distribution of staff seniority (= number of years from their first publication until the last publication) in the given university.
- Dataset name format: author_ids_countryname_papers.csv
- Dataset contents: For a given country, all papers for authors with their publication year for all the universities listed in THEWUR.
- Important considerations about this work:
  - When there are multiple collaborators (authors) for the same paper, this dataset makes sure that only the records for collaborators from within selected universities are preserved.
  - Calculating staff seniority can be achieved in various ways. The most straightforward option is to calculate it as academic_age = MAX(year) - MIN(year) for each authorid.

Q5: Citation counts (incoming) for OA vs Non-OA papers published by the university.
- Dataset name format: cc_oa_countryname_papers.csv
- Dataset contents: OA status and OA links for all papers of all the universities listed in THEWUR and, for each of those papers, the count of incoming citations available in MAG.
- Important considerations about this dataset:
  - CORE Discovery was used to establish the OA status of papers.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.
  - Only those records marked as true for the is_OA field can be said to be OA. Others with a false or no value for the is_OA field have unknown status (i.e. not necessarily closed access).

Q6: Count of OA vs Non-OA references (outgoing) for all papers published by universities.
- Dataset name format: rc_oa_countryname_papers.csv
- Dataset contents: Counts of all OA and unknown papers referenced by all papers published by all the universities listed in THEWUR.
- Important considerations about this dataset:
  - CORE Discovery was used to establish the OA status of papers being referenced.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.

Additional files:
- fieldsofstudy_mag.csv: this file contains a dump of the fieldsofstudy table of MAG, mapping each of the ids to their actual field of study name.
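The academic_age formula from Q4 is straightforward to compute per author. A sketch using hypothetical (authorid, year) rows standing in for the author_ids_countryname_papers.csv contents:

```python
from collections import defaultdict

# Hypothetical rows mirroring author_ids_countryname_papers.csv: (authorid, publication year)
rows = [("a1", 2005), ("a1", 2012), ("a1", 2019), ("a2", 2018), ("a2", 2020)]

years = defaultdict(list)
for author, year in rows:
    years[author].append(year)

# academic_age = MAX(year) - MIN(year) for each authorid, as defined in Q4
academic_age = {a: max(ys) - min(ys) for a, ys in years.items()}
print(academic_age)  # {'a1': 14, 'a2': 2}
```

Note this measures the span between first and last publication, so an author with a single paper gets an academic age of 0.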
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Glycan arrays are indispensable for learning about the specificities of glycan-binding proteins. Despite the abundance of available data, the current analysis methods do not have the ability to interpret and use the variety of data types and to integrate information across datasets. Here, we evaluated whether a novel, automated algorithm for glycan-array analysis could meet that need. We developed a regression-tree algorithm with simultaneous motif optimization and packaged it in software called MotifFinder. We applied the software to analyze data from eight different glycan-array platforms with widely divergent characteristics and observed an accurate analysis of each dataset. We then evaluated the feasibility and value of the combined analyses of multiple datasets. In an integrated analysis of datasets covering multiple lectin concentrations, the software determined approximate binding constants for distinct motifs and identified major differences between the motifs that were not apparent from single-concentration analyses. Furthermore, an integrated analysis of data sources with complementary sets of glycans produced broader views of lectin specificity than produced by the analysis of just one data source. MotifFinder, therefore, enables the optimal use of the expanding resource of the glycan-array data and promises to advance the studies of protein–glycan interactions.
By Homeland Infrastructure Foundation [source]
Within this dataset, users can find numerous attributes that provide insight into various aspects of shoreline construction lines. The Category_o field categorizes these structures based on certain characteristics or purposes they serve. Additionally, each object in the dataset possesses a unique name or identifier represented by the Object_Nam column.
Another crucial piece of information captured in this dataset is the status of each shoreline construction line. The Status field indicates whether a particular structure is currently active or inactive. This helps users understand if it still serves its intended purpose or has been decommissioned.
Furthermore, the dataset includes data pertaining to multiple water levels associated with different shoreline construction lines. This information can be found in the Water_Leve column and provides relevant context for understanding how these artificial coastlines interact with various water bodies.
To aid cartographic representations and proper utilization of this data source for mapping purposes at different scales, there is also an attribute called Scale_Mini. This value denotes the minimum scale necessary to visualize a specific shoreline construction line accurately.
Data sources are important for reproducibility and quality assurance purposes in any GIS analysis project; hence identifying who provided and contributed to collecting this data can be critical in assessing its reliability. In this regard, individuals or organizations responsible for providing source data are specified in the column labeled Source_Ind.
Accompanying descriptive information about each source used to create these shoreline construction lines can be found in the Source_D_1 field. This supplemental information provides additional context and details about the data's origin or collection methodology.
The dataset also includes a numerical attribute called SHAPE_Leng, representing the length of each shoreline construction line. This information complements the geographic and spatial attributes associated with these structures.
Understanding the Categories:
- The Category_o column classifies each shoreline construction line into different categories. This can range from seawalls and breakwaters to jetties and groins.
- Use this information to identify specific types of shoreline constructions based on your analysis needs.
Identifying Specific Objects:
- The Object_Nam column provides unique names or identifiers for each shoreline construction line.
- These identifiers help differentiate between different segments of construction lines in a region.
Determining Status:
- The Status column indicates whether a shoreline construction line is active or inactive.
- Active constructions are still in use and may be actively maintained or monitored.
- Inactive constructions are no longer operational or may have been demolished.
Analyzing Water Levels:
- The Water_Leve column describes the water level at which each shoreline construction line is located.
- Different levels may impact the suitability or effectiveness of these structures based on tidal changes or flood zones.
Exploring Additional Information:
- The Informatio column contains additional details about each shoreline construction line.
- This can include various attributes such as materials used, design specifications, ownership details, etc.
Determining Minimum Visible Scale:
- The Scale_Mini column specifies the minimum scale at which you can observe the coastline's man-made structures clearly.

Verifying Data Sources:
- To understand data reliability and credibility for further analysis, the Source_Ind, Source_D_1, SHAPE_Leng, and Source_Dat columns provide information about the individual or organization that provided the source data, and the length and date of the source data used to create the shoreline construction lines.
Utilize this dataset to perform various analyses related to shorelines, coastal developments, navigational channels, and impacts of man-made structures on marine ecosystems. The combination of categories, object names, status, water levels, additional information, minimum visible scale and reliable source information offers a comprehensive understanding of shoreline constructions across different regions.
Remember to refer back to the dataset documentation for any specific details.
https://creativecommons.org/publicdomain/zero/1.0/
The museum dataset is an evolving list of museums and related organizations in the United States. The data file includes basic information about each organization (name, address, phone, website, and revenue) plus the museum type or discipline. The discipline type is based on the National Taxonomy of Exempt Entities, which the National Center for Charitable Statistics and IRS use to classify nonprofit organizations.
Non-museum organizations may be included. For example, a non-museum organization may be included in the data file because it has a museum-like name on its IRS record for tax-exempt organizations. Museum foundations may also be included.
Museums may be missing. For example, local municipal museums may be undercounted because original data sources used to create the compilation did not include them.
Museums may be listed multiple times. For example, one museum may be listed as both itself and its parent organization because it was listed differently in each original data source. Duplicate records are especially common for museums located within universities.
Information about museums may be outdated. The original scan and compilation of data sources occurred in 2014. Scans are no longer being done to update the data sources or add new data sources to the compilation. Information about museums may have changed since it was originally included in the file.
The museum data was compiled from IMLS administrative records for discretionary grant recipients, IRS records for tax-exempt organizations, and private foundation grant recipients.
Which city or state has the most museums per capita? How many zoos or aquariums exist in the United States? What museum or related organization had the highest revenue last year? How does the composition of museum types differ across the country?
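Questions like "which state has the most museums per capita" reduce to a simple grouped count joined with population figures. A sketch with hypothetical counts (the real file provides one row per museum, so the counts would come from aggregating its state field):

```python
# Hypothetical museum counts per state and population figures (illustrative only)
museums_by_state = {"DC": 150, "NY": 1500, "WY": 120}
population = {"DC": 690_000, "NY": 19_500_000, "WY": 580_000}

# Museums per 100,000 residents for each state
per_capita = {
    s: museums_by_state[s] / population[s] * 100_000
    for s in museums_by_state
}
top = max(per_capita, key=per_capita.get)
print(top)
```

The same groupby-then-normalize pattern answers the zoo/aquarium count and revenue questions, keyed on the discipline-type and revenue fields instead.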
Overview
The Jurisdictional Units dataset outlines wildland fire jurisdictional boundaries for federal, state, and local government entities on a national scale. It is used within multiple wildland fire systems, including the Wildland Fire Decision Support System (WFDSS), the Interior Fuels and Post-Fire Reporting System (IFPRS), the Interagency Fuels Treatment Decision Support System (IFTDSS), the Interagency Fire Occurrence Reporting Modules (InFORM), the Interagency Reporting of Wildland Fire Information System (IRWIN), and the Wildland Computer-Aided Dispatch Enterprise System (WildCAD-E).

In this dataset, agency and unit names indicate the primary manager's name and unit name, respectively, recognizing that:
- There may be multiple owner names.
- Jurisdiction may be held jointly by agencies at different levels of government (e.g., State and Local), especially on private lands.
- Some owner names may be blocked for security reasons.
- Some jurisdictions may not allow the distribution of owner names.

Private ownerships are shown in this layer with JurisdictionalUnitID=null, JurisdictionalKind=null, LandownerKind="Private", and LandownerCategory="Private". All land inside the US country boundary is covered by a polygon. Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and in most cases is not available in a national dataset. For publicly held lands, the agency name is the surface managing agency, such as Bureau of Land Management or United States Forest Service. The unit name is the descriptive name of the polygon (e.g., Northern California District, Boise National Forest).

Attributes
- GeometryID: Primary key for linking geospatial objects with other database systems. Required for every feature. Not populated for Census Block Groups.
- JurisdictionalUnitID: The NWCG Unit Identifier (Unit ID), where it could be determined; otherwise null. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available in the Unit ID standard.
- JurisdictionalUnitID_sansUS: The NWCG Unit ID with the leading "US" characters removed. Provided for backwards compatibility.
- JurisdictionalUnitName: The name of the jurisdictional unit. Where an NWCG Unit ID exists for a polygon, this is the Name field from the NWCG Unit ID database; otherwise it is the "Unit Name" or other specific, descriptive unit name field from the source dataset. Populated for all polygons except Census Block Groups and PAD-US polygons that did not have an associated name.
- LocalName: Local name for the polygon, provided from agency authoritative data, PAD-US, or another source.
- JurisdictionalKind: The type of unit jurisdiction, per the NWCG Landowner Kind data standard. There are three valid values: Federal, Other, and Private. Not populated for Census Block Groups.
- JurisdictionalCategory: The type of unit jurisdiction, per the NWCG Landowner Category data standard. Valid values include BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, State, OtherLoc (other local, not in the standard), Private, and ANCSA. Not populated for Census Block Groups.
- LandownerKind: The landowner kind value associated with the polygon. May be inferred from the jurisdictional agency, or from the lack of one. Legal values align with the NWCG Landowner Kind data standard. Populated for all polygons.
- LandownerCategory: The landowner category value associated with the polygon. May be inferred from the jurisdictional agency, or from the lack of one. Legal values align with the NWCG Landowner Category data standard. Populated for all polygons.
- LandownerDepartment: Federal department information that aligns with a unit's LandownerCategory. Legal values: Department of Agriculture, Department of Interior, Department of Defense, and Department of Energy. Not populated for all polygons.
- DataSource: The database from which the polygon originated. An effort is made to be as specific as possible (e.g., identify the geodatabase name and feature class in which the polygon originated).
- SecondaryDataSource: If DataSource is an aggregation from other sources, this field specifies the source that supplied data to the aggregation. For example, if DataSource is "PAD-US 4.0", then for a TNC polygon the SecondaryDataSource would be "TNC_PADUS2_0_SA2015_Public_gdb".
- SourceUniqueID: Identifier (GUID or ObjectID) in the data source, used to trace the polygon back to its authoritative source.
- DataSourceYear: Year the source data for the polygon were acquired.
- MapMethod: Controlled vocabulary defining how the geospatial feature was derived. MapMethod is Mixed Methods by default for this layer, as the data come from mixed sources. Valid values: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; Digitized-Topo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; Other.
- DateCurrent: The date of the last edit or update of this GIS record. Dates follow the NWCG Date Time data standard, using the 24-hour clock, YYYY-MM-DDhh.mm.ssZ (ISO 8601).
- Comments: Additional information describing the feature.
- JoinMethod: Additional information on how the polygon was matched to information in the NWCG Unit ID database.
- LegendJurisdictionalCategory: JurisdictionalCategory values grouped for more intuitive use in a map legend or summary table. Census Block Groups are classified as "No Unit".
- LegendLandownerCategory: LandownerCategory values grouped for more intuitive use in a map legend or summary table.

Other Relevant NWCG Definition Standards
- Unit: A generic term representing an organizational entity that only has meaning when contextualized by a descriptor, e.g. jurisdictional. Definition extension: when referring to an organizational entity, a unit refers to the smallest area or lowest level; higher levels of an organization (region, agency, department, etc.) can be derived from a unit based on organization hierarchy.
- Unit, Jurisdictional: The governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law. Definition extension: 1) ultimately responsible for the fire report to account for statistical fire occurrence; 2) responsible for setting fire management objectives; 3) jurisdiction cannot be re-assigned by agreement; 4) the nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) responsible for signing a Delegation of Authority to the Incident Commander. See also: Protecting Unit; Landowner.

Data Sources
This dataset is an aggregation of multiple spatial data sources:
- Authoritative land ownership records from BIA, BLM, NPS, USFS, USFWS, and the Alaska Fire Service/State of Alaska
- The Protected Areas Database US (PAD-US 4.0)
- Census Block Group geometry

BIA and Tribal data: BIA and Tribal land management data were aggregated from BIA regional offices. These data date from 2012 and were reviewed/updated in 2024. Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: these data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data. The spatial data coverage is a consolidation of the best available records received from each of the 12 BIA Regional Offices, and the data are no better than the original sources from which they were derived. Care was taken when consolidating these files; however, BWFM cannot accept any responsibility for errors, omissions, or positional accuracy in the original digital data. The information contained in these data is dynamic and continually changing, and updates will be made whenever new data are received from a Regional Office. The BWFM gives no guarantee, expressed, written, or implied, regarding the accuracy, reliability, or completeness of these data.

Alaska: The state of Alaska and the Alaska Fire Service (BLM) co-manage a process to aggregate authoritative land ownership, management, and jurisdictional boundary data based on Master Title Plats.

Data Processing
To compile this dataset, the authoritative land ownership records and the PAD-US data mentioned above were crosswalked into the Jurisdictional Unit Polygon schema and aggregated through a series of Python scripts and FME models. Once aggregated, steps were taken to reduce overlaps within the data: all overlap areas larger than 300 acres were manually examined and removed with the assistance of fire management SMEs. Once overlaps were removed, Census Block Group geometry was crosswalked to the Jurisdictional Unit Polygon schema and appended in areas where no jurisdictional boundaries were recorded in the authoritative land ownership records or the PAD-US data. Census Block Group geometries represent areas of unknown Landowner Kind/Category and Jurisdictional Kind/Category and were assigned LandownerKind and LandownerCategory values of "Private".

Update Frequency
The authoritative land ownership records and PAD-US data used to compile this dataset are dynamic and continually changing. Major updates to this dataset will be made once a year; minor updates will be incorporated throughout the year as needed.

New to the Latest Release (1/15/25)
- Now pulling from agency authoritative sources for BLM, NPS, USFS, and USFWS (instead of getting these data from PAD-US).
- Field name changes.
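The LandownerDepartment values follow directly from the federal LandownerCategory values listed above. As an illustration, a hypothetical crosswalk helper (the dataset ships these values precomputed, so this function is illustrative only, not part of the dataset's tooling):

```python
# Hypothetical crosswalk from federal NWCG LandownerCategory values
# to LandownerDepartment; non-federal categories map to None.
DEPARTMENT_BY_CATEGORY = {
    "USFS": "Department of Agriculture",
    "BIA": "Department of Interior",
    "BLM": "Department of Interior",
    "BOR": "Department of Interior",
    "NPS": "Department of Interior",
    "USFWS": "Department of Interior",
    "DOD": "Department of Defense",
    "DOE": "Department of Energy",
}


def landowner_department(category):
    """Return the federal department for a LandownerCategory value,
    or None for non-federal categories (State, County, Private, ...)."""
    return DEPARTMENT_BY_CATEGORY.get(category)
```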
Description
This is a vector tile layer built from the same data as the Jurisdictional Units Public feature service located here: https://nifc.maps.arcgis.com/home/item.html?id=4107b5d1debf4305ba00e929b7e5971a. This service can be used alone as a fast-drawing background layer, or in combination with the feature service when Identify and Copy Feature capabilities are needed. At fine zoom levels, the feature service will be needed. The Overview, Attributes, Data Sources, Data Processing, and Update Frequency documentation for this layer is identical to that of the Jurisdictional Units feature service described above.
License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
The Daily Travel data and number of people staying home and not staying home are estimated for the Bureau of Transportation Statistics by the Maryland Transportation Institute and Center for Advanced Transportation Technology Laboratory at the University of Maryland.
The daily travel estimates are from a mobile device data panel from merged multiple data sources that address the geographic and temporal sample variation issues often observed in a single data source. The merged data panel only includes mobile devices whose anonymized location data meet a set of data quality standards, which further ensures the overall data quality and consistency. The data quality standards consider both temporal frequency and spatial accuracy of anonymized location point observations, temporal coverage and representativeness at the device level, spatial representativeness at the sample and county level, etc. A multi-level weighting method that employs both device and trip-level weights expands the sample to the underlying population at the county and state levels, before travel statistics are computed.
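Schematically, the multi-level weighting expands each sampled trip to the underlying population by combining a device-level weight (how many people a device represents) with a trip-level weight. The sketch below is an invented illustration of that idea, not the BTS/UMD implementation; all names and values are hypothetical:

```python
def weighted_trip_total(trips):
    """Expand sampled trips to a population estimate: each trip's
    contribution is the product of its device-level weight and its
    trip-level weight."""
    return sum(t["device_weight"] * t["trip_weight"] for t in trips)


# Hypothetical sample: two trips from one device, one from another.
sample = [
    {"device_weight": 120.0, "trip_weight": 1.1},
    {"device_weight": 120.0, "trip_weight": 0.9},
    {"device_weight": 95.0, "trip_weight": 1.0},
]
```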
These data are experimental and may not meet all of our quality standards. Experimental data products are created using new data sources or methodologies that benefit data users in the absence of other relevant products. We are seeking feedback from data users and stakeholders on the quality and usefulness of these new products. Experimental data products that meet our quality standards and demonstrate sufficient user demand may enter regular production if resources permit.
The data are updated weekly on Mondays and lag one week behind the current date.
Data analysis is conducted at the aggregate national, state, and county levels. To assure confidentiality and support data quality, no data are reported for a county if it has fewer than 50 devices in the sample on any given day.
Trips are defined as movements that include a stay of longer than 10 minutes at an anonymized location away from home. A movement with multiple stays of longer than 10 minutes before returning home is counted as multiple trips.
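The trip-counting rule above can be sketched as follows. This is a simplified illustration with hypothetical stay records, not the actual processing pipeline:

```python
def count_trips(stays, min_stay_minutes=10):
    """Count trips per the definition above: each stay of longer than
    10 minutes at a location away from home counts as one trip, so an
    outing with several qualifying stays yields several trips."""
    return sum(
        1
        for location, minutes in stays
        if location != "home" and minutes > min_stay_minutes
    )


# One day of (location, stay duration in minutes) records:
# two qualifying stays away from home -> two trips.
day = [("home", 600), ("coffee shop", 25), ("office", 480), ("home", 540)]
```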
1. Level: Indicates national, state, or county level metrics.
2. Date: The date when the data were recorded.
3. State FIPS: Two-digit FIPS state code.
4. State Postal Code: State postal code.
5. County FIPS: Five-digit FIPS county code.
6. County Name: County name.
7. Population Staying at Home: Number of residents staying at home, i.e., persons who make no trips with a trip end more than one mile away from home.
8. Population Not Staying at Home: Number of residents not staying at home.
9. Number of Trips: Number of trips made by residents, i.e., movements that include a stay of longer than 10 minutes at an anonymized location away from home.
10. Number of Trips <1: Number of trips by residents shorter than one mile.
11. Number of Trips 1-3: Number of trips by residents of at least 1 mile and shorter than 3 miles (1 ≤ trip distance < 3 miles).
12. Number of Trips 3-5: Number of trips by residents of at least 3 miles and shorter than 5 miles (3 ≤ trip distance < 5 miles).
13. Number of Trips 5-10: Number of trips by residents of at least 5 miles and shorter than 10 miles (5 ≤ trip distance < 10 miles).
14. Number of Trips 10-25: Number of trips by residents of at least 10 miles and shorter than 25 miles (10 ≤ trip distance < 25 miles).
15. Number of Trips 25-50: Number of trips by residents of at least 25 miles and shorter than 50 miles (25 ≤ trip distance < 50 miles).
16. Number of Trips 50-100: Number of trips by residents of at least 50 miles and shorter than 100 miles (50 ≤ trip distance < 100 miles).
17. Number of Trips 100-250: Number of trips by residents of at least 100 miles and shorter than 250 miles (100 ≤ trip distance < 250 miles).
18. Number of Trips 250-500: Number of trips by residents of at least 250 miles and shorter than 500 miles (250 ≤ trip distance < 500 miles).
19. Number of Trips >=500: Number of trips by residents of 500 miles or longer (trip distance ≥ 500 miles).
20. Row ID: Unique row identifier.
21. Week: The week number corresponding to the recorded date.
22. Month: The month number corresponding to the recorded date.
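With the fields above, derived statistics such as the share of residents staying home can be computed per row. A minimal sketch using hypothetical records (field names match the schema above; the record values and in-memory representation are illustrative assumptions):

```python
def share_staying_home(row):
    """Fraction of residents staying at home for one record,
    using fields 7 and 8 of the schema above."""
    home = row["Population Staying at Home"]
    away = row["Population Not Staying at Home"]
    total = home + away
    return home / total if total else float("nan")


# Hypothetical rows; filter to county-level records (field 1).
rows = [
    {"Level": "County", "Population Staying at Home": 25_000,
     "Population Not Staying at Home": 75_000},
    {"Level": "State", "Population Staying at Home": 1_000_000,
     "Population Not Staying at Home": 3_000_000},
]
county_rows = [r for r in rows if r["Level"] == "County"]
```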
License: Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General
The data source shallowgroundwater is a geospatial dataset of multipolygons that represent the estimated areas, in the Flemish region of Belgium, where the mean lowest groundwater level (MLW; Knotters & Van Walsum, 1997; Van Heesen, 1970) is less than approximately 2 m below soil surface (hence, “shallow” groundwater). We expect groundwater dependent species and communities to be present within these areas. Outside these areas we assume they are groundwater independent. We combined several data sources in order to estimate these areas.
Compilation of the data source
We compiled the dataset through an iterative process of adding specific data sources (referred in the description of attributes below), followed by validation steps based on both the actual presence of groundwater dependent habitat types or regionally important biotopes (Natura 2000 habitat map) and in situ measurements of groundwater levels (Watina+ database). The coverage of these validation data by shallowgroundwater was 96.9% and 98.6% respectively.
Most steps to compile the data source were done manually using QGIS. Final steps were done in R; see R-code in the GitHub repository 'n2khab-preprocessing' at commit 1b004e1.
Detailed properties
The data source is a GeoPackage with a single spatial multipolygon layer shallowgroundwater in the ‘Belge 1972 / Belgian Lambert 72’ coordinate reference system (EPSG-code 31370).
All attributes are boolean (true/false), each indicating if a polygon was selected from the corresponding data source by applying a set of criteria. Multiple attributes can be true for a given polygon. In order to reduce file size, polygons were dissolved by each unique combination of the values of all attributes. Hence the dataset consists of multipolygons (also multipart polygons) rather than single (part) polygons. The different attributes of this dataset reveal for each polygon (1) the data source(s) we relied on and (2) the selection criteria we applied to judge if the mean lowest groundwater level is less than approximately 2 m below soil surface. As far as possible, we reference each used data source in the description of attributes below. If one is interested in the original polygons of each datasource, selections can be made by consulting the referenced data sources and applying the specified criteria.
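The dissolve step can be pictured as grouping polygons by the tuple of their boolean attribute values, so that each unique combination yields one multipart geometry. A pure-Python sketch of that grouping logic (geometries are stood in for by plain identifiers; the actual processing was done in QGIS and R):

```python
from collections import defaultdict


def dissolve_by_attributes(features):
    """Group features by the unique combination of their boolean
    attribute values, mimicking a GIS dissolve: one multipart
    geometry per distinct attribute combination."""
    groups = defaultdict(list)
    for attrs, geom in features:
        # Key = sorted tuple of (attribute, flag) pairs, e.g.
        # (('drainage', True), ('dunes_gwdep', False)).
        groups[tuple(sorted(attrs.items()))].append(geom)
    return dict(groups)


# Three input polygons, two distinct attribute combinations.
features = [
    ({"drainage": True, "dunes_gwdep": False}, "poly1"),
    ({"drainage": True, "dunes_gwdep": False}, "poly2"),
    ({"drainage": False, "dunes_gwdep": True}, "poly3"),
]
dissolved = dissolve_by_attributes(features)
```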
These are the attributes:
geomorph_wcoast:
source: Cosyns et al. (2019)
description: polygon belonging to geomorphological entities that are expected to harbour groundwater dependent types, and thus to exhibit shallow groundwater levels. It mainly concerns dune slacks, mud flats and salt marshes.
selection: "Code" IN ('ms', 'msl', 'sm', 'ss', 'ys', 'ysl', 'yfs') OR ("Code" = 't' and "Subtype" like '%vlakte%') with ms = medium old dune slack, msl = leveled medium old dune slack, sm = mud flat, ss = salt marsh, ys = young dune slack, ysl = leveled young dune slack, yfs = young frontal dune (intruded), ("Code" = 't' AND "Subtype" like '%vlakte%') = fossil beach
anthrop_gwdep:
source: soilmap_simple (https://doi.org/10.5281/zenodo.3732903) as derived from the digital soil map of the Flemish Region (https://www.dov.vlaanderen.be/geonetwork/srv/dut/catalog.search#/metadata/5c129f2d-4498-4bc3-8860-01cb2d513f8f; version soilmap_2017-06-20; https://doi.org/10.5281/zenodo.3387008); habitatmap_terr (https://doi.org/10.5281/zenodo.3468948) as derived from the Natura 2000 habitat map of Flanders (https://doi.org/10.5281/zenodo.3354381)
description: zones located within a 100 m buffer around (almost) everywhere groundwater dependent habitat types (or regionally important biotopes) and situated within zones classified as “anthropogenic” areas within the soil map. Within the zones of the soil map designated as “anthropogenic”, we lack information on soil characteristics. However, according to the Natura 2000 habitat map of Flanders, (almost) everywhere groundwater dependent types are present in these zones, implying shallow groundwater levels. By including 100 m buffer zones around these types, restricted to the anthropogenic zones of the soil map, we consider the combined areas to have (potentially) shallow groundwater levels. In practice, we first select the habitatmap polygons with (almost) everywhere groundwater dependent types that intersect the anthropogenic soil polygons, buffer them, and then clip the result by the anthropogenic soil polygons.
selection: see https://github.com/inbo/n2khab-preprocessing/pull/61 for the adopted workflow
from soilmap_simple: bsm_mo_soilunitype starts with 'O'
from habitatmap_terr: (almost) everywhere groundwater dependent types only; the list of this category of types is available through the n2khab R package (https://inbo.github.io/n2khab)

narrowanthrop_gwdep:
source: soilmap_simple and habitatmap_terr (same versions and references as for anthrop_gwdep)
description: narrow zones classified as “anthropogenic” areas within the soil map that include (almost) everywhere groundwater dependent habitat types (or regionally important biotopes). Among the anthropogenic soil type polygons, the narrow ones containing (almost) everywhere groundwater dependent types are worth including as a whole as zones with supposed shallow groundwater levels. Hence we select them entirely instead of selecting buffers around the polygons with (almost) everywhere groundwater dependent types (cf. anthrop_gwdep and dunes_gwdep). An algorithm selects meaningful polygons based on a “thinness” criterion and the fraction of (almost) everywhere groundwater dependent types present within the polygons.
selection: see https://github.com/inbo/n2khab-preprocessing/pull/61

drainage:
source: soilmap_simple (same version and references as for anthrop_gwdep)
description: drainage classification is based on a combination of groundwater depth, soil permeability, presence of impermeable layers, soil depth and topography (see Van Ranst & Sys, 2000).
selection: bsm_mo_drain in ('c-d', 'd', 'e', 'f', 'g', 'h', 'i', 'e-f', 'e-i', 'h-i'); these are soils that are at least moderately gleyic or wet.

dunes_gwdep:
source: soilmap_simple and habitatmap_terr (same versions and references as for anthrop_gwdep)
description: zones located within a 100 m buffer around (almost) everywhere groundwater dependent habitat types (or regionally important biotopes) and situated within zones classified as “dunes” areas within the soil map. Within the zones of the Belgian soil map designated as “dunes”, we lack information on soil characteristics. However, according to the Natura 2000 habitat map of Flanders, (almost) everywhere groundwater dependent types are present in these zones, implying shallow groundwater levels. By including 100 m buffer zones around these types, restricted to the “dunes” of the soil map, we consider the combined areas to have shallow groundwater levels. In practice, we first select the habitatmap polygons with (almost) everywhere groundwater dependent types that intersect the “dunes” polygons, buffer them, and then clip the result by the “dunes” polygons.
City of Austin Open Data Terms of Use: https://data.austintexas.gov/stories/s/ranj-cccq
Possible planting space, abbreviated PPS, or plantable space, estimates land available for tree planting. Data were derived using a U.S. Forest Service method called the Urban Tree Canopy assessment. This assessment determines existing tree canopy cover and then finds areas where it is "biophysically feasible" to plant new trees, rather than where it may be preferred. For this dataset, a location was deemed a "possible" planting location if it contained pervious land not covered by existing tree canopy, surface water, impervious surfaces, or pervious surfaces deemed not plantable (e.g., baseball diamonds, golf courses). Data were derived from the following sources: NAIP-derived tree canopy cover (2014), the National Hydrography Dataset, and City of Austin impervious cover planimetric data (2015). Data are updated every 4 years and each update is provided as a separate dataset. Download and unzip the folder to get started. Please note, errors may exist in this dataset due to the various source data capture years, differences in spatial resolution, and analysis methods used. This product is for informational purposes and may not have been prepared for or be suitable for legal, engineering, or surveying purposes. It does not represent an on-the-ground survey and represents only the approximate relative location of property boundaries. This product has been produced by the City of Austin for the sole purpose of geographic reference. No warranty is made by the City of Austin regarding specific accuracy or completeness. Austin Development Services Data Disclaimer: The data provided are for informational use only and may differ from official department data. Austin Development Services’ database is continuously updated, so reports run at different times may produce different results.
Care should be taken when comparing against other reports as different data collection methods and different data sources may have been used. Austin Development Services does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.
By Coronavirus (COVID-19) Data Hub [source]
The COVID-19 Global Time Series Case and Death Data is a comprehensive collection of global COVID-19 case and death information recorded over time. This dataset includes data from various sources such as JHU CSSE COVID-19 Data and The New York Times.
The dataset consists of several columns providing detailed information on different aspects of the COVID-19 situation. The COUNTRY_SHORT_NAME column represents the short name of the country where the data is recorded, while the Data_Source column indicates the source from which the data was obtained.
Other important columns include Cases, which denotes the number of COVID-19 cases reported, and Difference, which indicates the difference in case numbers compared to the previous day. Additionally, there are columns such as CONTINENT_NAME, DATA_SOURCE_NAME, COUNTRY_ALPHA_3_CODE, COUNTRY_ALPHA_2_CODE that provide additional details about countries and continents.
Furthermore, this dataset also includes information on deaths related to COVID-19. The column PEOPLE_DEATH_NEW_COUNT shows the number of new deaths reported on a specific date.
To provide more context to the data, certain columns offer demographic details about locations. For instance, Population_Count provides population counts for different areas. Moreover, a FIPS code is available for provincial/state regions for identification purposes.
It is important to note that this dataset covers both confirmed cases (Case_Type: confirmed) as well as probable cases (Case_Type: probable). These classifications help differentiate between various types of COVID-19 infections.
Overall, this dataset offers a comprehensive picture of the global COVID-19 situation by providing accurate and up-to-date information on cases and deaths, demographic details (such as population count and FIPS code), source references (such as JHU CSSE or the NY Times), and geographical information (country names with ALPHA codes), making it useful for researchers studying patterns and trends associated with this pandemic.
Understanding the Dataset Structure:
- The dataset is available in two files: COVID-19 Activity.csv and COVID-19 Cases.csv.
- Both files contain different columns that provide information about the COVID-19 cases and deaths.
- Some important columns to look out for are:
  a. PEOPLE_POSITIVE_CASES_COUNT: The total number of confirmed positive COVID-19 cases.
  b. COUNTY_NAME: The name of the county where the data is recorded.
  c. PROVINCE_STATE_NAME: The name of the province or state where the data is recorded.
  d. REPORT_DATE: The date when the data was reported.
  e. CONTINENT_NAME: The name of the continent where the data is recorded.
  f. DATA_SOURCE_NAME: The name of the data source.
  g. PEOPLE_DEATH_NEW_COUNT: The number of new deaths reported on a specific date.
  h. COUNTRY_ALPHA_3_CODE: The three-letter alpha code representing the country.
  i. Lat, Long: Latitude and longitude coordinates representing the location.
  j. Country_Region or COUNTRY_SHORT_NAME: The country or region where cases were reported.
Choosing Relevant Columns: It's important to determine which columns are relevant to your analysis or research question before proceeding with further analysis.
Exploring Data Patterns: Use various statistical techniques like summarizing statistics, creating visualizations (e.g., bar charts, line graphs), etc., to explore patterns in different variables over time or across regions/countries.
Filtering Data: You can filter your dataset based on specific criteria using column(s) such as COUNTRY_SHORT_NAME, CONTINENT_NAME, or PROVINCE_STATE_NAME to focus on specific countries, continents, or regions of interest.
Combining Data: You can combine data from different sources (e.g., COVID-19 cases and deaths) to perform advanced analysis or create insightful visualizations.
Analyzing Trends: Use the dataset to analyze and identify trends in COVID-19 cases and deaths over time. You can examine factors such as population count, testing count, hospitalization count, etc., to gain deeper insights into the impact of the virus.
Comparing Countries/Regions: Compare COVID-19 cases and deaths across different countries or regions.
- Trend Analysis: This dataset can be used to analyze and track the trends of COVID-19 cases and deaths over time. It provides comprehensive global data, allowing researchers and po...
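The filtering, differencing, and aggregation steps described above can be sketched in pandas. This is a minimal example using a tiny synthetic frame with the column names documented for this dataset; the values are invented, and a real analysis would load COVID-19 Activity.csv instead:

```python
import pandas as pd

# Synthetic stand-in for COVID-19 Activity.csv (illustrative values).
df = pd.DataFrame({
    "COUNTRY_SHORT_NAME": ["Germany", "Germany", "Brazil", "Brazil"],
    "CONTINENT_NAME": ["Europe", "Europe", "America", "America"],
    "REPORT_DATE": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]),
    "PEOPLE_POSITIVE_CASES_COUNT": [100, 150, 200, 260],
    "PEOPLE_DEATH_NEW_COUNT": [1, 2, 3, 4],
})

# Filter to one country of interest.
germany = df[df["COUNTRY_SHORT_NAME"] == "Germany"].sort_values("REPORT_DATE")

# Daily new cases as the difference from the previous day's total.
new_cases = germany["PEOPLE_POSITIVE_CASES_COUNT"].diff()

# Aggregate new deaths per continent for a cross-region comparison.
deaths_by_continent = df.groupby("CONTINENT_NAME")["PEOPLE_DEATH_NEW_COUNT"].sum()
print(deaths_by_continent)
```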
This repository contains spatiotemporal data from many official sources for 2019-Novel Coronavirus beginning 2019 in Hubei, China ("nCoV_2019")
You may not use this data for commercial purposes. If there is a need for commercial use of the data, please contact Metabiota at info@metabiota.com to obtain a commercial use license.
The incidence data are in a CSV file format. One row in an incidence file contains a piece of epidemiological data extracted from the specified source.
The file contains data from multiple sources at multiple spatial resolutions in cumulative and non-cumulative formats by confirmation status. To select a single time series of case or death data, filter the incidence dataset by source, spatial resolution, location, confirmation status, and cumulative flag.
Data are collected, structured, and validated by Metabiota’s digital surveillance experts. The data structuring process is designed to produce the most reliable estimates of reported cases and deaths over space and time. The data are cleaned and provided in a uniform format such that information can be compared across multiple sources. Data are collected at the time of publication in the highest geographic and temporal resolutions available in the original report.
This repository is intended to provide a single access point for data from a wide range of data sources. Data will be updated periodically with the latest epidemiological data. Metabiota maintains a database of epidemiological information for over two thousand high-priority infectious disease events. Please contact us (info@metabiota.com) if you are interested in licensing the complete dataset.
Reporting sources provide either cumulative incidence, non-cumulative incidence, or both. If the source only provides a non-cumulative incidence value, the cumulative values are inferred using prior reports from the same source. Use the CUMULATIVE FLAG variable to subset the data to cumulative (TRUE) or non-cumulative (FALSE) values.
The incidence datasets include the confirmation status of cases and deaths when this information is provided by the reporting source. Subset the data by the CONFIRMATION_STATUS variable to either TOTAL, CONFIRMED, SUSPECTED, or PROBABLE to obtain the data of your choice.
Total incidence values include confirmed, suspected, and probable incidence values. If a source only provides suspected, probable, or confirmed incidence, the total incidence is inferred to be the sum of the provided values. If the report does not specify confirmation status, the value is included in the "total" confirmation status value.
The data provided under the "Metabiota Composite Source" often does not include suspected incidence due to inconsistencies in reporting cases and deaths with this confirmation status.
The incidence datasets include cases and deaths. Subset the data to either CASE or DEATH using the OUTCOME variable. It should be noted that deaths are included in case counts.
Data are provided at multiple spatial resolutions. Data should be subset to a single spatial resolution of interest using the SPATIAL_RESOLUTION variable.
Information is included at the finest spatial resolution provided in the original epidemic report. We also aggregate incidence to coarser geographic resolutions. For example, if a source only provides data at the province-level, then province-level data are included in the dataset as well as country-level totals. Users should avoid summing all cases or deaths in a given country for a given date without specifying the SPATIAL_RESOLUTION value. For example, subset the data to SPATIAL_RESOLUTION equal to “AL0” in order to view only the aggregated country level data.
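Selecting a single time series therefore means filtering on all of these variables at once. A minimal pandas sketch, using invented rows shaped like the incidence file (the exact spelling of the cumulative-flag column is assumed here to be CUMULATIVE_FLAG, following the naming of the other variables):

```python
import pandas as pd

# Illustrative rows mimicking the incidence CSV (values are invented).
inc = pd.DataFrame({
    "SOURCE": ["WHO", "WHO", "WHO", "Metabiota Composite Source"],
    "SPATIAL_RESOLUTION": ["AL0", "AL1", "AL0", "AL0"],
    "CONFIRMATION_STATUS": ["CONFIRMED", "CONFIRMED", "CONFIRMED", "TOTAL"],
    "OUTCOME": ["CASE", "CASE", "DEATH", "CASE"],
    "CUMULATIVE_FLAG": [True, True, True, False],
    "COUNT": [500, 300, 17, 25],
})

# One time series: a single source, country-level (AL0) resolution,
# confirmed cases, cumulative values only.
series = inc[
    (inc["SOURCE"] == "WHO")
    & (inc["SPATIAL_RESOLUTION"] == "AL0")
    & (inc["CONFIRMATION_STATUS"] == "CONFIRMED")
    & (inc["OUTCOME"] == "CASE")
    & (inc["CUMULATIVE_FLAG"])
]
print(series["COUNT"].tolist())
```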
There are differences in administrative division naming practices by country. Administrative levels in this dataset are defined using the Google Geolocation API (https://developers.google.com/maps/documentation/geolocation/). For example, the data for the 2019-nCoV from one source provides information for the city of Beijing, which Google Geolocations indicates is a “locality.” Beijing is also the name of the municipality where the city Beijing is located. Thus, the 2019-nCoV dataset includes rows of data for both the city Beijing, as well as the municipality of the same name. If additional cities in the Beijing municipality reported data, those data would be aggregated with the city Beijing data to form the municipality Beijing data.
Data sources in this repository were selected to provide comprehensive spatiotemporal data for each outbreak. Data from a specific source can be selected using the SOURCE variable.
In addition to the original reporting sources, Metabiota compiles multiple sources to generate the most comprehensive view of an outbreak. This compilation is stored in the database under the source name “Metabiota Composite Source.” The purpose of generating this new view of the outbreak is to provide the most accurate and precise spatiotemporal data for the outbreak. At this time, Metabiota does not incorporate unofficial - including media - sources into the “Metabiota Composite Source” dataset.
Data are collected by a team of digital surveillance experts and undergo many quality assurance tests. After data are collected, they are independently verified by at least one additional analyst. The data also pass an automated validation program to ensure data consistency and integrity.
Creative Commons License Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
This is a human-readable summary of the Legal Code.
You are free:
to Share — to copy, distribute and transmit the work to Remix — to adapt the work
Under the following conditions:
Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Noncommercial — You may not use this work for commercial purposes.
Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
With the understanding that:
Waiver — Any of the above conditions can be waived if you get permission from the copyright holder.
Public Domain — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
Other Rights — In no way are any of the following rights affected by the license: Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations; The author's moral rights; Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights. Notice — For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.
For details and the full license text, see http://creativecommons.org/licenses/by-nc-sa/3.0/
Metabiota shall in no event be liable for any decision taken by the user based on the data made available. Under no circumstances, shall Metabiota be liable for any damages (whatsoever) arising out of the use or inability to use the database. The entire risk arising out of the use of the database remains with the user.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each file in the dataset contains machine-learning-ready data for one unique tropical cyclone (TC) from the real-time testing dataset. "Machine-learning-ready" means that all data-processing methods described in the journal paper have already been applied. This includes cropping satellite images to make them TC-centered; rotating satellite images to align them with TC motion (TC motion is always towards the +x-direction, or in the direction of increasing column number); flipping satellite images in the southern hemisphere upside-down; and normalizing data via the two-step procedure.
The file name gives you the unique identifier of the TC -- e.g., "learning_examples_2010AL01.nc.gz" contains data for storm 2010AL01, or the first North Atlantic storm of the 2010 season. Each file can be read with the method `example_io.read_file` in the ml4tc Python library (https://zenodo.org/doi/10.5281/zenodo.10268620). However, since `example_io.read_file` is a lightweight wrapper for `xarray.open_dataset`, you can equivalently just use `xarray.open_dataset`. Variables in the table are listed below (the same printout produced by `print(xarray_table)`):
Dimensions: (
satellite_valid_time_unix_sec: 289,
satellite_grid_row: 380,
satellite_grid_column: 540,
satellite_predictor_name_gridded: 1,
satellite_predictor_name_ungridded: 16,
ships_valid_time_unix_sec: 19,
ships_storm_object_index: 19,
ships_forecast_hour: 23,
ships_intensity_threshold_m_s01: 21,
ships_lag_time_hours: 5,
ships_predictor_name_lagged: 17,
ships_predictor_name_forecast: 129)
Coordinates:
* satellite_grid_row (satellite_grid_row) int32 2kB ...
* satellite_grid_column (satellite_grid_column) int32 2kB ...
* satellite_valid_time_unix_sec (satellite_valid_time_unix_sec) int32 1kB ...
* ships_lag_time_hours (ships_lag_time_hours) float64 40B ...
* ships_intensity_threshold_m_s01 (ships_intensity_threshold_m_s01) float64 168B ...
* ships_forecast_hour (ships_forecast_hour) int32 92B ...
* satellite_predictor_name_gridded (satellite_predictor_name_gridded) object 8B ...
* satellite_predictor_name_ungridded (satellite_predictor_name_ungridded) object 128B ...
* ships_valid_time_unix_sec (ships_valid_time_unix_sec) int32 76B ...
* ships_predictor_name_lagged (ships_predictor_name_lagged) object 136B ...
* ships_predictor_name_forecast (ships_predictor_name_forecast) object 1kB ...
Dimensions without coordinates: ships_storm_object_index
Data variables:
satellite_number (satellite_valid_time_unix_sec) int32 1kB ...
satellite_band_number (satellite_valid_time_unix_sec) int32 1kB ...
satellite_band_wavelength_micrometres (satellite_valid_time_unix_sec) float64 2kB ...
satellite_longitude_deg_e (satellite_valid_time_unix_sec) float64 2kB ...
satellite_cyclone_id_string (satellite_valid_time_unix_sec) |S8 2kB ...
satellite_storm_type_string (satellite_valid_time_unix_sec) |S2 578B ...
satellite_storm_name (satellite_valid_time_unix_sec) |S10 3kB ...
satellite_storm_latitude_deg_n (satellite_valid_time_unix_sec) float64 2kB ...
satellite_storm_longitude_deg_e (satellite_valid_time_unix_sec) float64 2kB ...
satellite_storm_intensity_number (satellite_valid_time_unix_sec) float64 2kB ...
satellite_storm_u_motion_m_s01 (satellite_valid_time_unix_sec) float64 2kB ...
satellite_storm_v_motion_m_s01 (satellite_valid_time_unix_sec) float64 2kB ...
satellite_predictors_gridded (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column, satellite_predictor_name_gridded) float64 474MB ...
satellite_grid_latitude_deg_n (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column) float64 474MB ...
satellite_grid_longitude_deg_e (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column) float64 474MB ...
satellite_predictors_ungridded (satellite_valid_time_unix_sec, satellite_predictor_name_ungridded) float64 37kB ...
ships_storm_intensity_m_s01 (ships_valid_time_unix_sec) float64 152B ...
ships_storm_type_enum (ships_storm_object_index, ships_forecast_hour) int32 2kB ...
ships_forecast_latitude_deg_n (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_forecast_longitude_deg_e (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_v_wind_200mb_0to500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_vorticity_850mb_0to1000km_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_vortex_latitude_deg_n (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_vortex_longitude_deg_e (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_mean_tangential_wind_850mb_0to600km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_max_tangential_wind_850mb_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_mean_tangential_wind_1000mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_mean_tangential_wind_850mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_mean_tangential_wind_500mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_mean_tangential_wind_300mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_srh_1000to700mb_200to800km_j_kg01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_srh_1000to500mb_200to800km_j_kg01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
ships_threshold_exceedance_num_6hour_periods (ships_storm_object_index, ships_intensity_threshold_m_s01) int32 2kB ...
ships_v_motion_observed_m_s01 (ships_storm_object_index) float64 152B ...
ships_v_motion_1000to100mb_flow_m_s01 (ships_storm_object_index) float64 152B ...
ships_v_motion_optimal_flow_m_s01 (ships_storm_object_index) float64 152B ...
ships_cyclone_id_string (ships_storm_object_index) object 152B ...
ships_storm_latitude_deg_n (ships_storm_object_index) float64 152B ...
ships_storm_longitude_deg_e (ships_storm_object_index) float64 152B ...
ships_predictors_lagged (ships_valid_time_unix_sec, ships_lag_time_hours, ships_predictor_name_lagged) float64 13kB ...
ships_predictors_forecast (ships_valid_time_unix_sec, ships_forecast_hour, ships_predictor_name_forecast) float64 451kB ...
Variable names are meant to be as self-explanatory as possible. Potentially confusing ones are listed below.
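Because each file is a gzipped NetCDF table, reading it without the ml4tc library amounts to decompressing and calling `xarray.open_dataset`. The selection pattern can be sketched on a tiny in-memory Dataset that reuses the dimension names above (real files have 289 times and 380×540 grids; the decompression step in the comment uses a file name from the text but is not executed here):

```python
# Reading a real file would look roughly like:
#   import gzip, shutil, xarray
#   with gzip.open("learning_examples_2010AL01.nc.gz", "rb") as f_in, \
#        open("learning_examples_2010AL01.nc", "wb") as f_out:
#       shutil.copyfileobj(f_in, f_out)
#   ds = xarray.open_dataset("learning_examples_2010AL01.nc")

import numpy as np
import xarray as xr

# Tiny synthetic stand-in with the same dimension names.
ds = xr.Dataset(
    {
        "satellite_predictors_gridded": (
            ("satellite_valid_time_unix_sec", "satellite_grid_row",
             "satellite_grid_column", "satellite_predictor_name_gridded"),
            np.zeros((3, 4, 5, 1)),
        )
    },
    coords={"satellite_valid_time_unix_sec": [0, 21600, 43200]},
)

# Select the gridded satellite predictors at one valid time.
snapshot = ds["satellite_predictors_gridded"].sel(
    satellite_valid_time_unix_sec=21600)
print(snapshot.shape)
```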
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Global Biotic Interactions: Interpreted Data Products
Global Biotic Interactions (GloBI, https://globalbioticinteractions.org, [1]) aims to facilitate access to existing species interaction records (e.g., predator-prey, plant-pollinator, virus-host). This data publication provides interpreted species interaction data products. These products are the result of a process in which versioned, existing species interaction datasets ([2]) are linked to the so-called GloBI Taxon Graph ([3]) and transformed into various aggregate formats (e.g., tsv, csv, neo4j, rdf/nquad, darwin core-ish archives). In addition, the applied name maps are included to make the applied taxonomic linking explicit.
Citation
--------
GloBI is made possible by researchers, collections, projects and institutions openly sharing their datasets. When using this data, please make sure to attribute these *original data contributors*, including citing the specific datasets in derivative work. Each species interaction record indexed by GloBI contains a reference and dataset citation. Also, a full lists of all references can be found in citations.csv/citations.tsv files in this publication. If you have ideas on how to make it easier to cite original datasets, please open/join a discussion via https://globalbioticinteractions.org or related projects.
To credit GloBI for more easily finding interaction data, please use the following citation to reference GloBI:
Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.
Bias and Errors
--------
As with any analysis and processing workflow, care should be taken to understand the bias and error propagation of data sources and related data transformation processes. The datasets indexed by GloBI are biased geospatially, temporally and taxonomically ([4], [5]). Also, the mapping of verbatim names from datasets to known name concepts may contain errors due to synonym mismatches, outdated name lists, typos or conflicting name authorities. Finally, bugs may introduce bias and errors in the resulting integrated data product.
To help better understand where bias and errors are introduced, only versioned data and code are used as input: the datasets ([2]), name maps ([3]) and integration software ([6]) are versioned so that the integration processes can be reproduced if needed. This way, the steps taken to compile an integrated data record can be traced and the sources of bias and errors can be more easily found.
Contents
--------
README:
this file
citations.csv.gz:
contains data citations in a gzipped comma-separated values format.
citations.tsv.gz:
contains data citations in a gzipped tab-separated values format.
datasets.csv.gz:
contains list of indexed datasets in a gzipped comma-separated values format.
datasets.tsv.gz:
contains list of indexed datasets in a gzipped tab-separated values format.
verbatim-interactions.csv.gz
contains species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.
verbatim-interactions.tsv.gz
contains species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.
interactions.csv.gz:
contains species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
interactions.tsv.gz:
contains species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
refuted-interactions.csv.gz:
contains refuted species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
refuted-interactions.tsv.gz:
contains refuted species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
refuted-verbatim-interactions.csv.gz:
contains refuted species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.
refuted-verbatim-interactions.tsv.gz:
contains refuted species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.
interactions.nq.gz:
contains species interactions expressed in the resource description framework in a gzipped rdf/quads format.
dwca-by-study.zip:
contains species interactions data as a Darwin Core Archive aggregated by study using a custom, occurrence level, association extension.
dwca.zip:
contains species interactions data as a Darwin Core Archive using a custom, occurrence level, association extension.
neo4j-graphdb.zip:
contains a neo4j v3.5.x graph database snapshot containing a graph representation of the species interaction data.
taxonCache.tsv.gz:
contains hierarchies and identifiers associated with names from naming schemes in a gzipped tab-separated values format.
taxonMap.tsv.gz:
describes how names in existing datasets were mapped into existing naming schemes in a gzipped tab-separated values format.
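The gzipped CSV/TSV products above can be read directly with pandas. The sketch below builds a tiny gzipped TSV in memory to stand in for interactions.tsv.gz; the column names are illustrative assumptions (check the header of the downloaded file for the actual schema):

```python
import gzip
import io
import pandas as pd

# A tiny gzipped TSV standing in for interactions.tsv.gz.
# Column names here are hypothetical examples, not the real header.
tsv = ("sourceTaxonName\tinteractionTypeName\ttargetTaxonName\n"
       "Enhydra lutris\teats\tStrongylocentrotus purpuratus\n")
buf = io.BytesIO()
with gzip.open(buf, "wt") as f:
    f.write(tsv)
buf.seek(0)

# For a real file, pd.read_csv("interactions.tsv.gz", sep="\t") infers
# gzip from the extension; for a buffer, pass compression explicitly.
interactions = pd.read_csv(buf, sep="\t", compression="gzip")
print(interactions.iloc[0]["interactionTypeName"])
```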
References
-----
[1] Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. doi: 10.1016/j.ecoinf.2014.08.005.
[2] Poelen, J. H. (2020) Global Biotic Interactions: Elton Dataset Cache. Zenodo. doi: 10.5281/ZENODO.3950557.
[3] Poelen, J. H. (2021). Global Biotic Interactions: Taxon Graph (Version 0.3.28) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4451472
[4] Hortal, J. et al. (2015) Seven Shortfalls that Beset Large-Scale Knowledge of Biodiversity. Annual Review of Ecology, Evolution, and Systematics, 46(1), pp.523–549. doi: 10.1146/annurev-ecolsys-112414-054400.
[5] Cains, M. et al. (2017) Ivmooc 2017 - Gap Analysis Of Globi: Identifying Research And Data Sharing Opportunities For Species Interactions. Zenodo. doi: 10.5281/ZENODO.814978.
[6] Poelen, J. et al. (2022) globalbioticinteractions/globalbioticinteractions v0.24.6. Zenodo. doi: 10.5281/ZENODO.7327955.
Content References
-----
hash://sha256/2ed02ef8ab52cb51aef6fb42badeb495ba6a87dd6cf11be5f480c7bc1c902054 citations.csv.gz
hash://sha256/00195434368cec79f051ccb69238d2646b53530e4fd42936748428f055fdb0cc citations.tsv.gz
hash://sha256/b8898e7aea05121e7d15948dcc76d4dde6ed330db98f76ebcc4c03ba52622dcc datasets.csv.gz
hash://sha256/b8898e7aea05121e7d15948dcc76d4dde6ed330db98f76ebcc4c03ba52622dcc datasets.tsv.gz
hash://sha256/aa13e6fb98fd3aa4aaeaa89d6dccfd983e542fe010f0ffbb31fa17243f5735e3 dwca-by-study.zip
hash://sha256/4ae323bfc1255f3c6dd60b13a1be237cbfbd1c87aad595f7570e83fc9e84db08 dwca.zip
hash://sha256/0f1328b00c1b44aa19cf677790a0e649ddceb2a4e0babbe251a4af9e032f3dde interactions.csv.gz
hash://sha256/ad0297993328deee5178db4e5fe20135a21dde529f68adab63f8de9a02512514 interactions.nq.gz
hash://sha256/1c8de35d42fb298f1a27f4eb286309e39e6ab768d24d3c3bec1490f23d3594b6 interactions.tsv.gz
hash://sha256/f35ce82bf5c00882e4258edc883b41123f002c1fb9d64485abc101b00cb28e79 neo4j-graphdb.zip
hash://sha256/b002bcb378482a33847725fc52c8e26a42af5c5da9755449d8f0d10c9aa9f7f0 refuted-interactions.csv.gz
hash://sha256/7beb77546aad6e9de756d6161e35f55cfa725072ca77ba5c0b72a00e53146127 refuted-interactions.tsv.gz
hash://sha256/89fa5fc3bdc76451dd5d2a79c1473b437615e5c7e551ec5e57ff8b71e9a280ea refuted-verbatim-interactions.csv.gz
hash://sha256/ea83faba0aa0792cebe055832553197701025fdfe2f07ec34599075819916707 refuted-verbatim-interactions.tsv.gz
hash://sha256/4cf48959ea839e371a0344aab4b31f36242c84ac24e44f4db948524523b3563f taxonCache.tsv.gz
hash://sha256/bf38fe30df535f9e0b6b22fa726c10f35d391d616e6d107cc7582505141fd13d taxonMap.tsv.gz
hash://sha256/ce0d4f35b0970df3fe4e1623e473a5390b39297efae7f9e1474bfe2e8bc15d48 verbatim-interactions.csv.gz
hash://sha256/965718c7a9ec4ec1adc98413b52e31c090ad1ba5a04be088d579c5c9d59ffef0 verbatim-interactions.tsv.gz
hash://md5/ad99f71b8d3e0b67b7d4578a0a123c40 citations.csv.gz
hash://md5/2a27a963e745a12042c6c9886f87f842 citations.tsv.gz
hash://md5/580a4e1cfed5a6235f6c35277d0c7b10 datasets.csv.gz
hash://md5/580a4e1cfed5a6235f6c35277d0c7b10 datasets.tsv.gz
hash://md5/6c7294aa2b507143e10c390ae6008ed1 dwca-by-study.zip
hash://md5/a9694ecc6de81d9893998be05a8ef2de dwca.zip
hash://md5/0415cd469b8892fb3f5435048b6e85bf interactions.csv.gz
hash://md5/1b48bf7a344bdd3c706a94666607cd71 interactions.nq.gz
hash://md5/445b2c97e2e44d2dbc4aa93084ecacfc interactions.tsv.gz
hash://md5/ca3e4780032c8c58e90242bcdf1328d5 neo4j-graphdb.zip
hash://md5/03600b16405fc2a4ea60925d69b6e16f
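The `hash://sha256/...` and `hash://md5/...` lines above let you verify that a downloaded file is the one this publication describes. A small sketch of that check using Python's hashlib (the file name and truncated digest in the comment are taken from the listing above):

```python
import hashlib

def content_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Return a hash URI in the same style as the content references."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"hash://{algorithm}/{digest}"

# Verifying a downloaded file would look like:
#   with open("citations.csv.gz", "rb") as f:
#       assert content_hash(f.read()) == "hash://sha256/2ed02ef8..."
print(content_hash(b"example"))
```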
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Global Biotic Interactions: Interpreted Data Products
Global Biotic Interactions (GloBI, https://globalbioticinteractions.org, [1]) aims to facilitate access to existing species interaction records (e.g., predator-prey, plant-pollinator, virus-host). This data publication provides interpreted species interaction data products. These products are the result of a process in which versioned, existing species interaction datasets ([2]) are linked to the so-called GloBI Taxon Graph ([3]) and transformed into various aggregate formats (e.g., tsv, csv, neo4j, rdf/nquad, darwin core-ish archives). In addition, the applied name maps are included to make the applied taxonomic linking explicit.
Citation--------
GloBI is made possible by researchers, collections, projects and institutions openly sharing their datasets. When using this data, please make sure to attribute these original data contributors, including citing the specific datasets in derivative work. Each species interaction record indexed by GloBI contains a reference and dataset citation. Also, a full lists of all references can be found in citations.csv/citations.tsv files in this publication. If you have ideas on how to make it easier to cite original datasets, please open/join a discussion via https://globalbioticinteractions.org or related projects.
To credit GloBI for more easily finding interaction data, please use the following citation to reference GloBI:
Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.
Bias and Errors--------
As with any analysis and processing workflow, care should be taken to understand the bias and error propagation of data sources and related data transformation processes. The datasets indexed by GloBI are biased geospatially, temporally and taxonomically ([5], [6]). Also, mapping of verbatim names from datasets to known name concept may contains errors due to synonym mismatches, outdated names lists, typos or conflicting name authorities. Finally, bugs may introduce bias and errors in the resulting integrated data product.
To help better understand where bias and errors are introduced, only versioned data and code are used as an input: the datasets ([2]), name maps ([3]) and integration software ([6]) are versioned so that the integration processes can be reproduced if needed. This way, steps take to compile an integrated data record can be traced and the sources of bias and errors can be more easily found.
This version was preceded by [7].
Contents--------
README:this file
citations.csv.gz:contains data citations in a in a gzipped comma-separated values format.
citations.tsv.gz:contains data citations in a gzipped tab-separated values format.
datasets.csv.gz:contains list of indexed datasets in a gzipped comma-separated values format.
datasets.tsv.gz:contains list of indexed datasets in a gzipped tab-separated values format.
verbatim-interactions.csv.gzcontains species interactions tabulated as pair-wise interaction in a gzipped comma-separated values format. Included taxonomic name are not interpreted, but included as documented in their sources.
verbatim-interactions.tsv.gzcontains species interactions tabulated as pair-wise interaction in a gzipped tab-separated values format. Included taxonomic name are not interpreted, but included as documented in their sources.
interactions.csv.gz:contains species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
interactions.tsv.gz:contains species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
refuted-interactions.csv.gz:contains refuted species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
refuted-interactions.tsv.gz:contains refuted species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
refuted-verbatim-interactions.csv.gz: contains refuted species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are not interpreted, but included as documented in their sources.
refuted-verbatim-interactions.tsv.gz: contains refuted species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are not interpreted, but included as documented in their sources.
interactions.nq.gz:contains species interactions expressed in the resource description framework in a gzipped rdf/quads format.
dwca-by-study.zip:contains species interactions data as a Darwin Core Archive aggregated by study using a custom, occurrence level, association extension.
dwca.zip:contains species interactions data as a Darwin Core Archive using a custom, occurrence level, association extension.
neo4j-graphdb.zip:contains a neo4j v3.5.32 graph database snapshot containing a graph representation of the species interaction data.
taxonCache.tsv.gz:contains hierarchies and identifiers associated with names from naming schemes in a gzipped tab-separated values format.
taxonMap.tsv.gz:describes how names in existing datasets were mapped into existing naming schemes in a gzipped tab-separated values format.
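The tabular files above are ordinary gzipped CSV/TSV and can be read with standard tools. A minimal Python sketch, using a tiny stand-in file so it runs without a download; the column names shown are illustrative of GloBI's interaction tables, not a complete schema:

```python
import csv
import gzip

# Write a tiny stand-in for interactions.tsv.gz so this sketch is self-contained;
# the published file has many more columns and millions of rows.
rows = [
    ["sourceTaxonName", "interactionTypeName", "targetTaxonName"],
    ["Enhydra lutris", "eats", "Strongylocentrotus purpuratus"],
]
with gzip.open("interactions-sample.tsv.gz", "wt", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

# Read it back exactly as you would read the published interactions.tsv.gz.
with gzip.open("interactions-sample.tsv.gz", "rt", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    header = next(reader)
    interactions = [dict(zip(header, row)) for row in reader]

print(interactions[0]["interactionTypeName"])  # eats
```

The same pattern works for the .csv.gz files by dropping the `delimiter="\t"` argument.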
References
----------
[1] Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. doi: 10.1016/j.ecoinf.2014.08.005.
[2] Poelen, J. H. (2020) Global Biotic Interactions: Elton Dataset Cache. Zenodo. doi: 10.5281/ZENODO.3950557.
[3] Poelen, J. H. (2021). Global Biotic Interactions: Taxon Graph (Version 0.3.28) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4451472
[4] Hortal, J. et al. (2015) Seven Shortfalls that Beset Large-Scale Knowledge of Biodiversity. Annual Review of Ecology, Evolution, and Systematics, 46(1), pp.523–549. doi: 10.1146/annurev-ecolsys-112414-054400.
[5] Cains, M. et al. (2017) IVMOOC 2017 - Gap Analysis of GloBI: Identifying Research and Data Sharing Opportunities for Species Interactions. Zenodo. doi: 10.5281/ZENODO.814978.
[6] Poelen, J. et al. (2022) globalbioticinteractions/globalbioticinteractions v0.24.6. Zenodo. doi: 10.5281/ZENODO.7327955.
[7] GloBI Community. (2024). Global Biotic Interactions: Interpreted Data Products hash://md5/946f7666667d60657dc89d9af8ffb909 hash://sha256/4e83d2daee05a4fa91819d58259ee58ffc5a29ec37aa7e84fd5ffbb2f92aa5b8 (0.7) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11552565
Content References
------------------
hash://sha256/5f4906439eba61f936b3dd7455a62c51656a74206f82d3f654e330fda6fbbe45 citations.csv.gz
hash://sha256/c8100368dae39363b241472695c1ae197aaddc6e3d6c0a14f3f5ee704b37f3f6 citations.tsv.gz
hash://sha256/e6f4aa897c5b325e444315e021b246ffed07fef764b0de6c0f1b2688bbdf9d0f datasets.csv.gz
hash://sha256/e6f4aa897c5b325e444315e021b246ffed07fef764b0de6c0f1b2688bbdf9d0f datasets.tsv.gz
hash://sha256/f11dc825609cdb1d4a3e9ba8caca9bf93c90dd6f660c7f6a0c8aa01c035a5e1f dwca-by-study.zip
hash://sha256/7f16aacacae74e8b0cdef04c612ba776f508ff7ffe385abc57583e37aec8fe53 dwca.zip
hash://sha256/b65e4c9a3615f1386bb97e45fb907d053df55476149aa6d71e6f398351218d0d interactions.csv.gz
hash://sha256/0c28032392f82d753690be126805e6334ca46bdc4b5e2102a79b15ce0cc0ba90 interactions.nq.gz
hash://sha256/8a7031250c288ba0da3d5cdbedc19d54c2f16ba3aa70d49826a7369b6edeca04 interactions.tsv.gz
hash://sha256/d0c0fbf536cc63c004d057efc14600ba8cc5874f401b08f51837273b7854f1bb neo4j-graphdb.zip
hash://sha256/50e77636f8b58c040e38b6a70ba7cc8288b190ef252dc0d4eb2f12f4c541e82f README
hash://sha256/a74e2a39cfe133ae9de1eeea94f5dda8cbd58cfe61a8ccf91b7c540757719c74 refuted-interactions.csv.gz
hash://sha256/37b06e274e41ca749399763989816854101238ade9863365f384a2764c639e9d refuted-interactions.tsv.gz
hash://sha256/23315b6cd3fdc91f9c1d5d5bc39fa52cf1cef7a4e97d9d023d452751df13f30e refuted-verbatim-interactions.csv.gz
hash://sha256/ff82e40cee4f8a8852d0c241f5027f66157a2b8a9090ffa3a0a329a206828d96 refuted-verbatim-interactions.tsv.gz
hash://sha256/f072fbc7affb6e29978c7540af6cdccd3a219a23b0a4765b5bae56bd20df0d88 taxonCache.tsv.gz
hash://sha256/cd28c81bb2432646a81ad216bc11818f7568ce81826e0074d9a33579da2c1426 taxonMap.tsv.gz
hash://sha256/a1d14aa47806c624cf7e3a8c8236643dcf19ed1835c79c65958f7317ebfb9566 verbatim-interactions.csv.gz
hash://sha256/2284434219d5fdab1e2152955f04363852c132b76709c330d33e31517817a82e verbatim-interactions.tsv.gz
hash://md5/d6ebf42729d988e15cb30adfa6112234 citations.csv.gz
hash://md5/42877ae68e51871b8eb7116e62f6b268 citations.tsv.gz
hash://md5/3e437580296fdeff3b6f35d1331db9d1 datasets.csv.gz
hash://md5/3e437580296fdeff3b6f35d1331db9d1 datasets.tsv.gz
hash://md5/fe88720fd992771bd64bfa220ad6a7d3 dwca-by-study.zip
hash://md5/cbe132a9288feaef2f3e0c0409b8dc2f dwca.zip
hash://md5/051f6db667c4b84616223c2776464dbf interactions.csv.gz
hash://md5/b66857f8750e56ba9abe484b1f72eac4 interactions.nq.gz
hash://md5/300839c346184b2fedc4e1fb31bcc29c interactions.tsv.gz
hash://md5/e79cf5ffee919672f99ea338f3661566 neo4j-graphdb.zip
hash://md5/898678f47561d7ef53722bc32957dcd9 README
hash://md5/65a185f19df304e53f92a7275f2de291 refuted-interactions.csv.gz
hash://md5/bc37a4354f8a2402e9335ae44f28cbd7
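The content references above let you verify a downloaded file before use. A minimal Python sketch that computes a GloBI-style "hash://sha256/<hex>" string for a local file; the demo file written here is a stand-in with a well-known digest, not part of the archive:

```python
import hashlib

def content_reference(path: str) -> str:
    # Stream the file in chunks so large archives (e.g., neo4j-graphdb.zip)
    # never need to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return "hash://sha256/" + digest.hexdigest()

# Stand-in file whose content is the standard test vector b"abc".
with open("demo.bin", "wb") as f:
    f.write(b"abc")

print(content_reference("demo.bin"))
# hash://sha256/ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Comparing the computed string against the published content reference confirms the download is the exact versioned artifact the README describes.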
Abstract. Biodiversity research has advanced by testing expectations of ecological and evolutionary hypotheses through the linking of large-scale genetic, distributional, and trait datasets. The rise of molecular systematics over the past 30 years has resulted in a wealth of DNA sequences from around the globe. Yet, advances in molecular systematics also have created taxonomic instability, as new estimates of evolutionary relationships and interpretations of species limits have required widespread scientific name changes. Taxonomic instability, colloquially "splits, lumps, and shuffles," presents logistical challenges to large-scale biodiversity research because (1) the same species or sets of populations may be listed under different names in different data sources, or (2) the same name may apply to different sets of populations representing different taxonomic concepts. Consequently, distributional and trait data are often difficult to link directly to primary DNA sequen...

Taxonomic reconciliation
We downloaded all names from the NCBI Taxonomy database (Schoch et al., 2020) that descended from "Aves" (TaxID: 8782) on 3 May 2020 (Data Repository D2). From this list, we extracted all species and subspecies names as well as their NCBI Taxonomy ID (TaxID) numbers. We then ran a custom Perl script (Data Repository D3) to exactly match binomial (genus, species) and trinomial (genus, species, subspecies) names from NCBI Taxonomy to the names recognized by the eBird/Clements v2019 Integrated Checklist (August 2019; Data Repository D4). For each mismatch with the NCBI Taxonomy name, we then identified the corresponding equivalent eBird/Clements species or subspecies. We first searched for names in Avibase (Lepage et al., 2014). However, Avibase's search function currently facilitates only exact matches to taxonomies it implements.
For names that were not an exact match to an Avibase taxonomic concept, we implemented web searches (Google) which often identified minor sp...

D1: "PetersVsClements2Final.txt" - This file tells which species from the Peters taxonomy match the 2019 Clements/eBird taxonomy. The first column has a species name from the Peters taxonomy. In the second column, "Clements" indicates that the species name matches the Clements/eBird taxonomy, "No" means it does not match, and "Close" means that the names match when you disregard the last two letters.
"SibleyMonroeVsClements_Final.txt" - This file tells which species from the Sibley Monroe taxonomy match the 2019 Clements/eBird taxonomy. The first column has a species ID number from the Sibley Monroe taxonomy. The second column has the species scientific name from the Sibley Monroe taxonomy. The third column has the common name from the Sibley Monroe taxonomy. In the fourth column, "Clements" indicates that the species name matches the Clements/eBird taxonomy, "No" means it does not match, and "Close" means that the names match when you disregard the last two letters.
D2: "taxonomy_result.unix.xml" ...
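The exact-match step described above (implemented by the original authors as a custom Perl script) can be illustrated with a short Python sketch. All names and ID numbers below are hypothetical examples for illustration, not data from the repository:

```python
# Exact matching of NCBI binomials/trinomials against a checklist index.
# TaxID values here are placeholders, not real NCBI identifiers.
ncbi_names = {
    "Passer domesticus": 1001,
    "Corvus brachyrhynchos hesperis": 1002,
}
clements_names = {"Passer domesticus", "Corvus brachyrhynchos"}

matched, mismatched = {}, {}
for name, taxid in ncbi_names.items():
    # Only exact string matches count; mismatches go on for manual
    # reconciliation (e.g., via Avibase or web searches, as described above).
    (matched if name in clements_names else mismatched)[name] = taxid

print(sorted(matched))      # names linked directly to the checklist
print(sorted(mismatched))   # names needing manual reconciliation
```

Set membership makes each lookup O(1), so the approach scales to the tens of thousands of names in a full avian taxonomy.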
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
The NRCS National Water and Climate Center Report Generator web-based application uses long-term snowpack, precipitation, reservoir, streamflow, and soils data from a variety of quality-controlled sources to create reports. Users can choose from predefined templates or build custom reports. Data from tabular reports may be exported to different formats, including comma-separated value (CSV) files. Charts can be saved to graphics formats such as JPG and PNG.

The Report Generator network incorporates data from many agency databases. The NRCS snow survey flagship database, the Water and Climate Information System (WCIS), provides a wealth of data, including manually collected snow course data and information from automated Snow Telemetry (SNOTEL) and Soil Climate Analysis Network (SCAN) stations across the United States. Report Generator also uses precipitation, streamflow, and reservoir data from the U.S. Army Corps of Engineers (USACE), the U.S. Bureau of Reclamation (BOR), the Applied Climate Information System (ACIS), the U.S. Geological Survey (USGS), various water districts and other entities.

In addition to creating reports, Report Generator lets you view information on sites, including metadata such as elevation, latitude/longitude and hydrologic unit code (HUC). You can also view photos of the site, including a site map (in Google Maps when available). Report Generator creates reports in both tabular and chart format. Single-station and multiple-station charting is also supported. Data may be displayed in either English or metric units.

Farmers, municipalities, water and hydroelectric utilities, environmental organizations, fish and wildlife managers, tribal nations, reservoir managers, recreationists, wetlands managers, urban developers, transportation departments, and research organizations regularly use these data and products. This release has several new features which focus on improving the way reports are specified and how they are displayed.
Resources in this dataset: Resource Title: Report Generator 2.0. File Name: Web Page. URL: https://wcc.sc.egov.usda.gov/reportGenerator/