Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
There has been persistent interest in the mediating role of the micronutrient selenium (Se) in mercury (Hg) toxicity since the 1960s. Despite many unresolved questions regarding Se–Hg interactions, considerable research has been performed to document Se:Hg molar ratios in aquatic animals as a basis for inferring the health risks associated with their consumption. We compiled the co-reported Se and Hg data for 386 shellfish, finfish, and aquatic mammals from 89 papers to assess differences in consumption safety categorizations according to health risk metrics that (a) consider Hg concentrations alone or (b) co-consider Se and Hg concentrations. Species-specific mean Hg concentrations for 23% of all data points in our database exceeded the FAO/WHO maximum level, and 68% and 83% had estimated daily intake per meal values for adults and children above the Joint FAO/WHO Expert Committee on Food Additives Provisional Tolerable Daily Index. In contrast, only 12% of data points would be categorized as unsafe for consumption on the basis of their 1:1 Se:Hg molar ratios. Se-inclusive risk metrics reversed safety categorizations for a majority of high-trophic level species that are most likely to exceed Hg-based risk thresholds. Adopting Se-inclusive risk metrics has potentially significant implications for seafood consumer health and is likely premature.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Source:
This dataset supports the publication: Martin, H.K., Edmonds, D.A., Yanites, B.J. & Niemi, N.A. (2024) Quantifying landscape change following catastrophic dam failures in Edenville and Sanford, Michigan, USA. Earth Surf. Process. Landforms, Available from: https://doi.org/10.1002/esp.5855. All of the details about how these data were collected and processed are provided in that paper, particularly in the Supporting Information.
Brief Summary:
On May 19, 2020, the Edenville and Sanford dams near Midland, Michigan, USA failed as the result of significant rainfall over the preceding two days. We analyzed the geomorphic impacts of these failures using a pre-failure airborne lidar dataset and three UAV-based lidar surveys collected two weeks ("2020-06"), three months ("2020-08"), and eleven months ("2021-04") post-failure. The pre-failure airborne lidar dataset was merged from two datasets hosted on OpenTopography as part of the USGS 3DEP project. These are linked to in the Supporting Information document for the above manuscript. This upload contains data from the three UAV-based surveys we collected.
Structure/Contents:
The .zip file is structured first by dates, with each folder corresponding to one of our three field campaigns.
Within each date folder, there are two folders: one for data collected at Edenville and another for data collected at Sanford.
Within each location folder, there are two items.
1. One is a .las file that contains the ground-classified point cloud data. For Edenville, these point clouds are merged from data collected from multiple launch locations. The point cloud file has had corrections applied to it; it corresponds to the output of the processing described in the Supporting Information through the end of Section 2.3: Lidar processing. These files can be opened and worked with using almost any software that handles point clouds; users preferring free and open source software could consider using CloudCompare [https://cloudcompare.org/].
2. The second is a folder containing DEMs, which are raster images derived from the point cloud data in the corresponding location folder. Each pixel contains the elevation of that area in meters above sea level. Two copies are provided: one with the pixel size set to 50 cm by 50 cm, and the other set to 1 m by 1 m. These were created by constructing a triangular lattice between ground-classified points in the point cloud and then sampling the mesh. These maps are provided for convenience for users who prefer to work with raster data over point clouds or would otherwise prefer not to have to rasterize point cloud data themselves. These files can be opened and worked with using any GIS software; users preferring free and open source software could consider using QGIS [https://www.qgis.org/en/site/]. The CRS is NAD83 UTM Zone 16N.
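For users who prefer scripting over GUI tools, the point clouds and DEMs can also be read programmatically. The sketch below is only an illustration: the file paths are placeholders (the actual names inside the .zip may differ), the DEM format is assumed to be a GDAL-readable raster such as GeoTIFF, and laspy/rasterio are simply one possible choice of open-source Python libraries, not something prescribed by the dataset.

```python
import laspy     # pip install laspy
import rasterio  # pip install rasterio

# Placeholder paths; substitute the actual files from the unzipped archive.
points = laspy.read("2020-06/Edenville/ground_points.las")
print(len(points.points), "ground-classified points")
print(points.x.min(), points.x.max())  # easting range (NAD83 UTM Zone 16N)

with rasterio.open("2020-06/Edenville/DEMs/dem_1m.tif") as dem:
    elevation = dem.read(1)            # elevations in meters above sea level
    print(dem.crs, dem.res, elevation.shape)
```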
Citation/License:
Please use our data! We ask that, if you do so and any published work results, you please cite the following manuscript as well as this zenodo dataset:
Martin, H.K., Edmonds, D.A., Yanites, B.J. & Niemi, N.A. (2024) Quantifying landscape change following catastrophic dam failures in Edenville and Sanford, Michigan, USA. Earth Surf. Process. Landforms, Available from: https://doi.org/10.1002/esp.5855.
These data are released under the CC BY-NC-SA 4.0 license [https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en].
In plain language, it means that you should feel free to download these data and use or modify them for whatever purpose you wish, as long as you i) attribute us as original authors of the dataset, ii) do not use them for commercial purposes, and iii) use the same license for any derivative works you create using these data.
For any questions or concerns, please don't hesitate to reach out to Harrison Martin at hkm@caltech.edu, or via https://harrison.studies.rocks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project was designed to estimate site occupancy of forest birds in Puerto Rico during a winter before (2015) and after (2018) passage of two significant tropical storms: Hurricane Irma and Hurricane Maria. This project includes two data files, one containing data on species occurrence gleaned from repeated point-counts and one containing data on characteristics of each point-count survey. A data dictionary is also included.The species-occurrence file (occurrenceData.csv) includes occurrence data for all species of forest birds encountered during surveys in 2015 and 2018 across the island of Puerto Rico. Data collected in 2018 represent a snapshot of conditions shortly after the passage of hurricanes Irma and Maria in September 2017. Data collected in 2015 represent a snapshot of baseline, pre-hurricane conditions. Surveys consisted of standardized 10-minute counts at each point, beginning shortly after sunrise. Each point count was divided into four 2.5-min intervals conducted in immediate succession, with 1-min playbacks of Bicknell’s Thrush vocalizations broadcast before the second and fourth periods. Individuals of all species detected in each count period were recorded in four distance bands (1-10 m, 10-25 m, 25-50 m, >50 m). No counting occurred during the two 1-min playback periods. The point-characteristics file (surveyPointsAllYears.csv) describes the location of each point and conditions at the time of each survey. Points were established using a randomized, spatially balanced network of survey locations using a generalized random tessellation stratified (GRTS) scheme. With this approach, we selected 60 1-km2 cells from across the island as potential areas in which to conduct surveys. Once we had drawn a sample of cells to survey, we visited each cell and identified 3-5 locations suitable for point-count surveys. Suitability was based on the extent of forest cover – at least 50% of the area in a 50-m radius around each point was forested – and accessibility; all points were along public roads or trails. To maintain independence of counts conducted at different points, we placed each point at least 250 m from its nearest neighbor. Point locations were spatially referenced via GPS and marked with metal tree tags. Due to time limitations – the original survey protocol called for surveys to end in March, prior to any potential pre-migration movement by Bicknell’s Thrush – we only sampled points in 43 of the 60 cells. In total, we sampled 186 points in the 43 cells in both 2015 and 2018. Twenty-five points were surveyed in 2015 but could not be accessed due to storm damage in 2018. To make up for some of these points, we surveyed 16 new points in 2018 within the cells that contained previously surveyed points that had become inaccessible. This yielded 211 points with data from 2015 and 202 points with data from 2018, for a total of 227 points with data from at least 1 year. Due to the original goal of the surveys, our sample was weighted towards areas more likely to contain habitat for Bicknell’s Thrush based on the winter-habitat model for that species and thus oversamples wet, high-elevation broadleaf forest. Although our survey locations were not a representative sample of all forest types on the island, the sample did include cells at lower elevations in drier forest types. Points surveyed in 2015 ranged in elevation from 0 - 1,297 m, with a median elevation of 705 m (interquartile range = 408 - 825 m). 
In 2018, surveyed points ranged in elevation from 5 - 1,297 m, with a median elevation of 760 m (interquartile range = 393 - 843 m). Thus, we believe our sampling frame can be described as predominantly forested areas on Puerto Rico that were accessible by roads or trails. The two important sources of potential bias in considering the scope of inference allowed by this sampling frame are 1) the oversampling of high-elevation forests, which will tend to yield low-precision estimates for bird assemblages characteristic of dry, lowland forest and 2) the reliance on trails and roads to access survey points, an unavoidable trade-off given the difficulty of moving through these tropical forests, especially following the hurricanes.
https://data.norge.no/nlod/en/2.0/
Data from smiley face supervision provides an overall overview of all food establishments covered by the smiley face scheme (the smiley face regulations) and the audit results from the Norwegian Food Safety Authority's supervision from 1 January 2016 to date.
The data set "requirements for smiley surveillance at food establishments" contains each individual requirement point that is included in an audit, together with the grade given to the requirement point. Each audit has a set of rows in this data set, linked by supervision time. The data set is related to Smilefjesyn at restaurants, which contains the same information as the poster that is hung at the eateries after they have been supervised.
Description of the elements in the dataset for smileys:
Requirements: The requirements of the regulations are set up as points. The requirements are directed at the supervisory objects, and the Norwegian Food Safety Authority supervises compliance with them. The requirements are a specific element of an inspection. These may be mandatory at an ordinary inspection (see Inspection).
Nonconformity (deviation): A requirement point that has been assessed and where the Food Safety Authority has found breaches of the regulations that lead to a notification or a decision against the object of supervision. Theme: A grouping of requirement points used on the smiley poster.
Inspection: The Norwegian Food Safety Authority's audit process consists of an ordinary audit, with subsequent follow-up supervision until all nonconformities have been closed/remedied or the object of supervision shuts down the activity. When we carry out an ordinary audit, all requirement points can be assessed, but it is not planned that every requirement point will be considered at each ordinary audit. On follow-up supervision, only the requirement points on which nonconformities were found are considered, unless obvious nonconformities have arisen (or nonconformities are suspected) at other requirement points since the previous audit.
Grade scale:
0 = No breach of the rules found. Big smile.
1 = Minor violations of the regulations that do not require follow-up. Big smile.
2 = Violation of the rules that requires follow-up. Straight mouth.
3 = Serious breach of the rules. Sour mouth.
4 = Not applicable: the enterprise does not have this activity at the audit object. Does not affect the smiley grade.
5 = Not considered: the Norwegian Food Safety Authority has not assessed this requirement point for this audit. Does not affect the smiley grade. If significant or obvious nonconformities had been suspected in connection with the inspection, the requirement point would have been assessed.
Grade: The overall smiley symbol after an audit corresponds to the poorest grade given on the inspection. The grade for each theme is the worst grade given to requirement points under that theme. Each requirement point is graded on the scale above.
Invalidation of supervision: If the Norwegian Food Safety Authority has not complied with its obligations under the smiley face regulations, or an appeal against a decision is upheld, the audit and its result will be withdrawn from the open data, and the information about that audit will no longer be considered correct. It is therefore important that users of the data set are aware of this and keep their copies up to date (at least once per day) in order to avoid publishing incorrect information about an audit object.
Declaration of the contents of the data set. Requirements: This data set contains each individual requirement point that is included in an audit, together with the grade given to the requirement point. Each audit has a set of rows in this data set, linked by supervision time. The fields are:
"supervisor": key identifying one audit
"date": the date the inspection was carried out
"requirement point": key identifying one requirement point (not unique over time)
"requirement point name": name of the requirement point
"character": grade (from the scale)
"text_no": textual description of the grade, Bokmål
"text_nn": textual description of the grade, Nynorsk
"supervisor_href": link to the audit, so that requirement points can be connected with the audit object
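As a rough illustration only, the documented grading logic (overall smiley grade = worst grade on the inspection, with grades 4 and 5 not counting) could be reproduced from these rows along the following lines. The file name, delimiter, and English column names are assumptions for readability; the published files use Norwegian field names.

```python
import pandas as pd

# Hypothetical file name; column names follow the English field descriptions above.
rows = pd.read_csv("smiley_requirement_points.csv", sep=";")

# Grades 4 (not applicable) and 5 (not considered) do not affect the smiley grade,
# so the overall grade for an audit is the worst (highest) remaining grade.
assessed = rows[rows["character"].isin([0, 1, 2, 3])]
overall_grade = assessed.groupby("supervisor")["character"].max()
print(overall_grade.head())
```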
Many animals rely on visual camouflage to avoid detection and increase their chances of survival. Edge disruption is commonly seen in the natural world, with animals evolving high-contrast markings that are incongruent with their real body outline to avoid recognition. Whilst many studies have investigated how camouflage properties influence viewer performance and eye movement in predation search tasks, researchers in the field have yet to consider how camouflage may directly modulate visual attention and object processing. To examine how disruptive colouration modulates attention, we use a visual object recognition model to quantify object saliency. We determine if object saliency is predictive of human behavioural performance and subjective certainty, as well as neural signatures of attention and decision-making. We show that increasing edge disruption not only reduces detection and identification performance but is also associated with a dampening of neurophysiological signatures of ... Data were collected using a BioSemi EEG system at the School of Psychology, University of Leeds, UK. All data were collected by Jac Billington. The data set contains both electrophysiological responses from the human brain and behavioural responses to the task. A total of twenty-one adult participants were recruited. Two participants were removed immediately following data collection; one could not complete the task as they could not see the camouflage stimuli and (due to an equipment failure) one did not have any recorded behavioural data. During EEG data pre-processing, one more participant was removed due to excessive noise evident in ERP signatures.
Suggested software and links:
MATLAB
EEGlab 2019_1
FieldTrip 20200310 (https://www.fieldtriptoolbox.org/)
FASTER toolbox 1.2.3b (https://sourceforge.net/projects/faster/)
# EEG responses to camouflage objects
Creator(s): Jac Billington (also known as Jaclyn)
Contact: j.billington@leeds.ac.uk
Organisation(s): University of Leeds
Publication Year: 2024
Description: This data set contains behavioural and EEG output data from one experiment.
Related publication: intended for Proceedings of the Royal Society B, August 2023.
21 .bdf files collected using a BioSemi EEG system at the School of Psychology, University of Leeds. Participants were given randomised codes at the point of data collection and were not identifiable. .bdf files can be imported into EEG analysis software such as EEGlab*.
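For those who prefer Python to MATLAB, the same .bdf recordings can also be read with MNE-Python; this is only an alternative sketch, not the authors' processing pipeline (which used EEGlab, FieldTrip, and FASTER), and the file name is a guess based on the participant codes listed below.

```python
import mne  # pip install mne

# Hypothetical file name built from one of the participant codes below.
raw = mne.io.read_raw_bdf("bgty.bdf", preload=True)
print(raw.info)                       # channels, sampling rate, etc.
raw.filter(l_freq=0.1, h_freq=40.0)   # example band-pass before epoching/ERP analysis
```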
Participant codes [bgty, bvcd, cder, cocq, cxza, erfd, jabi, mnbg, nbvf, nhyu, qwsa, rtgf, tyhg, uikj, vcxs, vfrt, weds, xpbc, xswe, yujh, and zaqw]
Participants ("parts_*") have been split into .7z folders in sets of three for space-saving purposes:
Data sets consist of multiple multivariate time series. Each data set is further divided into training and test subsets. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with a different degree of initial wear and manufacturing variation which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data are contaminated with sensor noise. The engine is operating normally at the start of each time series and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate. A vector of true Remaining Useful Life (RUL) values for the test data is also provided. The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle; each column is a different variable. The columns correspond to: 1) unit number, 2) time, in cycles, 3) operational setting 1, 4) operational setting 2, 5) operational setting 3, 6) sensor measurement 1, 7) sensor measurement 2, ... 26) sensor measurement 21.
Data Set: FD001. Train trajectories: 100; Test trajectories: 100; Conditions: ONE (Sea Level); Fault Modes: ONE (HPC Degradation)
Data Set: FD002. Train trajectories: 260; Test trajectories: 259; Conditions: SIX; Fault Modes: ONE (HPC Degradation)
Data Set: FD003. Train trajectories: 100; Test trajectories: 100; Conditions: ONE (Sea Level); Fault Modes: TWO (HPC Degradation, Fan Degradation)
Data Set: FD004. Train trajectories: 248; Test trajectories: 249; Conditions: SIX; Fault Modes: TWO (HPC Degradation, Fan Degradation)
Reference: A. Saxena, K. Goebel, D. Simon, and N. Eklund, "Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation", in the Proceedings of the 1st International Conference on Prognostics and Health Management (PHM08), Denver, CO, Oct 2008.
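As a minimal sketch of how these files might be loaded and labelled for RUL prediction, assuming the training subset has been extracted as train_FD001.txt and using column names invented from the description above:

```python
import pandas as pd

# Assumed column names derived from the 26-column description above.
cols = (["unit", "cycle", "op_setting_1", "op_setting_2", "op_setting_3"]
        + [f"sensor_{i}" for i in range(1, 22)])

# Space-separated values, no header row.
train = pd.read_csv("train_FD001.txt", sep=r"\s+", header=None, names=cols)

# In the training set each engine runs to failure, so Remaining Useful Life (RUL)
# at a given cycle is that engine's final cycle minus the current cycle.
train["RUL"] = train.groupby("unit")["cycle"].transform("max") - train["cycle"]
print(train[["unit", "cycle", "RUL"]].head())
```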
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the outbreak of the global public health crisis in 2019, enhancing the resilience of regional economies has become a current focal point. Existing studies have mostly focused on the region itself, lacking exploration of regional economic resilience from dynamic, multi-perspective, and multidimensional angles. At the same time, for the digital industry, as an emerging sector, we should consider not only its impact on economic development itself but also whether it can continuously and effectively enhance regional economic resilience, in order to cope with crises that may arise at any time. Therefore, through empirical methods, we conducted a detailed study of the spatial correlation and internal driving factors between the digital industry and regional economic resilience, aiming to build a more valuable theoretical framework based on existing research findings and to explore a regional resilience development strategy centered on the digital industry. Combining conclusions and methods from the existing literature, this paper attempts to expand the definition of regional economic resilience, its evaluation index system, and its relationship with the digital industry from the perspective of evolutionary economic geography. We empirically examine data from 30 provinces in China from 2014 to 2022 (excluding Tibet, Hong Kong, Macau, and Taiwan due to lack of data). Firstly, this paper employs a two-way fixed effects model to examine the direct relationship between digital industry development and regional economic resilience. The results indicate that the development of the digital industry can effectively enhance regional economic resilience. Secondly, the role of spatial location, as an important aspect of evolutionary economic geography, is also considered: a spatial Durbin model reveals spatial spillover effects of digital industry development on regional economic resilience under different spatial location relationships. Finally, this paper takes environmental regulation as a threshold variable to study the impact of the digital industry on regional economic resilience under different levels of environmental regulation. The results indicate that when the degree of environmental regulation is less than 0.0011, the digital industry can more effectively enhance regional economic resilience. In conclusion, this paper finds that while emphasizing the role of the digital industry in the resilient development of regional economies, it is also essential to promote regional cooperation for mutual benefit and win-win results. This will accelerate the transformation of digital enterprises, optimize industrial structures, and achieve green development.
Someone is rational in their thinking to the extent that they follow a rational procedure when determining what to believe. So whether someone is rational is determined not so much by whether they hold true or false beliefs (outcome-based rationality) as by how they arrived at those beliefs (procedure-based rationality). In this study, we ask to what extent 4- to 5-year-old children, 6- to 7-year-old children, and adults from China and the United States consider the procedure and the outcome when evaluating the rationality of an agent.
In a picture book story, participants will be introduced to two characters whose pet ran away. They are trying to find the pet by using either rational (e.g. looking for the pet's traces) or irrational (e.g. using a spinning wheel) procedures that lead them to either the right (pointing at the location where the pet is hiding) or the wrong conclusion (pointing at the location where the pet is not hiding).
More precisely, the participants will see three conditions:
In an outcome matters condition, both characters are using an irrational procedure to find out where their pet is hiding, but one chooses the correct location, the other the wrong location.
In a process matters condition, one of the characters is using a rational and the other is using an irrational procedure, while both choose wrong locations.
In a process vs. outcome condition, one character is using an irrational procedure and points to the right location, while the other character is using a rational procedure and points to the wrong location.
This file provides an overview of the site locations where profiles (or point data) in WoSIS are located. Depending on the type of survey, one or more profiles can occur within a site, in accordance with ISO 28258 soil domain conventions. WoSIS_latest is a 'dynamic dataset' that contains the most recent complement of quality-assessed and standardised soil data served from WoSIS (ISRIC World Soil Information Service). Being dynamic, this dataset will grow/change once new point data are acquired and standardised, additional soil properties are considered, and/or when possible amendments are required. Static snapshots of wosis_latest are released at irregular intervals for consistent citation purposes; see https://doi.org/10.5194/essd-16-4735-2024 and the WoSIS FAQ-page (https://www.isric.org/explore/wosis/faq-wosis).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The purpose of the data product switching points is to indicate places where there is a connection between different transport networks and/or modes of transport. The switching points will also constitute a national reference and will be developed in cooperation between the Swedish Transport Administration, the Swedish Civil Aviation Administration and the Swedish Maritime Administration. The dataset is to be considered test data; both completeness and timeliness are lacking. A TransferNode is a description of a place where switching between different means of transport or modes of transport is possible. A switching point contains at least one stop point (StopPoint), where each stop point belongs to only one mode of transport. Stop points are the places where means of transport can stop and where persons can get on/off and/or goods can be loaded on/off. Where switching points for Air/Sea/Road/Rail could coincide at one location, only one switching point is created, according to the following hierarchical order: 1. Airport (LFV) 2. Port (SjöV) 3. Railway (TRV) 4. Road (TRV). Complemented with the data product stop points, it is possible to describe the connection between different transport networks and/or modes of transport for a switching point. The TransportTypeValueOwner attribute indicates the mode of transport that is the 'main owner' of the switching point according to the hierarchy above. The TransportModeCode attribute indicates the type of mode of transport served by the switching point. The data product switching points is not directly connected to a traffic network; through the data product stop points, a switching point can get a connection to a traffic network. The dataset can be ordered from the Swedish Transport Administration: geographic.information@trafikverket.se.
The resource is a practical worksheet that can guide the integration of eye-tracking capabilities into visualization or visual analytic systems by helping identify opportunities, challenges, and benefits of doing so. The resource also includes guidance for its use and three concrete examples. Importantly, this resource is meant to be used in conjunction with the design framework and references detailed in section 4 of: "Gaze-Aware Visualization: Design Considerations and Research Agenda" by R. Jianu, N. Silva, N. Rodrigues, T. Blascheck, T. Schreck, and D. Weiskopf (in Transactions on Visualization and Computer Graphics). The worksheet encourages designers who wish to integrate eye-tracking into visualization or visual analytics systems to carefully consider 18 fundamental facets that can inform the integration process and whether it is likely to be valuable. Broadly, these relate to:
M1-M3: Measurable data afforded by eye trackers (and other modalities and context data that could be used together with such data)
I1-I6: Inferences that can be made from measured data about users' interests, tasks, intent, and analysis process
S1-S7: Opportunities to use such inferences to support visual search, interaction, exploration, analysis, recall, collaboration, and onboarding
B1-B9: Limitations to beware that arise from eye-tracking technology and the sometimes inscrutable ways in which human perception and cognition work, and which may constrain support possibilities.
To apply the worksheet to inform the design of a gaze-aware visualization or visual analytic system, one would progress through its sections and consider the facets they contain step by step. For each facet:
Refer to the academic paper mentioned above (in particular section 4) for a more detailed discussion of the facet and for supporting references that provide further depth, inspiration, and concrete examples.
Consider carefully how these details apply to the specific visualization under analysis and its context of use. Consider both the opportunities that eye-tracking affords (M, I, S) and the limitations and challenges (B).
Use the specific questions under each facet (e.g., "Are lighting conditions too variable for accurate gaze tracking?") to further guide the thought process and capture rough yes/no assessments (where possible).
Summarize a design rationale at the end of each worksheet section. This should capture design decisions or options and the motivation behind them, as informed by the thought processes and insights facilitated by the design considerations in the section. The format and level of detail of such summaries are up to the designer (a few different options are shown in our examples).
We exemplify this use of the worksheet by conjecturing how eye-tracking could be integrated into three visualization systems (included in the resource). We chose three systems that span a broad range of domains and contexts to exemplify different challenges and opportunities. We also exemplify different ways of capturing design rationales: more detailed/verbose or as bullet points.
https://hub.arcgis.com/api/v2/datasets/8c2eaec7f2e14ac682c9317e9de47a26/license
(Link to Metadata) VTHYDRODEM was created to produce a "hydrologically correct" DEM, compliant with the Vermont Hydrography Dataset (VHD), in support of the "flow regime" project whose goal is to derive stream perenniality for the VHD through application of logistic regression techniques. Some very important notes about the data: 1) Produced specifically for hydrologic modeling purposes; the elevation surface has been altered and should not be used for analyses requiring unmodified elevation values; 2) ELEVATION VALUES, i.e., "Z units", are in CENTIMETERS (details below); and 3) Source data span a five-year period during which varying techniques were used. This may explain observed inconsistencies both between and within tiles (detailed in the Attribute Accuracy Report below). This dataset has elevation values present in the surface that accurately reflect the down-gradient nature and location of surface water features, i.e. the VHD. This process is also known as "hydro-enforcement" or "drainage enforcement". It is not widely appreciated that the 1:24k scale National Elevation Dataset (NED) is not "hydrologically correct" in relation to the National Hydrography Dataset (NHD) vector data of the same scale, e.g., the flow paths in the NED surface are not perfectly coincident with those in the same-scale NHD surface water features. This fact precluded the use of the NED data for hydrologic modeling efforts and reaffirmed the need to create a new "hydrologically correct" DEM. All processing was done using ARCINFO workstation (v.8.3) commands. The ARCINFO "TOPOGRID" command was used to create VTHYDRODEM as it was specifically designed to create "hydrologically correct" digital elevation models (DEMs) from elevation, stream and lake data sets. Single line "1D" streams and lake/pond "2D" polygons, from the 1:5k scale VHD, were given priority over input elevation data in the interpolation process to ensure that the resulting data are "hydrologically correct". Both the VHD and VTHYDRODEM share a common base of the state digital orthophotos, ensuring their interoperability. The Triangulated Irregular Network (TIN) method was not considered, but interested readers should review West Virginia's approach http://www.wvgis.wvu.edu/stateactivities/wvsamb/elevation/topogrid_vs_tin.pdf. This report notes the advantages and disadvantages of each approach. It should be noted that the WV effort included more recent imagery and a much tighter sampling interval of source data. Nonetheless, it makes a strong case for the TIN approach that should be considered in any subsequent DEM development efforts. The density of input points used to create VTHYDRODEM was lower than the 1:24k NED, but those points tested at a higher vertical accuracy and were generated with less variability in technique than that of the NED (see http://gisdata.usgs.net/website/USGS_GN_NED_DSI/viewer.htm and check "production methods" under "Layers" for NED data sources and methods). Vertical accuracy was derived using the FGDC National Standard for Spatial Data Accuracy (NSSDA). For the sake of comparison, VTHYDRODEM tested at 6.05 meters vertical accuracy at the 95% confidence level, whereas the 1:24k National Elevation Dataset (DEM_24) tested at 21.3 meters. VTHYDRODEM was created for a specific, in-house project to support hydrologic modeling activities using the 1:5k scale VHD.
It was interpolated from: 1) the Vermont Mapping Program (VMP) "x, y, z" data known as the "DEM points" (originally used to georectify the state digital orthophotos); and 2) VHD surface water features. A 10-meter cell resolution was chosen for VTHYDRODEM as a balance between input data accuracy and practical considerations; it does not necessarily reflect the accuracy of the input data and should not be confused with an accuracy of 10 meters. This data should not be confused with the "1/3 arc second" 10m NED data. The finer 10m cell size has the following advantages when compared to the existing 30m 1:24k NED: 1) Stream confluences (junctions) can be defined with a greater degree of precision; 2) Confluences in close proximity can be represented individually; 3) Smaller landscape features can be represented and larger ones in greater detail; 4) Exponential improvement in volumetric measurement and a tripling of precision in linear measurement of derived vector features (e.g., a watershed boundary is composed of aggregated 10m cells, i.e., 3 cells equal 30m, vs. a 30m resolution where 3 cells equal 90m; the concept applies similarly to volumetric measurements); and 5) Improved cartographic accuracy for derived vector features. NOTE! Elevation units, i.e., "Z units", are in CENTIMETERS. This seemingly arbitrary decision has a number of advantages worth considering. The output grid can be stored as an "integer" type grid while simultaneously preserving the precision of the input data to the nearest centimeter. Integer type grids require one-tenth the storage space and are consequently much faster to process, e.g., when deriving watershed boundaries. While it is unlikely that the input data are accurate to the nearest centimeter, this approach allows for greater precision storage, improves the overall appearance of the DEM, and precludes problems with the model's depiction of overland flow in hydrologic analyses when compared to coarser vertical resolutions. This approach mirrors a trend among the USGS and its contractors, who are now producing DEMs with a vertical resolution of decimeters (0.1 meter) for the benefits outlined above.
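Because the Z units are integer centimeters, most analyses will want to convert back to meters. A minimal sketch, assuming the grid has been exported to a GDAL-readable format such as GeoTIFF (the path is a placeholder):

```python
import rasterio  # pip install rasterio

# Placeholder path; assumes the ARC/INFO grid was exported to GeoTIFF.
with rasterio.open("vthydrodem.tif") as src:
    z_cm = src.read(1, masked=True)   # integer elevations in centimeters, NoData masked
z_m = z_cm.astype("float64") / 100.0  # convert to meters for analysis
print(z_m.min(), z_m.max())
```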
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo.
- Click here to view this dashboard: Dashboard Link
- Click here to view this dashboard's features in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard
This dataset offers one of the most robust resources you will find to discover key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. This dataset is meticulously structured to provide every piece of information that I could pull from this site as an open-source tool for analysis for March Madness.
Key features of the dataset include:
- Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward.
- Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of.
- 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.
These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, all of the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional clean-up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset. Once all of the INT datasets were created, I joined all of the tables together on the team name and season so all of these different metrics can be viewed in one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add each team's conference (both its full name and the acronym it is known by) as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate the historical conferences from their current conferences. From there, I join a reference table that includes all of the current NCAAM coaches and their active coaching lengths, because the active current coaching length typically correlates to a team's success in the March Madness tournament. I also join another reference table to include the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and I join another reference table to differentiate the teams who were ranked in the top 12 in the AP Top 25 during week 6 of the respective NCAA season. After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
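The consolidation itself was done in Domo's Magic ETL, but the core idea is a pair of joins on team name and season. Purely as an illustration, here is a pandas sketch with hypothetical file and column names:

```python
import pandas as pd

# Hypothetical intermediate files mirroring the INT datasets described above.
efficiency = pd.read_csv("INT_efficiency.csv")        # team, season, efficiency metrics
four_factors = pd.read_csv("INT_four_factors.csv")    # team, season, four-factor metrics
conferences = pd.read_csv("conference_mapping.csv")   # team, season, conference, espn_team_name

# Join metric tables on team name and season, then attach the conference mapping,
# mirroring the consolidated "DEV _ March Madness" view described above.
merged = (efficiency
          .merge(four_factors, on=["team", "season"], how="left")
          .merge(conferences, on=["team", "season"], how="left"))
print(merged.head())
```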
This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The vapor–liquid critical properties of mixtures, which represent the end points of vapor–liquid equilibrium curves, are crucial for the development of next-generation environmentally friendly working fluids and advancements in supercritical fluid technology. Experimental measurement and theoretical prediction are the main means to obtain the critical properties. However, the existing theoretical prediction methods have the problems of low accuracy, especially when predicting critical volumes of binary mixtures and critical properties of ternary mixtures. Moreover, in most prediction methods, all the data are used to fit the adjustable parameters. No prediction data set was added to test its prediction ability. In this work, new prediction models for critical properties, including critical temperature, critical pressure, and critical volume of binary mixtures and ternary mixtures, were proposed. New methods extend our previous work (Tang et al.’s model) to the prediction of critical volumes and critical properties of ternary mixtures and accurately evaluate the extrapolation ability. New prediction models inherit many advantages of Tang et al.’s model, including considering the effect of molecular polarity to some extent, possessing four fewer adjustable parameters, a simpler form in use, and not requiring the critical volume data of pure substances when predicting critical temperatures and critical pressures of mixtures. Moreover, three new methods possess fast prediction abilities for critical properties of mixtures, showing higher accuracy in predicting critical properties of binary and ternary mixtures in the fitting data set and prediction data set. When predicting critical temperatures and critical pressures, new models can be applied to binary and ternary mixtures consisting of methane-free alkanes, alkenes, alkynes, alicyclic hydrocarbons, benzene and its derivatives, NH3, CO2, halogenated hydrocarbon, N2O, Kr, Xe, sulfur-compounds, and oxygen-containing organic compounds. Notably, new methods are applicable only to class I (continuous curve of the critical point of two pure components) of the vapor–liquid critical locus, classified by Van Konynenburg and Scott. Class I covers many systems and most of the available critical property experimental data. Almost all systems can be covered in predicting critical volumes. About 9000 critical data points including critical temperatures, critical pressures, and critical volumes for binary mixtures and ternary mixtures are collected. About 80% of the data for binary mixtures were used to fit the empirical parameters. The remaining data were used to test the prediction ability for binary and ternary mixtures for three new prediction methods. The prediction model 1 shows the highest correlation and prediction accuracy among the three prediction models. The average absolute relative deviations are 1.13, 3.91, and 5.90 when correlating critical temperatures, critical pressures, and critical volumes; 1.43, 4.94, and 6.93 when predicting critical temperatures, critical pressures, and critical volumes of binary mixtures; and 1.12 and 3.99 when predicting critical temperature and critical pressures of ternary mixtures.
analyze the current population survey (cps) annual social and economic supplement (asec) with r
the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show. this new github repository contains three scripts:
2005-2012 asec - download all microdata.R: download the fixed-width file containing household, family, and person records; import by separating this file into three tables, then merge 'em together at the person-level; download the fixed-width file containing the person-level replicate weights; merge the rectangular person-level file with the replicate weights, then store it in a sql database; create a new variable - one - in the data table
2012 asec - analysis examples.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; perform a boatload of analysis examples
replicate census estimates - 2011.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; match the sas output shown in the png file below
2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.
click here to view these three scripts
for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page; the bureau of labor statistics' current population survey page; the current population survey's wikipedia article
notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
Measured according to the Bray-I method, a combination of HCl and NH4F to remove easily acid-soluble P forms, largely Al- and Fe-phosphates (for acid soils) (mg/kg).
WoSIS_latest is a 'dynamic dataset' that contains the most recent complement of quality-assessed and standardised soil data served from WoSIS (ISRIC World Soil Information Service). The source data were shared by a wide range of data providers (see: https://www.isric.org/explore/wosis/wosis-contributing-institutions-and-experts).
Being dynamic, the contents of 'wosis_latest' will change once new point data are acquired, cleansed and standardised, additional soil properties are considered, and/or when possible amendments are required.
Static snapshots of 'wosis_latest' are released at irregular intervals for consistent citation purposes and to discuss methodological changes; the last snapshot is available at https://doi.org/10.5194/essd-2024-14.
For general information about WoSIS please see the FAQ-page at https://www.isric.org/explore/wosis/faq-wosis.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This feature class represents the points that were used for the Route Directness Index calculation. These points were generated along the increasing state routes as represented by the 12/31/2023 WSDOT LRS. Points were only generated within Population Centers (using the Population Centers dataset obtained from the GIS Workbench in January 2025). Every 0.01 mile (52.8 feet) along the route, a transect was created perpendicular to the route extending 500 feet on each side of the route. These points represent the midpoint of each transect. The Route Directness Index (RDI) is a ratio that compares the straight-line (crow-flies) distance from one point to another across a barrier to the actual distance imposed by the network of paths available to a traveler. RDI data is particularly relevant to pedestrian and/or bicyclist trips due to the extra time, physical energy, and exposure to weather that out-of-direction travel creates. A complete discussion of route directness, including potential applications to decision making, can be found in the Washington State Multimodal Permeability Pilot, August 2021 [https://wsdot.wa.gov/sites/default/files/2021-11/MultimodalPermeabilityPilotReport-Aug2021.pdf]. RDI can be analyzed at different scales. A high-level analysis of RDI can address questions that compare population centers across the state or consider whether the RDI values are generally similar within a given population center or tend to vary in different portions of a population center. High-level data could be combined with other statewide data such as crash data, transit stops, level of traffic stress data, destination data, etc. to analyze potential correlations. High-level RDI data is less useful for analyzing a particular crossing location or recommending solutions to address high RDI values. A more detailed analysis is likely required when questions involve corridor studies or project evaluations. Detailed location information can refer to key destinations and crossing locations that are not captured using higher level network maps. The lowest RDI is 1, because a trip between two points can then be made directly along an existing roadway. The actual methodology analyzed hypothetical trips where the start and end points were about a quarter mile apart relative to a straight line. In such a situation, an RDI of 2 would mean the trip is twice the distance it might otherwise be, or about one-half mile. Although one-half mile is not particularly far, the RDI is independent of the actual distance. We might start further down the road, and if the RDI remained a 2 our trip distance would be twice as long as it could have been. The RDI thus measures the real or perceived burden or travel cost incurred by a person walking or bicycling.
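In code form, the index is simply the ratio of the network distance to the straight-line distance, as the worked example above implies; a tiny illustrative sketch (not WSDOT's production methodology):

```python
def route_directness_index(network_distance_ft: float, straight_line_ft: float) -> float:
    """Ratio of the distance a traveler must actually cover to the crow-flies distance."""
    return network_distance_ft / straight_line_ft

# Example from the description: a quarter-mile straight-line trip that requires
# a half-mile walk along the available network has an RDI of 2.
print(route_directness_index(2640.0, 1320.0))  # -> 2.0
```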
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
The FIO program encompasses a wide range of interactions between the Boston Police Department (BPD) and private individuals. By releasing the records of these interactions, BPD hopes to add transparency to the execution of the program while still protecting the privacy of the individuals involved. These records are now sourced from three different records management systems, titled (OLD RMS), (NEW RMS), and (MARK43). The differences between the resulting files are described below.
These records are compiled from the BPD's new Records Management System (RMS) on the BPD's FIO program. MARK43 went live September 29, 2019, and the FIO information has been structured into two separate tables. These tables have the same titles as in (NEW RMS) but include new or different data points as retrieved from MARK43.
FieldContact, which lists each contact between BPD and one or more individuals
FieldContact_Name, which lists each individual involved in these contacts.
A FIO Data Key has also been created and posted to help distinguish the data categories (Data Key (Mark43)).
Lastly, FIOs are maintained in a live database and information related to each individual may change over time. The data provided here should be considered a static representation of the Field Interaction and/or Observation that occurred in 2019.
NULL indicates no entry was made for an optional field.
These records are compiled from the BPD’s new Records Management System (RMS) on the BPD's FIO program. The new RMS, which went live in June, 2015, structures the FIO information into two separate tables:
FieldContact, which lists each contact between BPD and one or more individuals
FieldContact_Name, which lists each individual involved in these contacts
While these two tables align on the field contact number (fc_num) column, it is not methodologically correct to join the two datasets for the purpose of generating aggregate statistics on columns from the FieldContact table. Doing so would lead to incorrect estimates stemming from contacts with multiple individuals. As noted in the Data Key (New RMS) file, several of the columns in the FieldContact table apply to the contact as a whole, but may not necessarily apply to each individual involved in the contact. These include:
frisked
searchperson
summonsissued
circumstances
basis
contact_reason
For example, the frisked column contains a value of Y if any of the individuals involved in a contact were frisked, but it would be inaccurate to assume that all individuals were frisked during that contact. As such, extrapolating from the frisked column for a contact to each individual and then summing across them would give an artificially high estimate of the number of people frisked in total. Likewise, the summonsissued column indicates when someone involved in a contact was issued a summons, but this does not imply that everyone involved in a contact was issued a summons.
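As a concrete illustration of this caveat, a short pandas sketch (the export file names are hypothetical; fc_num and frisked are the columns described above) that counts frisks at the contact level rather than after joining to FieldContact_Name:

```python
import pandas as pd

# Hypothetical CSV exports of the two New RMS tables described above.
contacts = pd.read_csv("FieldContact.csv")
names = pd.read_csv("FieldContact_Name.csv")

# Appropriate: one row per contact, so this counts contacts in which anyone was frisked.
contacts_with_frisk = (contacts["frisked"] == "Y").sum()

# Misleading: joining first repeats the contact-level flag for every individual,
# inflating the apparent number of frisks when a contact involved several people.
joined = names.merge(contacts, on="fc_num", how="left")
inflated_count = (joined["frisked"] == "Y").sum()

print(contacts_with_frisk, inflated_count)
```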
For a detailed listing of columns in each table, see both tables of the Data Key (New RMS) file below.
These records are sourced from BPD's older RMS, which was retired in June, 2015. This system (which stored all records in a single table, rather than the two tables in the newer system) captures similar information to the new RMS, but users should note that the fields are not identical and exercise care when comparing or combining records from each system.
For more information on the FIO Program, please visit:
Boston Police Commissioner Announces Field Interrogation and Observation (FIO) Study Results
Boston Police Department Releases Latest Field Interrogation Observation Data
https://spdx.org/licenses/CC0-1.0.html
The North American river otter (Lontra canadensis) is a semi-aquatic furbearer species that historically ranged throughout North America. Starting in the mid-1800s and continuing through the early 1900s, the negative effects associated with anthropogenic disturbances (i.e. overharvest, development, and ultimately habitat alteration) led to local extinctions. Researchers debate whether current land use patterns are affecting river otter occupancy. New Jersey is the most densely populated state in the United States, and thus it provides a perfect study area to test potential anthropogenic effects on river otters. Using occupancy modeling to examine river otter habitat preferences, we measured presence/absence at 244 low order streams from January-April 2011–2012 along with 19 corresponding site/landscape covariates in both Northern and Southern New Jersey. In Southern New Jersey, we detected otters at 83/141 sites (58.9%) with a detection probability of 97.7% across repeat visits and a predicted occupancy of 59.4 ± 0.04%. In Northern New Jersey we detected otters at 31/103 sites (30.1%) with a detection probability of 44.5% across repeat visits and a predicted occupancy of 58.8 ± 0.04%. We determined the influence of habitat covariates on otter occupancy and found that water depth, water quality, stream width and mink presence were positively correlated with otter occupancy. The % Commercial, Industrial, Transportation and Recreational habitat, % low intensity development, bank slope, and distance to lake were negatively correlated with otter occupancy. Knowing the location of occupied stream and latrine sites will assist biologists in their efforts to monitor river otter populations and help estimate river otter density for harvest and conservation efforts.
Methods
Site Selection
Potential sites were restricted to lower order streams that could be safely waded and/or had accessible stream banks that could be surveyed by foot. To identify a stratified distribution of random stream sites, we looked at streams that were close in proximity to sites classified as either Ambient Biomonitoring Network (AMNET, NJDEP 2011) or Index of Biotic Integrity (IBI)/Fish Index of Biotic Integrity (FIBI) sites (NJDEP 2012a), because the New Jersey Department of Environmental Protection (NJDEP) assessed the overall water quality and stream health there, which would allow us to use corresponding water quality, habitat quality, and fish density data as covariates for our occupancy models. Once the sites were selected, we selected transects that would include a road/bridge crossing of the stream (Roberts and Crimmins 2008, Crimmins et al. 2009) to ensure that we would be surveying areas with some level of human traffic. If some AMNET, FIBI, or otter harvest/sighting points were not on a road/bridge crossing, we moved our stream site to the nearest road/bridge crossing on the stream. We had a total of 244 unique sites surveyed throughout the two-year study (124 in 2011 and 120 in 2012) that were comprised of 147 AMNET, 24 IBI, 25 FIBI, 42 otter harvest/sighting, and 6 random stream sites, with 141 located in Southern New Jersey and 103 located in Northern New Jersey.
Field Protocol
To sample the presence of otters, we conducted 6 repeat walking transect surveys 1 January - 30 April 2011 and 2012, every 14 ± 4 d at each survey site.
The purpose of these 4 days of flexibility was twofold: 1) to allow for the potential loss of survey days due to inclement weather such as snow, ice, and heavy rain, and 2) to provide the opportunity for surveyors to be proactive and survey certain sites in advance in anticipation of extended periods of inclement weather. Additionally, depending on the physical characteristics of the sites, surveys were not conducted ≤ 2 d after heavy rain or snow/ice storms, both to ensure the safety of surveyors and because scat and track signs could be washed away or hidden under snow. We conducted 600 m long stream bank walking sign surveys, usually 300 m on each side of a bridge crossing (Roberts et al. 2008). If there were any impediments on a particular side of the crossing, such as a lake, extremely dense vegetation, or deep water, we walked an additional distance on the other side to make up the difference. We took GPS coordinates at each site and recorded the start/end time of each stream survey. For each replicate visit, we recorded the current, overnight, and previous day's temperature; the presence or absence of precipitation (snow/ice/rain) during the survey, overnight, and on the previous day; and the presence or absence of snow or ice cover at the site at the time of the survey. Snow cover can both negatively and positively affect the ability of surveyors to locate animal signs. We also recorded average site-level habitat characteristics, including water depth and bankfull height (i.e., the water level, or stage, at which a stream, river, or lake is at the top of its banks and any further rise would result in water moving into the flood plain), which were measured at random locations within the transect at each replicate visit, while stream width, bank height, and bank slope were measured only once per site. While visiting each site, we noted live otter sightings and the presence of otter tracks, scat, latrines, and slides. We also identified and recorded the substrate where each otter sign was found. GPS coordinates were taken for otter latrines and slides so that the NJDFW could monitor these locations in the future.
Water Quality Assessment of Sites
Water quality and habitat quality assessments were based on previously gathered and readily available AMNET (NJDEP 2011; NJDEP GIS 2013) and FIBI (NJDEP 2012a) data. Because not all survey points were AMNET or FIBI points (i.e., the random points and otter harvest/sighting points mentioned above; N = 48), not all points had these background data available. Therefore, we created the following guidelines to provide estimated water and habitat quality values for all survey points:
1. For those points that were AMNET or FIBI, we used the corresponding water and habitat quality assessment scores/ratings for those sites.
2. For those points that were characterized as either random or otter harvest/sighting, we chose the closest AMNET or FIBI point to that survey point, with preference given to AMNET or FIBI points that were on the same stream as the survey point.
We also had to consider the year in which the AMNET and FIBI points were surveyed, because these assessments are not made annually at every site, so the year of the water quality or habitat quality assessment varies among sites. While AMNET/FIBI samples are taken every year, it takes 5 years to reach all sites. When assigning water and habitat quality values to a survey site, we chose the most recent available values within the five-year interval. Each survey site was given a water quality rating of excellent, good, fair, or poor (via the AMNET or FIBI rating). Further, survey sites were given habitat quality ratings (and corresponding scores) of optimal (160-200), suboptimal (110-159), marginal (60-109), or poor (<60) via the AMNET and FIBI scoring, based on the individual condition of 10 habitat parameters (NJDEP 2007).
Data Analysis: Occupancy and Predictive Habitat Modeling
We arbitrarily defined the "survey site" as the area within 600 m of the 600 m stream transect (i.e., 600 m upstream and 600 m downstream, totaling 1,800 m in length). We quantified 10 AMNET/FIBI site-level habitat configuration and quality variables, including stream width, water depth, bank height, bankfull height, bank slope, distance to the closest lake/pond, AMNET/FIBI water quality, AMNET/FIBI habitat quality, and beaver and mink presence observed in our transects. For landscape-level covariates, we reclassified the 84 land use/land cover types of the New Jersey Land Use Land Cover (NJLULC) dataset into 8 summary habitat types defined by the NJLULC: 1) High Intensity Development; 2) Low Intensity Development; 3) Commercial, Industrial, Transportation and Recreational (CITR); 4) Agriculture; 5) Upland Natural; 6) Fresh Water; 7) Non-coastal Wetlands; and 8) Other. To quantify habitat composition, we measured the proportion of each habitat type (excluding Other) within the buffered transect area. For each covariate measured at the landscape scale, we determined the buffer radius around the site at which the covariate was most strongly correlated with otter presence among a set of buffer radii ranging from 0.6 km to 16.2 km (spanning the range of river otter home range estimates) at 600 m increments for all sites in each sampling region (Holland et al. 2004, Duren et al. 2011). We used bootstrapping to obtain Spearman's rank correlation coefficients on 10,000 random samples of 10 sites that were 25.2 km apart in Southern New Jersey and 9 sites that were 22.8 km apart in Northern New Jersey. We estimated site occupancy and detection probability using the modeling approach of MacKenzie et al. (2002), which accounts for the probability of an individual occupying a site and being detected during a survey. We used Akaike's Information Criterion (AIC) to evaluate and select models (Burnham and Anderson 2002) and performed analyses using the program PRESENCE (Hines 2006). First, we modeled detection probability among survey points considering four explanatory covariates (month, observer, snow cover, and presence of precipitation) as well as a constant detection probability model, treating variation in detection probability as a nuisance parameter. We selected the best model from that analysis to control for detection probability in subsequent modeling of otter occupancy. We modeled otter occupancy for Northern and Southern New Jersey separately using logistic regression with site- and landscape-scale covariates.
Candidate
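As a rough illustration of the buffer-radius selection step described above, the following Python sketch bootstraps Spearman's rank correlation between one landscape covariate and otter presence across candidate radii. The random placeholder data, the scipy/NumPy tooling, and the omission of the spatial-separation constraint on subsampled sites are assumptions for illustration only; they do not reproduce the study's actual inputs.

```python
# Sketch: pick the buffer radius at which a landscape covariate is most
# strongly correlated with otter presence, using bootstrapped Spearman
# rank correlations on random subsamples of sites.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
radii_m = np.arange(600, 16201, 600)        # 0.6 km to 16.2 km in 600 m steps
n_sites, subsample = 141, 10                # Southern NJ example
n_boot = 1000                               # the study used 10,000 resamples

presence = rng.integers(0, 2, n_sites)      # placeholder presence/absence data
covariate = rng.random((n_sites, radii_m.size))  # placeholder values per radius

mean_rho = []
for j, r in enumerate(radii_m):
    rhos = []
    for _ in range(n_boot):
        idx = rng.choice(n_sites, size=subsample, replace=False)  # spatial thinning omitted
        rho, _ = spearmanr(covariate[idx, j], presence[idx])
        rhos.append(rho)
    mean_rho.append(np.nanmean(rhos))

best = radii_m[int(np.nanargmax(np.abs(mean_rho)))]
print(f"Radius with strongest correlation: {best} m")
```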
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Trust and Believe – Should We? Evaluating the Trustworthiness of Twitter Users
This model analyzes Twitter users and assigns each a score calculated from their social profile, the credibility of their tweets, and an h-index-style score of their tweets. Users with higher scores are not only considered more influential, but their tweets are also considered more credible. The model is based on both user-level and content-level features of a Twitter user. The details of feature extraction and of calculating the influence score are given in the paper.
Description
We used Python to extract the features from Twitter and generate the dataset. The modAL framework selects ambiguous data points from the unlabeled data pool using three different sampling techniques, and a human manually annotates the selected points. We generated a dataset of 50,000 Twitter users and then used different classifiers to classify each user as either Trusted or Untrusted.
Organization
The project consists of the following files:
Dataset.csv
The dataset consists of different features of 50,000 Twitter users (politicians), without labels.
Manually_labeled-Dataset.csv
This CSV file contains the Twitter users that were manually classified as Trusted or Untrusted.
feature_extraction.py
This Python script calculates the influence score of a Twitter user and is then used to generate the dataset. The influence score is based on the following components (combined in the sketch after this list):
- Social reputation of the user
- Content score of the tweets
- Tweets credibility
- Index score for the number of re-tweets and likes
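The sketch below shows one way the four components listed above could be combined into a single influence score. The equal weights, the 0-1 normalization, and the function name are assumptions for illustration only; the exact formula is defined in the accompanying paper.

```python
# Hypothetical sketch: weighted combination of the four influence-score
# components. Weights and normalization are illustrative assumptions.
def influence_score(social_reputation, content_score, tweet_credibility, index_score,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four (already normalized, 0-1) components."""
    components = (social_reputation, content_score, tweet_credibility, index_score)
    return sum(w * c for w, c in zip(weights, components))

# Example: a user whose components have been scaled to [0, 1]
print(influence_score(0.8, 0.6, 0.9, 0.4))  # -> 0.675
```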
Activelearner.ipynb
To classify a large pool of unlabeled data, we used an active learning model (modAL framework). Active learning is a semi-supervised approach suited to situations in which unlabeled data are abundant but manual labeling is expensive. The active learner selects ambiguous data points from the unlabeled data pool using three different sampling techniques, and a human manually annotates the selected points. We then use four different classifiers (Support Vector Machine, Logistic Regression, Multilayer Perceptron, and Random Forest) to classify each Twitter user as either Trusted or Untrusted.
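A minimal sketch of this query-and-teach loop, assuming modAL with a scikit-learn estimator, is shown below. The random placeholder features, the annotation budget, and the choice of uncertainty sampling (one of several strategies modAL offers) are assumptions for illustration; the notebook itself defines the real data and sampling techniques.

```python
# Sketch of an active-learning loop with modAL: query the most uncertain
# points from the unlabeled pool, have them labeled, and retrain.
import numpy as np
from modAL.models import ActiveLearner
from modAL.uncertainty import uncertainty_sampling
from sklearn.ensemble import RandomForestClassifier

# X_labeled/y_labeled: small manually annotated seed set; X_pool: unlabeled users
X_labeled, y_labeled = np.random.rand(20, 5), np.random.randint(0, 2, 20)
X_pool = np.random.rand(1000, 5)

learner = ActiveLearner(
    estimator=RandomForestClassifier(),
    query_strategy=uncertainty_sampling,   # one possible sampling technique
    X_training=X_labeled, y_training=y_labeled,
)

for _ in range(10):                        # annotation budget (illustrative)
    query_idx, query_instance = learner.query(X_pool)
    # In practice a human annotator labels query_instance; here we fake labels.
    y_new = np.random.randint(0, 2, size=len(query_idx))
    learner.teach(X_pool[query_idx], y_new)
    X_pool = np.delete(X_pool, query_idx, axis=0)

print(learner.predict(X_pool[:5]))         # Trusted (1) / Untrusted (0)
```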
twitter_reputation.ipynb
We tested the performance of different regression models on our generated dataset (this was only for testing and is no longer part of our work). We trained and evaluated three regression models (see the sketch after the list below):
1. Multilayer perceptron
2. Deep neural network
3. Linear regression
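The following sketch shows how the three models listed above could be trained and compared. Using scikit-learn, a deeper MLPRegressor as a stand-in for the "deep neural network", and random placeholder data are assumptions for illustration; the notebook defines the actual models and features.

```python
# Sketch: train and compare three regression models on placeholder data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = np.random.rand(500, 5), np.random.rand(500)   # placeholder features/scores
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Multilayer perceptron": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000),
    "Deep neural network": MLPRegressor(hidden_layer_sizes=(64, 64, 32), max_iter=2000),
    "Linear regression": LinearRegression(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.4f}")
```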
twitter_credentials.py
In order to extract the features of Twitter users, one first needs to authenticate with the Twitter API using the credentials provided in this file.
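A short sketch of this authentication step is given below. The use of Tweepy, the variable names imported from twitter_credentials.py, and the example screen name are all assumptions for illustration; substitute whichever client and credential names the scripts actually expect.

```python
# Illustrative sketch: authenticate with the Twitter API using the
# credentials file, then fetch basic profile features for one user.
import tweepy
from twitter_credentials import (  # hypothetical variable names
    CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET,
)

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

user = api.get_user(screen_name="example_politician")  # hypothetical screen name
print(user.followers_count, user.statuses_count)
```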
Screen names (Screen_name_1.txt, Screen_name_2.txt, Screen_name_3.txt)
These text files contain the screen_names of all the Twitter users; all of them are politicians. We removed politicians whose accounts are private, as well as those with no followers or followings. The text of the tweets is not saved. We also removed duplicate names.
References
[1] https://stackoverflow.com/questions/38881314/twitter-data-to-csv-getting-error-when-trying-to-add-to-csv-file
[3] https://gallery.azure.ai/Notebook/Computing-Influence-Score-for-Twitter-Users-1
[4] https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
[5] https://towardsdatascience.com/deep-neural-networks-for-regression-problems-81321897ca33