License: CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
Many capture-recapture surveys of wildlife populations operate in continuous time but detections are typically aggregated into occasions for analysis, even when exact detection times are available. This discards information and introduces subjectivity, in the form of decisions about occasion definition. We develop a spatio-temporal Poisson process model for spatially explicit capture-recapture (SECR) surveys that operate continuously and record exact detection times. We show that, except in some special cases (including the case in which detection probability does not change within occasion), temporally aggregated data do not provide sufficient statistics for density and related parameters, and that when detection probability is constant over time our continuous-time (CT) model is equivalent to an existing model based on detection frequencies. We use the model to estimate jaguar density from a camera-trap survey and conduct a simulation study to investigate the properties of a CT estimator and discrete-occasion estimators with various levels of temporal aggregation. This includes investigation of the effect on the estimators of spatio-temporal correlation induced by animal movement. The CT estimator is found to be unbiased and more precise than discrete-occasion estimators based on binary capture data (rather than detection frequencies) when there is no spatio-temporal correlation. It is also found to be only slightly biased when there is correlation induced by animal movement, and to be more robust to inadequate detector spacing, while discrete-occasion estimators with binary data can be sensitive to occasion length, particularly in the presence of inadequate detector spacing. Our model includes as a special case a discrete-occasion estimator based on detection frequencies, and at the same time lays a foundation for the development of more sophisticated CT models and estimators. 
It allows modelling within-occasion changes in detectability, readily accommodates variation in detector effort, removes subjectivity associated with user-defined occasions, and fully utilises CT data. We identify a need for developing CT methods that incorporate spatio-temporal dependence in detections and see potential for CT models being combined with telemetry-based animal movement models to provide a richer inference framework.
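To make the continuous-time idea concrete, the sketch below simulates exact detection times of a single animal at an array of detectors, assuming (as a simplification, not the authors' full model) a time-constant half-normal detection hazard, so that detections at each detector form a homogeneous Poisson process; the names `lam0`, `sigma`, and `T` are illustrative parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def hazard(d, lam0=0.5, sigma=1.0):
    """Half-normal detection hazard (expected detections per unit time)
    for an animal whose activity centre is distance d from a detector."""
    return lam0 * np.exp(-d**2 / (2 * sigma**2))

def simulate_detections(ac, traps, T=10.0):
    """Return exact detection times per trap over a survey of length T,
    treating each trap as an independent homogeneous Poisson process."""
    times = []
    for trap in traps:
        d = np.linalg.norm(ac - trap)
        rate = hazard(d)
        n = rng.poisson(rate * T)                    # number of detections
        times.append(np.sort(rng.uniform(0, T, n)))  # exact detection times
    return times

traps = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
det = simulate_detections(np.array([0.5, 0.5]), traps)
```

Aggregating these exact times into discrete occasions would discard the within-occasion timing information that the CT model retains.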
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
We present a new method, called analysis-of-marginal-tail-means (ATM), for effective robust optimization of discrete black-box problems. ATM has important applications in many real-world engineering problems (e.g., manufacturing optimization, product design, and molecular engineering), where the objective to optimize is black-box and expensive, and the design space is inherently discrete. One weakness of existing methods is that they are not robust: these methods perform well under certain assumptions, but yield poor results when such assumptions (which are difficult to verify in black-box problems) are violated. ATM addresses this by combining both rank- and model-based optimization, via the use of marginal tail means. The trade-off between rank- and model-based optimization is tuned by first identifying important main effects and interactions from data, then finding a good compromise which best exploits additive structure. ATM provides improved robust optimization over existing methods, particularly in problems with (i) a large number of factors, (ii) unordered factors, or (iii) experimental noise. We demonstrate the effectiveness of ATM in simulations and in two real-world engineering problems: the first on robust parameter design of a circular piston, and the second on product family design of a thermistor network.
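As a toy illustration of the marginal-tail-mean building block (the full ATM procedure, with its rank/model trade-off, is more involved), the sketch below computes, for each level of each factor, the mean of the best alpha-fraction of responses at that level; the names `X`, `y`, and `alpha` are assumptions, not the paper's notation.

```python
import numpy as np

def marginal_tail_means(X, y, alpha=0.25):
    """X: (n, p) array of integer factor levels; y: (n,) responses to
    minimize. Returns, per factor, a dict mapping each level to the mean
    of the lower alpha-tail of responses observed at that level."""
    out = []
    for j in range(X.shape[1]):
        col = {}
        for level in np.unique(X[:, j]):
            vals = np.sort(y[X[:, j] == level])
            k = max(1, int(np.ceil(alpha * len(vals))))
            col[level] = vals[:k].mean()   # lower (best-case) tail mean
        out.append(col)
    return out

# Tiny 2-factor example: factor settings and noisy responses.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([3.0, 1.0, 2.0, 4.0])
tm = marginal_tail_means(X, y, alpha=0.5)
```

Ranking levels by these tail means, rather than by full marginal means, is what gives the method its robustness to violated model assumptions.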
Confocal quadrics constitute a special example of orthogonal coordinate systems. In this cumulative thesis we propose two approaches to the discretization of confocal coordinates, and study the closely related checkerboard incircular nets. First, we propose a discretization based on factorizable solutions to an integrable discretization of the Euler-Poisson-Darboux equation. The constructed solutions are discrete Koenigs nets and feature a novel discrete orthogonality constraint defined on pairs of dual discrete nets, as well as a corresponding discrete isothermicity condition. The coordinate functions of these discrete confocal coordinates are explicitly given in terms of gamma functions. Secondly, we show that classical confocal coordinates and their reparametrizations along coordinate lines are characterized by orthogonality and the factorization property. We use these two properties to propose another discretization of confocal coordinates, while again employing the aforementioned discrete orthogonality constraint. In comparison to the first approach, this definition results in a broader class of nets capturing arbitrary reparametrizations also in the discrete case. We show that these discrete confocal coordinate systems may equivalently be constructed geometrically via polarity with respect to a sequence of classical confocal quadrics. Different sequences correspond to different discrete parametrizations. We give several explicit examples, including parametrizations in terms of Jacobi elliptic functions. A particular example of discrete confocal coordinates in the two-dimensional case is closely related to incircular nets, that is, congruences of straight lines in the plane with the combinatorics of the square grid such that each elementary quadrilateral admits an incircle. Thus, thirdly, we classify and integrate the class of checkerboard incircular nets, which constitute the Laguerre geometric generalization of incircular nets.
Further aspects of the novel discrete orthogonality constraint are studied in the introduction of this thesis. These include discrete Lamé coefficients, discrete focal nets, discrete parallel nets, and discrete isothermicity, as well as the relation to pairs of circular and conical nets.
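For orientation, the classical confocal quadrics being discretized are the standard one-parameter family with semi-axes $a > b > c > 0$:

```latex
\frac{x^2}{a^2+\lambda} + \frac{y^2}{b^2+\lambda} + \frac{z^2}{c^2+\lambda} = 1,
\qquad \lambda \in \mathbb{R} \setminus \{-a^2, -b^2, -c^2\}.
```

Through a generic point of space pass exactly three quadrics of the family (an ellipsoid and two types of hyperboloids); the corresponding three values of $\lambda$ serve as the point's confocal coordinates, and the three quadrics intersect pairwise orthogonally, which is the property the discrete orthogonality constraint is designed to preserve.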
In this paper, the sensitivity of the continuous particle size distribution function (PSD) was evaluated as flocculation occurs. To this end, eight discrete configurations were investigated, subdivided into 3, 4, 5, 6, 7, 8, 10, and 15 classes. The distributions studied here differ in the particles present in each size range and in their behavior, whether monotonic or unimodal. The fit of the continuous function in linearized form to the discrete data was evaluated by means of R2. The power-law coefficient (β) and the center of mass (C.M) were taken as representative of the PSD during the sensitivity analysis, which was carried out by means of multiple correlation between the variables. Batch tests were performed in order to evaluate the simulations. Results showed that β and C.M were both sensitive to PSD variations; however, the function adjustments for unimodal distributions need improvement.
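The linearized power-law fit described above can be sketched as an ordinary least-squares regression on log-log axes; the data values below are illustrative, not from the paper.

```python
import numpy as np

# Hedged sketch: fit a power law N(d) = a * d**beta to binned particle
# size data by linear least squares in log-log form, and report R^2.
# The variable names (diam, counts) and the numbers are illustrative.
diam = np.array([10.0, 20.0, 40.0, 80.0])       # bin centers (um)
counts = np.array([1000.0, 260.0, 60.0, 15.0])  # particles per bin

logd, logn = np.log10(diam), np.log10(counts)
beta, loga = np.polyfit(logd, logn, 1)          # slope = power-law coefficient

pred = loga + beta * logd
ss_res = np.sum((logn - pred) ** 2)
ss_tot = np.sum((logn - logn.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot                      # goodness of linearized fit
```

For a strictly monotonic distribution like this one the linearized fit is nearly exact; for unimodal distributions a single power law cannot follow the peak, which is the weakness noted in the results.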
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Modern data analysis typically involves the fitting of a statistical model to data, which includes estimating the model parameters and their precision (standard errors) and testing hypotheses based on the parameter estimates. Linear mixed models (LMMs) fitted through likelihood methods have been the foundation for data analysis for well over a quarter of a century. These models allow the researcher to simultaneously consider fixed (e.g., treatment) and random (e.g., block and location) effects on the response variables and account for the correlation of observations, when it is assumed that the response variable has a normal distribution. Analysis of variance (ANOVA), which was developed about a century ago, can be considered a special case of the use of an LMM. A wide diversity of experimental and treatment designs, as well as correlations of the response variable, can be handled using these types of models. Many response variables are not normally distributed, of course, such as discrete variables that may or may not be expressed as a percentage (e.g., counts of insects or diseased plants) and continuous variables with asymmetrical distributions (e.g., survival time). As expansions of LMMs, generalized linear mixed models (GLMMs) can be used to analyze the data arising from several non-normal statistical distributions, including the discrete binomial, Poisson, and negative binomial, as well as the continuous gamma and beta. A GLMM allows the data analyst to better match the model to the data rather than to force the data to match a specific model. The increase in computer memory and processing speed, together with the development of user-friendly software and the progress in statistical theory and methodology, has made it practical for non-statisticians to use GLMMs since the late 2000s. 
The switch from LMMs to GLMMs is not as simple as it may appear, however, as there are several major issues that must be considered when using a GLMM, issues that are mostly resolved for routine analyses with LMMs. These include the choice between conditional and marginal distributions and means, overdispersion (for discrete data), the model-fitting method [e.g., maximum likelihood (integral approximation), restricted pseudo-likelihood, and quasi-likelihood], and the choice of link function relating the mean to the fixed and random effects. The issues are explained conceptually with different model formulations and subsequently with an example involving the percentage of diseased plants in a field study with wheat, as well as with simulated data, starting with an LMM and transitioning to a GLMM. A brief synopsis of published GLMM-based analyses in the plant agricultural literature is presented to give readers a sense of the range of applications of this approach to data analysis.
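To make the link-function idea concrete, the sketch below fits a plain generalized linear model (binomial response, logit link) by iteratively reweighted least squares; random effects, and hence the mixed-model machinery discussed above, are deliberately omitted, and all names and data are illustrative.

```python
import numpy as np

# Hedged sketch: the fixed-effects core of a binomial GLMM, fitted by
# iteratively reweighted least squares (IRLS) with a logit link.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + covariate
beta_true = np.array([-0.5, 1.2])
p = 1 / (1 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)                                 # binary response

beta = np.zeros(2)
for _ in range(25):                                    # IRLS iterations
    eta = X @ beta                                     # linear predictor
    mu = 1 / (1 + np.exp(-eta))                        # inverse logit link
    W = mu * (1 - mu)                                  # working weights
    z = eta + (y - mu) / W                             # working response
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
```

The link function is what lets the model match the data's distribution directly, instead of transforming the data to suit a normal-theory LMM; a GLMM adds random effects to this same structure.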
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Previous research on emotional language relied heavily on off-the-shelf sentiment dictionaries that focus on negative and positive tone. These dictionaries are often tailored to non-political domains and use bag-of-words approaches which come with a series of disadvantages. This paper creates, validates, and compares the performance of (1) a novel emotional dictionary specifically for political text, (2) locally trained word embedding models combined with simple neural-network classifiers and (3) transformer-based models which overcome limitations of the dictionary approach. All tools can measure emotional appeals associated with eight discrete emotions. The different approaches are validated on different sets of crowd-coded sentences. Encouragingly, the results highlight the strengths of novel transformer-based models, which come with easily available pre-trained language models. Furthermore, all customized approaches outperform widely used off-the-shelf dictionaries in measuring emotional language in German political discourse. This replication directory contains code and data necessary to reproduce all models, figures, and tables included in "Creating and Comparing Dictionary, Word Embedding, and Transformer-based Models to Measure Discrete Emotions in German Political Text" as well as its supplemental online appendix.
This digital GIS dataset and accompanying nonspatial files synthesize the model outputs from a regional-scale volumetric 3-D geologic model that portrays the generalized subsurface geology of western South Dakota from a wide variety of input data sources. The study area includes all of western South Dakota from west of the Missouri River to the Black Hills uplift and Wyoming border. The model data released here consist of the stratigraphic contact elevation of major Phanerozoic sedimentary units that broadly define the geometry of the subsurface, the elevation of Tertiary intrusive and Precambrian basement rocks, and point data representing the three-dimensional geometry of fault surfaces. The presence of folds and unconformities is implied by the 3D geometry of the stratigraphic units, but these are not included as discrete features in this data release. The 3D geologic model was constructed from a wide variety of publicly available surface and subsurface geologic data; none of these input data are part of this Data Release, but data sources are thoroughly documented such that a user could obtain these data from other sources if desired. This model was created as part of the U.S. Geological Survey's (USGS) National Geologic Synthesis (NGS) project, a part of the National Cooperative Geologic Mapping Program (NCGMP). The WSouthDakota3D geodatabase contains twenty-five (25) subsurface horizons in raster format that represent the tops of modeled subsurface units, and a feature dataset “GeologicModel”. The GeologicModel feature dataset contains a feature class of thirty-five (35) faults served in elevation grid format (FaultPoints). The feature class “ModelBoundary” describes the footprint of the geologic model, and was included to meet the NCGMP's GeMS data schema. Nonspatial tables define the data sources used (DataSources), define terms used in the dataset (Glossary), and provide a description of the modeled surfaces (DescriptionOfModelUnits).
Separate file folders contain the vector data in shapefile format, the raster data in ASCII format, and the nonspatial tables as comma-separated values. In addition, a tabular data dictionary describes the entity and attribute information for all attributes of the geospatial data and the accompanying nonspatial tables (EntityAndAttributes). An included READ_ME file documents the process of manipulating and interpreting publicly available surface and subsurface geologic data to create the model. It additionally contains critical information about model units, and uncertainty regarding their ability to predict true ground conditions. Accompanying this data release is the “WSouthDakotaInputSummaryTable.csv”, which tabulates the global settings for each fault block, the stratigraphic horizons modeled in each fault block, the types and quantity of data inputs for each stratigraphic horizon, and then the settings associated with each data input.
In Costa Rica, the production of vital statistics is the responsibility of the Unidad de Estadísticas Demográficas (UED), which belongs to the Área de Censos y Encuestas of the Instituto Nacional de Estadística y Censos (INEC).
Vital statistics arise from the processing of information obtained from certificates of vital events (births, deaths, and marriages), whose registration is the responsibility of the Civil Registry of Costa Rica. For this purpose, a three-part form has been established: the original remains with the Civil Registry, the first copy goes to the INEC, and the second copy to the mother of the newborn. As of mid-2016, the digital declaration of births began; however, the physical record is still maintained.
Since 2016, births have been registered online: the information enters the Unit through a database that is merged at the end of the processing of the physical certificates, which have not disappeared altogether.
Specifically, the birth statistics reflect the frequency and intensity with which births occur throughout the country. In addition, they make it possible to know the sociodemographic profile of the mother and father, as well as the details of the birth.
All these variables make it possible, among other things, to construct indicators such as the total fertility rate and the crude birth rate, and to make population estimates. These data are used both nationally and internationally.
It should be noted that, because birth statistics in the country are registered with some lag, all cases that occurred in the last 10 years are included operationally, including the calendar year being processed. For example, the file for 2017 includes all births that occurred during 2008-2017 but were registered in 2017, and likewise for the other years.
This lag has been verified to be compensated at the national level year after year: what goes unregistered in 2017 is expected to be registered in 2018, in a volume approximately equal to the 2016 births that were registered in 2017, and so on for all years. Late registrations are published under the year in which the birth occurred.
| Variable | Description | Type |
|---|---|---|
| Anotrab | Year of work | Discrete |
| Mestrab | Work month | Discrete |
| Nacio | Type of birth | Discrete |
| Sexo | Sex of the newborn | Discrete |
| Peso | Birth weight in grams | Discrete |
| pesorec | Weight at birth in groups (grams) | Discrete |
| Estatura | Height at birth in centimeters | Discrete |
| estrec | Height at birth in groups (centimeters) | Discrete |
| Provocu | Province of occurrence | Discrete |
| Pcocu | Canton of occurrence | Discrete |
| Pcdocu | District of occurrence | Discrete |
| Instnac | Institution where the birth occurred | Discrete |
| Dianac | Day of birth | Discrete |
| Mesnac | Birth month | Discrete |
| Anonac | Year of birth | Discrete |
| Leyp | Responsible parenthood law | Discrete |
| Edadpad | Father's age | Discrete |
| edpadrec | Father's age in groups | Discrete |
| Paispad | Father's country of origin | Discrete |
| Nacpad | Father's nationality | Discrete |
| grocupad | Father's occupation groups | Discrete |
| Nivedpad | Father's educational level | Discrete |
| Hijtepad | Children by the father | Discrete |
| Escivpad | Father's marital status | Discrete |
| Edadmad | Mother's age | Discrete |
| edmadrec | Mother's age in groups | Discrete |
| Paismad | Mother's country of origin | Discrete |
| Nacmad | Mother's nationality | Discrete |
| grocumad | Mother's occupation groups | Discrete |
| Nivedmad | Mother's educational level | Discrete |
| Escivmad | Mother's marital status | Discrete |
| Provincia | Mother's province of residence | Discrete |
| Pc | Canton of residence of the mother | Discrete |
| Pcd | Mother's district of residence | Discrete |
| IU | Urbanity index | Discrete |
| Reginec | MIDEPLAN planning region | Discrete |
| Regsalud | Ministry of Health region | Discrete |
| Paratend | Person who attended the birth | Discrete |
| Mesesemb | Months of pregnancy | Discrete |
| Hijosten | Children born by the mother | Discrete |
| Abortos | Total abortions | Discrete |
| Totconsul | Total consultations | Discrete |
| Medcons | Consultations with a doctor | Discrete |
| Declara | Person declaring birth | Discrete |
| Provregis | Province of registration | Discrete |
| Pcregis | Registration canton | Discrete |
| Pcdregis | Registration district | Discrete |
| Diadeclara | Day the declaration is made | Discrete |
| Mesdeclara | Month in which the declaration is made | Discrete |
| Anodeclara | Year the declaration is made | Discrete |
| Filiacion | Filiation | Discrete |
| Inscen | Place where the statement is made | Discrete |
Birth: the expulsion or complete extraction from the mother's body of a product of conception (regardless of the duration of the pregnancy) which, after such separation, breathes or manifests any other sign of life, such as heartbeat, umbilical cord pulsations, or voluntary movement of muscles, wheth...
This digital GIS dataset and accompanying nonspatial files synthesize model outputs from a regional-scale volumetric 3-D geologic model that portrays the generalized subsurface geology of the Powder River Basin and Williston Basin regions from a wide variety of input data sources. The study area includes the Hartville Uplift, Laramie Range, Bighorn Mountains, Powder River Basin, and Williston Basin. The model data released here consist of the stratigraphic contact elevation of major Phanerozoic sedimentary units that broadly define the geometry of the subsurface, the elevation of Tertiary intrusive and Precambrian basement rocks, and point data that illustrate an estimation of the three-dimensional geometry of fault surfaces. The presence of folds and unconformities is implied by the 3D geometry of the stratigraphic units, but these are not included as discrete features in this data release. The 3D geologic model was constructed from a wide variety of publicly available surface and subsurface geologic data; none of these input data are part of this Data Release, but data sources are thoroughly documented such that a user could obtain these data from other sources if desired. The PowderRiverWilliston3D geodatabase contains 40 subsurface horizons in raster format that represent the tops of modeled subsurface units, and a feature dataset “GeologicModel”. The GeologicModel feature dataset contains a feature class of 30 estimated faults served in elevation grid format (FaultPoints), a feature class illustrating the spatial extent of 22 fault blocks (FaultBlockFootprints), and a feature class containing a polygon delineating the study areas (ModelBoundary). Nonspatial tables define the data sources used (DataSources), define terms used in the dataset (Glossary), and provide a description of the modeled surfaces (DescriptionOfModelUnits).
Separate file folders contain the vector data in shapefile format, the raster data in ASCII format, and the tables as comma-separated values. In addition, a tabular data dictionary describes the entity and attribute information for all attributes of the geospatial data and the accompanying nonspatial tables (EntityAndAttributes). An included READ_ME file documents the process of manipulating and interpreting publicly available surface and subsurface geologic data to create the model. It additionally contains critical information about model units, and uncertainty regarding their ability to predict true ground conditions. Accompanying this data release is the “PowderRiverWillistonInputSummaryTable.csv”, which tabulates the global settings for each fault block, the stratigraphic horizons modeled in each fault block, the types and quantity of data inputs for each stratigraphic horizon, and then the settings associated with each data input.
Discrete jet forcing is a method to delay or suppress flow separation that is associated with a decrease in lift as well as an increase in drag. Although this method is used in both wind tunnel testing and flight experiments, the underlying mechanisms and scaling laws are not fully understood. Understanding these laws enables an optimized application of active flow control (AFC) to increase the efficacy, while reducing the input requirements. Only if the benefits of separation control significantly outweigh the associated costs does an application on an airplane become feasible. Both steady and sweeping jets are employed in the current work to investigate the effect of the sweeping motion on separation control authority. The associated flow fields are investigated for a variety of input parameters and actuator spacings. All actuator designs are tested on the NASA hump geometry that provides a platform with fixed boundary conditions for the comparison of various actuator designs. It is found that steady jets are able to effectively control the flow only at small spacings. Due to their favorable energy requirements, separation control at small spacings is the preferred application for steady jets. Fluidic oscillators are able to control the flow at both small and large spacings. Various actuator designs are tested to investigate the effect of the sweeping angle on the control authority. The results indicate that actuator designs with large sweeping angles are more suitable for controlling the flow at larger spacings, significantly outperforming steady jets, which yield a smaller jet spreading angle. The underlying mechanism for the superior performance of fluidic oscillators is an increased entrainment of high momentum fluid to wall-near regions with counter-rotating vortex pairs (CRVP) created along the span. The coherence in space and time of these CRVP is found to correlate with the control authority.
If fluidic oscillators are tightly spaced, the flow field is less organized and no CRVP are formed. Here, the fluidic oscillators do not operate to their full potential, and the additional energy requirements due to their internal feedback mechanism may make them an inferior choice compared to steady jets. In the present work, a scaling law for freestream Reynolds number and actuator size is suggested. A properly defined momentum coefficient governs the scaling of both freestream Reynolds number and actuator size. To properly define this momentum coefficient, the throat conditions either have to be measured directly or accurately determined from measurements at the plenum. This scaling law allows for accurate scaling of an AFC design and its associated performance from the wind tunnel to flight conditions. This means that if the momentum coefficient is maintained between wind tunnel experiments and flight tests, the same performance is to be expected, excluding potential freestream Mach number effects. A quantitative relationship between actuator spacing and performance is yet to be determined. The present work provides a guideline for future work by suggesting circulation coefficients that quantify the vorticity, and its organization, introduced by discrete jet forcing at various spacings. The circulation data allow one to distinguish between boundary layer control and circulation control, and reveal that the vorticity introduction in the boundary-layer-control region is a function of mass flow rate per jet and independent of spacing. Furthermore, the optimal spacing of a fluidic oscillator design can be determined by quantifying its flow field organization.
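For context, a commonly used definition of the momentum coefficient in the AFC literature is reproduced below (the thesis may use a variant; the symbols are the conventional ones, with $\dot m$ the total jet mass flow rate, $U_{\mathrm{jet}}$ the jet throat velocity, $q_\infty$ the freestream dynamic pressure, and $A_{\mathrm{ref}}$ a reference area):

```latex
C_\mu = \frac{\dot{m}\, U_{\mathrm{jet}}}{q_\infty A_{\mathrm{ref}}},
\qquad q_\infty = \tfrac{1}{2}\,\rho_\infty U_\infty^2 .
```

Because $U_{\mathrm{jet}}$ enters the definition directly, the throat conditions must be measured or inferred accurately, as noted above, for the wind-tunnel-to-flight scaling to hold.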
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
This synthetic dataset was generated from Monte Carlo simulations of lightning flashovers on medium voltage (MV) distribution lines. It is suitable for training machine learning models for classifying lightning flashovers on distribution lines. The dataset is hierarchical in nature (see below for more information) and class imbalanced.
The following five types of lightning interaction with the MV distribution line have been simulated: (1) direct strike to a phase conductor (when there is no shield wire present on the line), (2) direct strike to a phase conductor with shield wire(s) present on the line (i.e., shielding failure), (3) direct strike to a shield wire with a backflashover event, (4) indirect nearby lightning strike to ground where a shield wire is not present, and (5) indirect nearby lightning strike to ground where a shield wire is present on the line. The last two types of lightning interaction induce overvoltages on the phase conductors through EM fields radiated from the strike channel that couple to the line conductors. Three different methods of indirect-strike analysis have been implemented: Rusck's model, the Chowdhuri-Gross model, and the Liew-Mar model. Shield wire(s) provide shielding effects against direct, as well as screening effects against indirect, lightning strikes.
The dataset covers two independent distribution lines, with heights of 12 m and 15 m, each with a flat configuration of phase conductors. Twin shield wires, if present, are 1.5 m above the phase conductors and 3 m apart [2]. The CFO level of the 12 m distribution line is 150 kV and that of the 15 m distribution line is 160 kV. The dataset contains 10,000 simulations for each of the distribution lines.
The dataset contains the following variables (features):
'dist': perpendicular distance of the lightning strike location from the distribution line axis (m), generated from the Uniform distribution [0, 500] m,
'ampl': lightning current amplitude of the strike (kA), generated from the Log-Normal distribution (see IEC 60071 for additional information),
'front': lightning current wave-front time (us), generated from the Log-Normal distribution; it needs to be emphasized that amplitudes (ampl) and wave-front times (front), as random variables, have been generated from the appropriate bivariate probability distribution which includes statistical correlation between these variates,
'veloc': velocity of the lightning return-stroke current defined indirectly through the parameter "w" that is generated from the Uniform distribution [50, 500] m/us, which is then used for computing the velocity from the following relation: v = c/sqrt(1+w/I), where "c" is the speed of light in free space (300 m/us) and "I" is the lightning-current amplitude,
'shield': binary indicator that signals presence or absence of the shield wire(s) on the line (0/1), generated from the Bernoulli distribution with a 50% probability,
'Ri': average value of the impulse impedance of the tower's grounding (Ohm), generated from the Normal distribution (clipped at zero on the left side) with median value of 50 Ohm and standard deviation of 12.5 Ohm; it should be mentioned that the impulse impedance is often much larger than the associated grounding resistance value, which is why a rather high value of 50 Ohm has been used here,
'EGM': electrogeometric model used for analyzing striking distances of the distribution line's tower; the following options are available: 'Wagner', 'Young', 'AW', 'BW', 'Love', and 'Anderson', where 'AW' stands for Armstrong & Whitehead, while 'BW' means Brown & Whitehead model; statistical distribution of EGM models follows a user-defined discrete categorical distribution with respective probabilities: p = [0.1, 0.2, 0.1, 0.1, 0.3, 0.2],
'ind': indirect-stroke model used for analyzing nearby indirect lightning strikes; the following options were implemented: 'rusk' for the Rusck's model, 'chow' for the Chowdhuri-Gross model (with Jakubowski modification) and 'liew' for the Liew-Mar model; statistical distribution of these three models follows a user-defined discrete categorical distribution with respective probabilities: p = [0.6, 0.2, 0.2],
'CFO': critical flashover voltage level of the distribution line's insulation (kV),
'height': height of the phase conductors of the distribution line (m),
'flash': binary indicator that signals if the flashover has been recorded (1) or not (0). This variable is the outcome/label (i.e. binary class).
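The sampling scheme described above can be sketched as follows; the log-normal amplitude parameters (median 31.1 kA, sigma 0.48, typical of IEC/CIGRE practice) and the omission of the amplitude/wave-front correlation are simplifying assumptions not stated in the dataset description.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10_000
c = 300.0                                       # speed of light (m/us)

dist = rng.uniform(0, 500, N)                   # strike distance (m)
ampl = rng.lognormal(np.log(31.1), 0.48, N)     # amplitude (kA), assumed params
w = rng.uniform(50, 500, N)                     # auxiliary velocity parameter
veloc = c / np.sqrt(1 + w / ampl)               # return-stroke velocity (m/us)
shield = rng.binomial(1, 0.5, N)                # shield wire present (0/1)
Ri = np.clip(rng.normal(50, 12.5, N), 0, None)  # grounding impedance (Ohm)

egm_models = ['Wagner', 'Young', 'AW', 'BW', 'Love', 'Anderson']
EGM = rng.choice(egm_models, N, p=[0.1, 0.2, 0.1, 0.1, 0.3, 0.2])
ind = rng.choice(['rusk', 'chow', 'liew'], N, p=[0.6, 0.2, 0.2])
```

Each sampled row would then drive one transient simulation whose outcome populates the binary 'flash' label.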
The mathematical background used for the analysis of lightning interaction with the MV distribution line can be found in the references cited below.
References:
A. R. Hileman, "Insulation Coordination for Power Systems," CRC Press, Boca Raton, FL, 1999.
J. A. Martinez and F. Gonzalez-Molina, "Statistical evaluation of lightning overvoltages on overhead distribution lines using neural networks," IEEE Transactions on Power Delivery, vol. 20, no. 3, pp. 2219-2226, Jul. 2005.
A. Borghetti, C. A. Nucci, and M. Paolone, "An improved procedure for the assessment of overhead line indirect lightning performance and its comparison with the IEEE Std. 1410 method," IEEE Transactions on Power Delivery, vol. 22, no. 1, pp. 684-692, 2007.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
[Version 1.2] This version of the dataset fixes a bug found in the previous versions (see below for more information).
Dataset has been generated from the Monte Carlo simulations of lightning flashovers on medium voltage (MV) distribution lines. It is suitable for training machine learning models for classifying lightning flashovers on distribution lines, as well as for line insulation coordination studies. The dataset is hierarchical in nature (see below for more information) and class imbalanced.
The following five types of lightning interaction with the MV distribution line have been simulated: (1) direct strike to a phase conductor (when no shield wire is present on the line), (2) direct strike to a phase conductor with shield wire(s) present on the line (i.e. shielding failure), (3) direct strike to a shield wire with a backflashover event, (4) indirect nearby lightning strike to ground where no shield wire is present, and (5) indirect nearby lightning strike to ground where a shield wire is present on the line. The last two types of interaction induce overvoltages on the phase conductors through EM fields radiated from the stroke channel, which couple to the line conductors. Shield wire(s) provide shielding against direct lightning strikes, as well as screening against indirect ones.
The dataset consists of the following variables:
'dist': perpendicular distance of the lightning strike location from the distribution line axis (m), generated from the Uniform distribution [0, 500] m,
'ampl': lightning current amplitude of the strike (kA), generated from the Log-Normal distribution (see IEC 60071 for additional information),
'veloc': velocity of the lightning return stroke current (m/us), generated from the Uniform distribution [50, 500] m/us,
'shield': binary indicator that signals presence or absence of the shield wire(s) on the line (0/1), generated from the Bernoulli distribution with a 50% probability,
'Ri': average value of the impulse impedance of the tower's grounding (Ohm), generated from the Normal distribution (clipped at zero on the left side) with a median value of 50 Ohm and a standard deviation of 12.5 Ohm; it should be mentioned that the impulse impedance is often much larger than the associated grounding resistance value, which is why a rather high value of 50 Ohm has been used here,
'EGM': electrogeometric model used for analyzing striking distances of the distribution line's tower; the following options are available: 'Wagner', 'Young', 'AW', 'BW', 'Love', and 'Anderson', where 'AW' stands for the Armstrong & Whitehead model and 'BW' for the Brown & Whitehead model; the choice of EGM follows a user-defined discrete categorical distribution with respective probabilities: p = [0.1, 0.2, 0.1, 0.1, 0.3, 0.2],
'CFO': critical flashover voltage level of the distribution line's insulation (kV); the following levels have been used: 150, 150, and 160 kV, respectively, for the three distribution lines with heights of 10, 12, and 14 m,
'height': height of the phase conductors of the distribution line (m); the distribution line has a flat configuration of phase conductors with the following heights: 10, 12, and 14 m; twin shield wires, if present, are 1.5 m above the phase conductors and 3 m apart; the data set consists of 10000 simulations for each line height,
'flash': binary indicator that signals if the flashover has been recorded (1) or not (0). This variable is the outcome (binary class).
Note: The critical flashover voltage (CFO) level of the line is taken at 150 kV for the first two lines (10 m and 12 m) and 160 kV for the third line (14 m). The diameters of the phase conductors and shield wires for all treated lines are, respectively, 10 mm and 5 mm. The average grounding resistance of the shield wire is assumed to be 10 Ohm for all treated cases (it has no discernible influence on the flashover rate). The dataset is class-imbalanced and consists of 30000 simulations in total, with 10000 simulations for each of the three MV distribution line heights (geometries) and CFO levels.
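For illustration, the input-variable sampling described above could be sketched in NumPy as follows. This is an assumed re-creation of the scheme, not the generator actually used to build the dataset; in particular, the log-normal parameters for 'ampl' are placeholder values rather than the exact IEC 60071 figures.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # simulations per line height, as described above

# Illustrative sampling of the documented variables (placeholder
# log-normal parameters for the lightning current amplitude).
dist = rng.uniform(0.0, 500.0, n)                            # strike distance (m)
ampl = rng.lognormal(mean=np.log(31.0), sigma=0.48, size=n)  # current (kA)
veloc = rng.uniform(50.0, 500.0, n)                          # return-stroke velocity (m/us)
shield = rng.binomial(1, 0.5, n)                             # shield wire(s) present (0/1)

# Impulse impedance: Normal(50, 12.5), clipped at zero on the left.
Ri = np.clip(rng.normal(50.0, 12.5, n), 0.0, None)

# Electrogeometric model drawn from the categorical distribution.
egm_names = ['Wagner', 'Young', 'AW', 'BW', 'Love', 'Anderson']
egm = rng.choice(egm_names, size=n, p=[0.1, 0.2, 0.1, 0.1, 0.3, 0.2])

assert dist.min() >= 0.0 and dist.max() <= 500.0
assert Ri.min() >= 0.0
```

Each draw of these inputs would then be fed to the electromagnetic transient model to obtain the 'flash' label.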
Important: Version 1.2 of the dataset fixes an important bug found in the previous data sets, where the column 'Ri' contained duplicate data from the column 'veloc'. This issue is now resolved.
Mathematical background used for the analysis of lightning interaction with the MV distribution line can be found in the references below.
References:
J. A. Martinez and F. Gonzalez-Molina, "Statistical evaluation of lightning overvoltages on overhead distribution lines using neural networks," in IEEE Transactions on Power Delivery, vol. 20, no. 3, pp. 2219-2226, July 2005, doi: 10.1109/TPWRD.2005.848734.
A. R. Hileman, "Insulation Coordination for Power Systems", CRC Press, Boca Raton, FL, 1999.
Limitations on public access: INSPIRE Directive Article 13(1)(b) (http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/INSPIRE_Directive_Article13_1b)
The dataset contains information on European groundwater bodies, monitoring sites, river basin districts, river basin districts sub-units and surface bodies reported to the European Environment Agency between 2001-11-29 and 2019-02-12.
This data set is available only for internal use of the European Environment Agency and may contain objects that are deprecated (i.e. that have been retired or superseded) and objects that were marked confidential by data providers. Please search for "PUBLIC VERSION" in the dataset title to access the publicly available version.
The information was reported to the European Environment Agency under the State of Environment reporting obligations. For the EU28 countries and Norway, the EIONET spatial data was consolidated with the spatial data reported under the Water Framework Directive reporting obligations. For these countries, the reference spatial data set is the "WISE WFD Reference Spatial Datasets reported under Water Framework Directive".
Relevant concepts:
Groundwater body: 'Body of groundwater' means a distinct volume of groundwater within an aquifer or aquifers.
Groundwater: All water which is below the surface of the ground in the saturation zone and in direct contact with the ground or subsoil.
Aquifer: Subsurface layer or layers of rock or other geological strata of sufficient porosity and permeability to allow either a significant flow of groundwater or the abstraction of significant quantities of groundwater.
Surface water body: Body of surface water means a discrete and significant element of surface water such as a lake, a reservoir, a stream, river or canal, part of a stream, river or canal, a transitional water or a stretch of coastal water.
Surface water: Inland waters, except groundwater; transitional waters and coastal waters, except in respect of chemical status for which it shall also include territorial waters.
Inland water: All standing or flowing water on the surface of the land, and all groundwater on the landward side of the baseline from which the breadth of territorial waters is measured.
River: Body of inland water flowing for the most part on the surface of the land but which may flow underground for part of its course.
Lake: Body of standing inland surface water.
River basin district: The area of land and sea, made up of one or more neighbouring river basins together with their associated groundwaters and coastal waters, which is the main unit for management of river basins.
River basin: The area of land from which all surface run-off flows through a sequence of streams, rivers and, possibly, lakes into the sea at a single river mouth, estuary or delta.
Sub-basin: The area of land from which all surface run-off flows through a series of streams, rivers and, possibly, lakes to a particular point in a water course (normally a lake or a river confluence).
Sub-unit [Operational definition. Not in the WFD]: Reporting unit. River basin districts larger than 50000 square kilometre should be divided into comparable sub-units with an area between 5000 and 50000 square kilometre.
The sub-units should be created using river basins (if more than one river basin exists in the RBD), set of contiguous river basins, or sub-basins, for example. If the RBD area is less than 50000 square kilometre, the RBD itself should be used as a sub-unit.
Accessibility Observatory data reflects the number of jobs that are reachable by various modes within different travel times from different Census-defined geographies in Massachusetts (block, block group, tract). The data comes from the Accessibility Observatory at the University of Minnesota, and the underlying jobs data is sourced from the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) dataset. More information about data methodology is available here: http://access.umn.edu/publications/
The data posted on GeoDOT is initially organized by mode: Auto, Transit, Pedestrian, and Bike. With respect to Auto, Transit, and Pedestrian data, data is then organized by geography (group and block group), and then by travel time threshold: 30, 45, and 60 minutes. Please note that MassDOT has access to data that reflects travel time thresholds in five minute increments; email Derek Krevat at derek.krevat@dot.state.ma.us for more information.
Data reflecting access to jobs via Auto is available for each hour of the day at the different travel time thresholds (30, 45 and 60 minute thresholds are posted; five minute thresholds are available by contacting Derek Krevat at derek.krevat@dot.state.ma.us). For convenience, MassDOT has also created stand-alone summary files that reflect the total number of jobs available throughout the day within 30, 45, and 60 minutes of travel time. See the Data Dictionary, Auto All Jobs for more information.
Pedestrian and Transit data is only available for the morning peak travel period, 7:00 to 9:00 am.
Bicycle data is only available for the noontime hour.
Each of the data files contains data reflecting access to all jobs as well as discrete job opportunities as categorized by the U.S. Census Bureau, such as jobs in specific industries, with specific types of workers, with specific wages, or in businesses of certain sizes or ages. See the Data Dictionary for more information.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
HadUK-Grid is a collection of gridded climate variables derived from the network of UK land surface observations. The data have been interpolated from meteorological station data onto a uniform grid to provide complete and consistent coverage across the UK. These data at 1 km resolution have been averaged across a set of discrete geographies defining UK administrative regions, consistent with data from the UKCP18 climate projections. The dataset spans the period from 1836 to 2021, but the start date depends on the climate variable and temporal resolution.
The gridded data are produced for daily, monthly, seasonal and annual timescales, as well as long term averages for a set of climatological reference periods. Variables include air temperature (maximum, minimum and mean), precipitation, sunshine, mean sea level pressure, wind speed, relative humidity, vapour pressure, days of snow lying, and days of ground frost.
This data set supersedes the previous versions of this dataset which also superseded UKCP09 gridded observations. Subsequent versions may be released in due course and will follow the version numbering as outlined by Hollis et al. (2018, see linked documentation).
The changes for v1.1.0.0 HadUK-Grid datasets are as follows:
The addition of data for calendar year 2021
The addition of 30 year averages for the new reference period 1991-2020
An update to 30 year averages for 1961-1990 and 1981-2010. This is an order-of-operation change: in this version, 30 year averages have been calculated from the underlying monthly/seasonal/annual grids (grid-then-average), whereas in previous versions they were grids of interpolated station averages (average-then-grid). This change results in small differences to the values, but provides improved consistency with the monthly/seasonal/annual series grids. However, it also means that 1961-1990 averages are not included for the sfcWind or snowlying variables, because these variables start in 1969 and 1971 respectively.
A substantial new collection of monthly rainfall data has been added for the period before 1960. These data originate from the rainfall rescue project (Hawkins et al. 2022); this source now accounts for 84% of pre-1960 monthly rainfall data, and the monthly rainfall series has been extended back to 1836.
Net changes to the input station data used to generate this dataset:
- Total of 122664065 observations
- 118464870 (96.5%) unchanged
- 4821 (0.004%) modified for this version
- 4194374 (3.4%) added in this version
- 5887 (0.005%) deleted from this version
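The change counts above are internally consistent, which can be verified directly (a quick sanity check, assuming the quoted percentages are taken relative to the new total):

```python
# Cross-check of the station-observation change counts quoted above.
total = 122_664_065
unchanged = 118_464_870
modified = 4_821
added = 4_194_374
deleted = 5_887  # removals relative to the previous version, not part of the new total

# Unchanged, modified and added observations make up the new total.
assert unchanged + modified + added == total

# The quoted percentages agree with the new total to rounding precision.
assert abs(unchanged / total - 0.965) < 0.002
assert abs(added / total - 0.034) < 0.001
```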
The primary purpose of these data is to facilitate monitoring of the UK climate and research into climate change, impacts and adaptation. The datasets have been created by the Met Office with financial support from the Department for Business, Energy and Industrial Strategy (BEIS) and the Department for Environment, Food and Rural Affairs (DEFRA) in order to support the Public Weather Service Customer Group (PWSCG), the Hadley Centre Climate Programme, and the UK Climate Projections (UKCP18) project. The output from a number of data recovery activities relating to 19th and early 20th century data has been used in the creation of this dataset. These activities were supported by: the Met Office Hadley Centre Climate Programme; the Natural Environment Research Council project "Analysis of historic drought and water scarcity in the UK"; the UK Research & Innovation (UKRI) Strategic Priorities Fund UK Climate Resilience programme; the UK Natural Environment Research Council (NERC) Public Engagement programme; the National Centre for Atmospheric Science and the NERC GloSAT project; and the contribution of many thousands of public volunteers. The dataset is provided under the Open Government Licence.
In Costa Rica, the production of vital statistics is the responsibility of the Unidad de Estadísticas Demográficas (UED), which until 2015 belonged to the Área de Estadísticas Continuas of the Instituto Nacional de Estadística y Censos (INEC). As of 2016, the Unit became part of the Área de Censos y Encuestas.
Vital statistics arise from the processing of information obtained from certificates of vital events (birth, death and marriage) whose registration is in charge of the Civil Registry of Costa Rica.
Specifically, death statistics reflect the frequency and intensity with which deaths occur in the country. In addition, they make it possible to know the socioeconomic profile of the deceased person and the underlying cause of death. These variables allow, among other things, the creation of indicators that measure the effectiveness of health and epidemiological programs and thereby detect needs for medical services and resources, supporting decision-making and the formulation of public policies that are essential for the country.
It should be noted that, because there is a lag in death statistics in the country, all cases that occurred in the last 10 years are included operationally, including the calendar year being processed. For example, the 2014 data include all deaths that occurred in the period 2005-2014 but were registered in 2014; likewise, the 2015 data include deaths that occurred in 2006-2015, and the 2016 data those that occurred in 2007-2016.
This lag has been found to be compensated at the national level year after year: what was not registered in 2014 is expected to enter in 2015, in a volume approximately equal to what was not registered in 2013 and was registered in 2014, and so on for every year. Late registrations are published respecting the year in which the event occurred.
| Variable | Description | Type |
|---|---|---|
| anotrab | Year of work | Discrete |
| mestrab | Work month | Discrete |
| nacionalid | Nationality | Discrete |
| sexo | Sex | Discrete |
| estcivil | Last marital status | Discrete |
| edads | Simple age in completed years | Discrete |
| edadsrec | Age in completed years in five-year groups | Discrete |
| provincia | Province of residence | Discrete |
| pc | District of residence | Discrete |
| IU | Urbanization index | Discrete |
| causamuer | Basic cause of death code | Discrete |
| des_causa | Literal description of underlying cause of death | Discrete |
| autopsia | Whether an autopsy was done or not | Discrete |
| asistmed | Medical assistance | Discrete |
| instmurio | Institution where the death occurred | Discrete |
| provocu | Province of occurrence | Discrete |
| pcocu | District of occurrence | Discrete |
| diadef | Day of death | Discrete |
| mesdef | Month of death | Discrete |
| anodef | Year of death | Discrete |
| ocuparec | Last occupation in groups | Discrete |
| nacmadre | Nationality of the mother (For children under 1 year of age) | Discrete |
| provregis | Province of registration | Discrete |
| pcregis | Registration district | Discrete |
| diadeclara | Declaration day | Discrete |
| mesdeclara | Declaration month | Discrete |
| anodeclara | Declaration year | Discrete |
| grgruposcb | Grouping of causes into 17 groups | Discrete |
| gruposcb | Grouping of causes into 63 groups | Discrete |
| reginec | Regionalization of the INEC | Discrete |
| regsalud | Regionalization of the Ministry of Health | Discrete |
Death: the permanent disappearance of all signs of life, regardless of the time elapsed since birth. This definition therefore excludes fetal deaths.
Child death: all those deaths of boys and girls that occurred between the moment of birth and before reaching one year of life. Infant deaths are classified as neonatal and postneonatal.
Maternal death: defined as the death of a woman while she is pregnant, or within 42 days after the termination of the pregnancy; regardless of the duration and site of the pregnancy, due to any cause related to, or aggravated by, the pregnancy itself or its care but not due to accidental or incidental causes.
Neonatal death: refers to deaths that occurred in the first 28 days of life, which is the period of greatest risk.
Post-neonatal death: refers to deaths that occur after 29 days of life and before reaching one year of age.
General mortality rate: it is the number of deaths per thousand inhabitants; that is, the rati...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Unpaired two-tailed t-tests and Mann-Whitney U-tests were used to test for mean differences in response frequencies. *Two-sample tests of proportions were used to explore differences in response frequencies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In daily life, two common approaches are used for collecting medical disease data: data integration across medical institutions and questionnaires. However, these methods require collecting data from the entire research area, which consumes a significant amount of manpower and material resources. Additionally, data integration is difficult and poses privacy protection challenges, resulting in a large amount of missing data in the dataset. The presence of incomplete data significantly reduces the quality of the published data, hindering the timely analysis of data and the generation of reliable knowledge by epidemiologists, public health authorities, and researchers, and consequently affecting the downstream tasks that rely on these data. To address the issue of discrete missing data in cardiac disease, this paper proposes the AGAN (Attribute Generative Adversarial Nets) architecture for missing data filling, based on generative adversarial networks. This algorithm takes advantage of the strong learning ability of generative adversarial networks. Because imputed values produced by other network structures can have ambiguous meaning, an attribute matrix is designed to convert the generated values directly into the corresponding data type, making the actual meaning of the filled data more evident. Furthermore, the distribution deviation between the generated data and the real data is integrated into the loss function of the generative adversarial networks, improving their training stability and ensuring consistency between the generated and real data distributions. This approach establishes a missing data filling mechanism based on generative adversarial networks, which ensures the rationality of the data distribution while filling the missing data samples.
The experimental results demonstrate that, compared to other filling algorithms, the data matrix filled by the proposed algorithm has clearer practical meaning, fewer errors, and higher accuracy in downstream classification prediction.
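The idea of folding a distribution-deviation term into the generator's objective can be sketched in a few lines. The moment-matching penalty below is an illustrative assumption, not the paper's exact formulation, and `distribution_deviation` and `generator_loss` are hypothetical helpers:

```python
import numpy as np

def distribution_deviation(real, fake):
    """Feature-wise gap between first and second moments of real vs.
    generated samples; one simple stand-in for the 'distribution
    deviation' term described above."""
    mean_gap = np.abs(real.mean(axis=0) - fake.mean(axis=0))
    std_gap = np.abs(real.std(axis=0) - fake.std(axis=0))
    return float((mean_gap + std_gap).mean())

def generator_loss(adversarial_loss, real, fake, lam=0.1):
    # Combined objective: adversarial term plus distribution penalty.
    return adversarial_loss + lam * distribution_deviation(real, fake)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(1000, 5))
fake_good = rng.normal(0.0, 1.0, size=(1000, 5))   # matches real distribution
fake_bad = rng.normal(3.0, 2.0, size=(1000, 5))    # shifted and stretched

# A generator matching the real distribution incurs the smaller penalty.
assert distribution_deviation(real, fake_good) < distribution_deviation(real, fake_bad)
```

Penalising this gap alongside the adversarial loss is what keeps the imputed values anchored to the observed data distribution.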
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
RMSE value of missing data filling algorithms on different datasets (mean ± std).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Although South Africa is the global epicenter of the HIV epidemic, the uptake of HIV testing and treatment among young people remains low. Concerns about confidentiality impede the utilization of HIV prevention services, which signals the need for discreet HIV prevention measures that leverage youth-friendly platforms. This paper describes the process of developing a youth-friendly internet-enabled HIV risk calculator in collaboration with young people, including young key populations aged between 18 and 24 years old. Using qualitative research, we conducted an exploratory study with 40 young people, including young key populations (lesbian, gay, bisexual, transgender (LGBT) individuals, men who have sex with men (MSM), and female sex workers). Eligible participants were young people aged 18-24 years and living in Soweto. Data were collected through two peer group discussions with young people aged 18-24 years, a once-off group discussion with the [Name of clinic removed for confidentiality] adolescent community advisory board members, and once-off face-to-face in-depth interviews with young key population groups: LGBT individuals, MSM, and female sex workers. LGBT individuals are identified as key populations because they face increased vulnerability to HIV/AIDS and other health risks due to societal stigma, discrimination, and obstacles in accessing healthcare and support services. The measures used to collect data included a socio-demographic questionnaire, a questionnaire on mobile phone usage, an HIV and STI risk assessment questionnaire, and a semi-structured interview guide. Framework analysis was used to analyse qualitative data using the qualitative data analysis software NVivo. Descriptive statistics for participant socio-demographics and mobile phone usage were summarized using SPSS. Of the 40 enrolled participants, 58% were male, the median age was 20 (interquartile range 19-22.75), and 86% had access to the internet.
Participants' recommendations were considered in developing the HIV risk calculator. They indicated a preference for an easy-to-use, interactive, real-time assessment offering a discreet and private means to self-assess HIV risk. In addition to providing feedback on the language and wording of the risk assessment tool, participants recommended creating a colorful, interactive and informational app. A collaborative and user-driven process is crucial for designing and developing HIV prevention tools for targeted groups. Participants emphasized that privacy, confidentiality, and ease of use contribute to the acceptability and willingness to use internet-enabled HIV prevention methods.
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Many capture-recapture surveys of wildlife populations operate in continuous time but detections are typically aggregated into occasions for analysis, even when exact detection times are available. This discards information and introduces subjectivity, in the form of decisions about occasion definition. We develop a spatio-temporal Poisson process model for spatially explicit capture-recapture (SECR) surveys that operate continuously and record exact detection times. We show that, except in some special cases (including the case in which detection probability does not change within occasion), temporally aggregated data do not provide sufficient statistics for density and related parameters, and that when detection probability is constant over time our continuous-time (CT) model is equivalent to an existing model based on detection frequencies. We use the model to estimate jaguar density from a camera-trap survey and conduct a simulation study to investigate the properties of a CT estimator and discrete-occasion estimators with various levels of temporal aggregation. This includes investigation of the effect on the estimators of spatio-temporal correlation induced by animal movement. The CT estimator is found to be unbiased and more precise than discrete-occasion estimators based on binary capture data (rather than detection frequencies) when there is no spatio-temporal correlation. It is also found to be only slightly biased when there is correlation induced by animal movement, and to be more robust to inadequate detector spacing, while discrete-occasion estimators with binary data can be sensitive to occasion length, particularly in the presence of inadequate detector spacing. Our model includes as a special case a discrete-occasion estimator based on detection frequencies, and at the same time lays a foundation for the development of more sophisticated CT models and estimators. 
It allows modelling within-occasion changes in detectability, readily accommodates variation in detector effort, removes subjectivity associated with user-defined occasions, and fully utilises CT data. We identify a need for developing CT methods that incorporate spatio-temporal dependence in detections and see potential for CT models being combined with telemetry-based animal movement models to provide a richer inference framework.
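The continuous-time detection process described above can be illustrated with a small simulation: detections of one animal at an array of detectors are modelled as independent Poisson processes whose rates decay with distance from the animal's activity centre via a half-normal detection function. This is only a sketch of the general idea, not the authors' implementation, and all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Baseline detection rate (per unit time), half-normal scale, survey length.
lambda0, sigma, T = 0.5, 1.0, 100.0
centre = np.array([2.0, 2.0])                       # animal's activity centre
detectors = np.array([[x, y] for x in range(5) for y in range(5)], float)

# Half-normal detection function: rate decays with distance to each detector.
d = np.linalg.norm(detectors - centre, axis=1)
rates = lambda0 * np.exp(-d**2 / (2 * sigma**2))

# For a homogeneous Poisson process, the number of detections over [0, T]
# is Poisson(rate * T), and the detection times are uniform on [0, T].
counts = rng.poisson(rates * T)
times = [np.sort(rng.uniform(0.0, T, c)) for c in counts]

assert counts.shape == (25,)
assert all(len(t) == c for t, c in zip(times, counts))
```

Aggregating `times` into occasions and keeping only binary indicators is exactly the information loss the continuous-time model avoids.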