https://spdx.org/licenses/CC0-1.0.html
Many capture-recapture surveys of wildlife populations operate in continuous time but detections are typically aggregated into occasions for analysis, even when exact detection times are available. This discards information and introduces subjectivity, in the form of decisions about occasion definition. We develop a spatio-temporal Poisson process model for spatially explicit capture-recapture (SECR) surveys that operate continuously and record exact detection times. We show that, except in some special cases (including the case in which detection probability does not change within occasion), temporally aggregated data do not provide sufficient statistics for density and related parameters, and that when detection probability is constant over time our continuous-time (CT) model is equivalent to an existing model based on detection frequencies. We use the model to estimate jaguar density from a camera-trap survey and conduct a simulation study to investigate the properties of a CT estimator and discrete-occasion estimators with various levels of temporal aggregation. This includes investigation of the effect on the estimators of spatio-temporal correlation induced by animal movement. The CT estimator is found to be unbiased and more precise than discrete-occasion estimators based on binary capture data (rather than detection frequencies) when there is no spatio-temporal correlation. It is also found to be only slightly biased when there is correlation induced by animal movement, and to be more robust to inadequate detector spacing, while discrete-occasion estimators with binary data can be sensitive to occasion length, particularly in the presence of inadequate detector spacing. Our model includes as a special case a discrete-occasion estimator based on detection frequencies, and at the same time lays a foundation for the development of more sophisticated CT models and estimators. It allows modelling within-occasion changes in detectability, readily accommodates variation in detector effort, removes subjectivity associated with user-defined occasions, and fully utilises CT data. We identify a need for developing CT methods that incorporate spatio-temporal dependence in detections and see potential for CT models being combined with telemetry-based animal movement models to provide a richer inference framework.
Monthly Mean Dissolved-Solids Concentration and Monthly Mean Dissolved-Solids Load Data

Flow-weighted monthly mean dissolved-solids concentration (mg/L) data and monthly mean dissolved-solids load data from 1928-2016 were computed by the USGS using raw data from the Bureau of Reclamation. These data were computed by the USGS for all seven sites listed below:

Colorado River above Imperial Dam, AZ-CA, 09429490 (1976-2016)
Colorado River at Lees Ferry, AZ, 09380000 (1947-2016)
Colorado River at Northern International Boundary, above Morelos Dam, AZ, 09522000 (1961-2016)
Colorado River below Hoover Dam, AZ-NV, 09421500 (1948-2016)
Colorado River near Cisco, UT, 09180500 (1928-2016)
Green River at Green River, UT, 09315000 (1928-2016)
San Juan River near Bluff, UT, 09379500 (1929-2016)

Monthly mean dissolved-solids concentrations and loads were not calculated for several time periods (listed below) because of insufficient discrete dissolved-solids concentration data:

Colorado River below Hoover Dam, AZ-NV, 09421500 (October 1952 - September 1953)
Colorado River near Cisco, UT, 09180500 (October 1936 - September 1938 and October 1939 - September 1940)
Green River at Green River, UT, 09315000 (October 1936 - September 1938, October 1939 - September 1940, and October 1942 - September 1943)

Discrete Dissolved-Solids Concentration Data

Discrete dissolved-solids concentration (mg/L) data and specific conductance (microsiemens/cm) data from 1990-2016 were computed using raw data from the Bureau of Reclamation. These data were computed for the four sites listed below:

Colorado River above Imperial Dam, AZ-CA, 09429490 (1990-2016), dissolved-solids
Colorado River above Imperial Dam, AZ-CA, 09429490 (1993-2016), specific conductance
Colorado River below Hoover Dam, AZ-NV, 09421500 (1993-2016), dissolved-solids
Colorado River at Northern International Boundary, above Morelos Dam, AZ, 09522000 (2001-2016), dissolved-solids
Blood glucose discrete dataset, already interpolated by the spline method to measure the MAGE value. This dataset aims to provide an alternative to CGM (Continuous Glucose Monitoring) for predicting diabetes using discrete data. The discrete data were obtained from 27 blood-glucose fluctuation readings taken with a glucometer over 3 days. After interpolation, there are 150+ points that can represent a profile similar to a CGM model.
There are 42 patients. Column A (CLASS) divides the conditions into 3 groups (1 for pre-diabetic patients, 2 for diabetic patients, 3 for normal patients).
Thanks to the 42 volunteers who were willing to spend time and energy on this study. Related article: http://beei.org/index.php/EEI/article/view/2387
We hope this data can enable further studies on predicting diabetes for individual users, so that we can monitor our lifestyle.
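A minimal sketch of the interpolation step described above, assuming evenly spaced readings and placeholder values (scipy's CubicSpline stands in for whatever spline the authors used); the MAGE step follows the common one-standard-deviation rule:

```python
# Hypothetical sketch: spline-interpolate 27 sparse glucometer readings over
# 3 days to a CGM-like 150-point curve, then estimate MAGE. All values are
# placeholders, not taken from the dataset.
import numpy as np
from scipy.interpolate import CubicSpline

hours = np.linspace(0, 72, 27)             # 27 readings over 3 days
glucose = 100 + 30 * np.sin(hours / 5.0)   # placeholder glucose values (mg/dL)

spline = CubicSpline(hours, glucose)
dense_hours = np.linspace(0, 72, 150)      # CGM-like resolution (150+ points)
dense_glucose = spline(dense_hours)

# MAGE: mean amplitude of glycemic excursions that exceed one standard
# deviation of the glucose series (a common formulation).
sd = dense_glucose.std()
diffs = np.diff(dense_glucose)
turning = np.where(np.sign(diffs[:-1]) != np.sign(diffs[1:]))[0] + 1
extrema = dense_glucose[turning]           # local peaks and nadirs
excursions = np.abs(np.diff(extrema))
mage = excursions[excursions > sd].mean()
print(f"MAGE ≈ {mage:.1f} mg/dL")
```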
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
A combination of discrete and daily-aligned groundwater levels for the Mississippi River Valley alluvial aquifer, clipped to the Mississippi Alluvial Plain as defined by Painter and Westerman (2018), with corresponding metadata, is based on processing of U.S. Geological Survey National Water Information System (NWIS) (U.S. Geological Survey, 2020) data. The processing was done after retrieval, using aggregation and filtering through the infoGW2visGWDB software (Asquith and Seanor, 2019). The nomenclature GWmaster mimics that of the output from infoGW2visGWDB. Two separate data retrievals from NWIS were made. First, the discrete data were retrieved, and second, continuous records from recorder sites with daily-mean or other daily statistics codes were retrieved. Each dataset was separately passed through the infoGW2visGWDB software to create a "GWmaster discrete" and a "GWmaster continuous" table, and these tables were combined and then sorted on the site identifier and date to form the data ...
This data release contains six different datasets that were used in the report SIR 2018-5108. These datasets contain discharge data, discrete dissolved-solids data, quality-control discrete dissolved-solids data, and computed mean dissolved-solids data that were collected at various locations between Hoover Dam and Imperial Dam.

Study Sites:
Site 1: Colorado River below Hoover Dam
Site 2: Bill Williams River near Parker
Site 3: Colorado River below Parker Dam
Site 4: CRIR Main Canal
Site 5: Palo Verde Canal
Site 6: Colorado River at Palo Verde Dam
Site 7: CRIR Lower Main Drain
Site 8: CRIR Upper Levee Drain
Site 9: PVID Outfall Drain
Site 10: Colorado River above Imperial Dam

Discrete Dissolved-Solids Dataset and Replicate Samples for Discrete Dissolved-Solids Dataset: The Bureau of Reclamation collected discrete water-quality samples for the parameter of dissolved solids (sum of constituents). Dissolved solids, measured in milligrams per liter, are the sum of the following constituents: bicarbonate, calcium, carbonate, chloride, fluoride, magnesium, nitrate, potassium, silicon dioxide, sodium, and sulfate. These samples were collected on a monthly to bimonthly basis at various time periods between 1990 and 2016 at Sites 1-5 and Sites 7-10. No data were collected for Site 6: Colorado River at Palo Verde Dam. The Bureau of Reclamation and the USGS collected discrete quality-control replicate samples for the parameter of dissolved solids (sum of constituents, measured in milligrams per liter). The USGS collected discrete quality-control replicate samples in 2002 and 2003, and the Bureau of Reclamation collected discrete quality-control replicate samples in 2016 and 2017. Listed below are the sites where these samples were collected and the agency that collected them:

Site 3: Colorado River below Parker Dam: USGS and Reclamation
Site 4: CRIR Main Canal: Reclamation
Site 5: Palo Verde Canal: Reclamation
Site 7: CRIR Lower Main Drain: Reclamation
Site 8: CRIR Upper Levee Drain: Reclamation
Site 9: PVID Outfall Drain: Reclamation
Site 10: Colorado River above Imperial Dam: USGS and Reclamation

Monthly Mean Datasets and Mean Monthly Datasets: Monthly mean discharge (cfs) data, flow-weighted monthly mean dissolved-solids concentration (mg/L) data, and monthly mean dissolved-solids load data from 1990 to 2016 were computed using raw data from the USGS and the Bureau of Reclamation. These data were computed for all 10 sites, except that flow-weighted monthly mean dissolved-solids concentration and monthly mean dissolved-solids load were not computed for Site 2: Bill Williams River near Parker. The monthly mean datasets that were calculated for each month for the period between 1990 and 2016 were used to compute the mean monthly discharge and the mean monthly dissolved-solids load for each of the 12 months within a year. Each monthly mean was weighted by the number of days in the month and then averaged for each of the twelve months. This was computed for all 10 sites, except that mean monthly dissolved-solids load was not computed at Site 2: Bill Williams River near Parker. Site 8a: Colorado River between Parker and Palo Verde Valleys was computed by summing the data from Sites 6, 7, and 8.

Bill Williams Daily Mean Discharge, Instantaneous Dissolved-Solids Concentration, and Daily Mean Dissolved-Solids Load Dataset: Daily mean discharge (cfs), instantaneous dissolved-solids concentration (mg/L), and daily mean dissolved-solids load were calculated using raw data collected by the USGS and the Bureau of Reclamation. These data were calculated for Site 2: Bill Williams River near Parker for the period of January 1990 to February 2016.

Palo Verde Irrigation District Outfall Drain Mean Daily Discharge Dataset: The Bureau of Reclamation collected mean daily discharge data for the period of 01/01/2005 to 09/30/2016 at the Palo Verde Irrigation District (PVID) outfall drain using a stage-discharge relationship.
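A minimal sketch (not the USGS computation itself) of how a flow-weighted monthly mean concentration and a monthly load can be derived from daily values; the column names and constant placeholder values are illustrative assumptions:

```python
# Hedged sketch: flow-weighted monthly mean dissolved-solids concentration
# and monthly load from daily discharge and concentration values.
import pandas as pd

daily = pd.DataFrame({
    "date": pd.date_range("1990-01-01", periods=90, freq="D"),
    "discharge_cfs": 8000.0,      # placeholder daily mean discharge
    "conc_mg_L": 650.0,           # placeholder dissolved-solids concentration
}).set_index("date")

# Daily load in tons/day: Q (cfs) x C (mg/L) x 0.0027 (conventional conversion factor)
daily["load_tons"] = daily["discharge_cfs"] * daily["conc_mg_L"] * 0.0027

monthly = daily.resample("MS").agg({"discharge_cfs": "mean", "load_tons": "sum"})

# Flow-weighted monthly mean concentration: discharge-weighted average of C
monthly["fw_conc_mg_L"] = (
    (daily["discharge_cfs"] * daily["conc_mg_L"]).resample("MS").sum()
    / daily["discharge_cfs"].resample("MS").sum()
)
print(monthly)
```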
The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters, including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous measurements or binary or categorical measurements recorded in one-second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. Here, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large databases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain, novel algorithms, and also briefly discuss results on synthetic and real-world data sets. Our algorithm uncovers operationally significant events in high-dimensional data streams in the aviation industry which are not detectable using state-of-the-art methods.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This data release contains discrete chlorophyll data, specifically corrected chlorophyll a, uncorrected chlorophyll a, and pheophytin pigments, from inland waters in the Illinois River Basin for 1981–2023. These data are discrete samples (collected in the field and analyzed in the laboratory) of plankton (suspended algae) and periphyton (benthic algae) from lakes, streams, rivers, canals, and other aquatic sites. These data support the investigation of harmful algal blooms (HABs) in the Illinois River Basin. The data are multi-source, meaning multiple monitoring organizations collected and analyzed these samples. Data were sourced from the Water Quality Portal (WQP; which contains water quality data from many organizations), Illinois Natural History Survey (INHS), the Fox River Study Group (FRSG; which also contains data from multiple organizations), and previously unpublished data from the US Geological Survey’s National Water Quality Laboratory. Final chlorophyll data are provid ...
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains all positive natural numbers from 1 to 50 million and includes data on whether or not the number is:
License Info
Most of my datasets, models, and research, including this one, are published with the CC BY-NC 4.0 license, which means you can use this however you want for non-commercial purposes as long as you provide attribution. If you require a commercial license, connect with me on my site.
The authors observed the X-ray-bright E3 galaxy NGC 1600 and nearby members of the NGC 1600 group with the Chandra X-Ray Observatory ACIS-S3 to study their X-ray properties. NGC 1600 is the brightest member of the NGC 1600 group; NGC 1601 (1.6 arcminutes away) and NGC 1603 (2.5 arcminutes away) are the two nearest galaxies, both of which are non-interacting members. The authors adopted the 2MASS Point Source Catalog position of J2000.0 RA = 04h 31m 39.87s, Dec = -05o 05' 10.5" as the location of the center of the NGC 1600 galaxy. Unresolved emission dominates the Chandra observation; however, some of the emission is resolved into 71 sources, most of which are low-mass X-ray binaries associated with NGC 1600. Twenty-one of the sources have L_X > 2 x 10^39 erg/s (0.3-10.0 keV; assuming they are at the NGC 1600 distance of 59.98 Mpc), marking them as ultraluminous X-ray point source (ULX) candidates. NGC 1600 may have the largest number of ULX candidates of any early-type galaxy to date; however, cosmic variance in the number of background active galactic nuclei cannot be ruled out. The spectrum and luminosity function (LF) of the resolved sources are more consistent with sources found in other early-type galaxies than with sources found in star-forming regions of galaxies. The source LF and the spectrum of the unresolved emission both indicate that there are a large number of unresolved point sources. The authors propose that these sources are associated with globular clusters (GCs) and that NGC 1600 has a large GC specific frequency. Observations of the GC population in NGC 1600 would be very useful for testing this prediction. NGC 1600 was observed in two intervals on 2002 September 18-19 (ObsID 4283) and 2002 September 20 (ObsID 4371) with live exposures of 26,783 and 26,752 s, respectively. The first observation showed clear evidence of a major background "flare" in the first 20% of the observation. The second observation had some small fluctuations greater than 20% from the mean rate. After these were filtered, observations 4283 and 4371 had flare-free exposure times of 21,562 and 23,616 s, respectively. This table lists all 71 discrete sources detected by wavdetect over the 0.3-6 keV energy range in the combination of the two observations. The first 3 sources (source numbers 1, 2, and 3) are clearly extended according to the authors. The authors expect 11 +/- 2 foreground/background sources to be present based on the source counts in Brandt et al. (2000, AJ, 119, 2349) and Mushotzky et al. (2000, Nature, 404, 459). The authors determined the observed X-ray hardness ratios for the sources, using the same techniques that they have used previously. They define three hardness ratios as H21 = (M-S)/(M+S), H31 = (H-S)/(H+S), and H32 = (H-M)/(H+M), where S, M, and H are the total counts in the soft (0.3-1 keV), medium (1-2 keV), and hard (2-6 keV) bands, respectively. Relative to their previous definitions, they have reduced the hard band from 2-10 to 2-6 keV: since the 6-10 keV range is dominated by background photons for most sources, this should increase the S/N of the hardness-ratio techniques. The hardness ratios measure observed counts, which are affected by Galactic absorption and quantum efficiency (QE) degradation in the Chandra ACIS detectors. In order to compare with other galaxies, it is useful to correct the hardness ratios for these two soft X-ray absorption effects.
Therefore, the authors have calculated the intrinsic hardness ratios, denoted by a superscript 0, using a correction factor in each band appropriate to the best-fit spectrum of the resolved sources, and these are what are quoted in this table. This table was created by the HEASARC in May 2018 based on CDS Catalog J/ApJ/617/262/ file table1.dat, the list of detected discrete X-ray sources in the Chandra observation of the NGC 1600 group. This is a service provided by NASA HEASARC.
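The hardness-ratio definitions quoted above translate directly into code; a minimal sketch:

```python
# Hardness ratios as defined above, where s, m, h are total counts in the
# soft (0.3-1 keV), medium (1-2 keV), and hard (2-6 keV) bands.
def hardness_ratios(s: float, m: float, h: float) -> dict:
    return {
        "H21": (m - s) / (m + s),
        "H31": (h - s) / (h + s),
        "H32": (h - m) / (h + m),
    }

# Example: a hypothetical source with 40 soft, 25 medium, and 10 hard counts
print(hardness_ratios(40, 25, 10))
```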
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is about: (Table 4) Mean values of chemical data from the discrete ash layers of ODP Leg 120 holes. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.757631 for more information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Input kinematics and ankle forces are included as MATLAB file. The Extended Discrete Element Method has been used to compute the ankle joint contact pressure distribution. For details email ibenemerito1@sheffield.ac.uk
The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters, including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous measurements or binary or categorical measurements recorded in one-second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. In this paper, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large databases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain, novel algorithms, and also discuss results on real-world data sets. Our algorithm uncovers operationally significant events in high-dimensional data streams in the aviation industry which are not detectable using state-of-the-art methods.
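As a hedged, much-simplified illustration of the multiple-kernel idea (not the authors' algorithm): combine a kernel on continuous parameters with a kernel on discrete states, then train a one-class SVM on the combined Gram matrix. All data here are synthetic, and the fixed kernel weight stands in for what a real MKL method would learn:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
n = 200
continuous = rng.normal(size=(n, 5))          # e.g. airspeed, altitude, ...
discrete = rng.integers(0, 2, size=(n, 10))   # e.g. binary switch states

K_cont = rbf_kernel(continuous, gamma=0.5)
# Simple matching kernel: fraction of discrete features that agree
K_disc = 1.0 - np.abs(discrete[:, None, :] - discrete[None, :, :]).mean(axis=2)

w = 0.7                                       # kernel weight (learned in true MKL)
K = w * K_cont + (1 - w) * K_disc

model = OneClassSVM(kernel="precomputed", nu=0.05).fit(K)
scores = model.decision_function(K)           # low scores flag potential anomalies
print("most anomalous flights:", np.argsort(scores)[:5])
```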
https://spdx.org/licenses/CC0-1.0.html
Contains data and code for the manuscript 'Mean landscape-scale incidence of species in discrete habitats is patch size dependent'. Raw data consist of 202 published datasets collated from primary and secondary (e.g., government technical reports) sources. These sources summarise metacommunity structure for different taxonomic groups (birds, invertebrates, non-avian vertebrates or plants) in different types of discrete metacommunities, including 'true' islands (i.e., inland, continental or oceanic archipelagos), habitat islands (e.g., ponds, wetlands, sky islands) and fragments (e.g., forest/woodland or grass/shrubland habitat remnants). The aim of the study was to test whether the size of a habitat patch influences the mean incidences of species within it, relative to the incidence of all species across the landscape; in other words, whether high-incidence (widespread) or low-incidence (narrow-range) species are found more often than expected in smaller or larger patches. To achieve this, a new standardized effect size metric was developed that quantifies the mean observed incidence of all species present in every patch (the geometric mean of the number of patches in which all species were observed) and compares this with an expectation based on re-sampling the incidences of all species in all patches. Meta-regression of the 202 datasets was used to test the relationship between this metric, the 'mean species landscape-scale incidences per patch' (MSLIP), and the size of habitat patches, and for differences in response among metacommunity types and taxonomic groups.

Methods: Details regarding keyword and other search strategies used to collate the raw database from published sources were presented in Deane, D.C. & He, F. (2018) Loss of only the smallest patches will reduce species diversity in most discrete habitat networks. Glob Chang Biol, 24, 5802-5814, and in Deane, D.C. (2022) Species accumulation in small-large vs large-small order: more species but not all species? Oecologia, 200, 273-284. Minimum data requirements were presence-absence records for all species in all patches and the area of each habitat patch. The database consists of 202 published datasets. The first column in each dataset is the area of the patch in question (in hectares); the other columns record the presence or absence of each species in each patch. In the study, a metric was calculated for every patch that quantifies how the incidence of species in each patch compares with an expectation derived from the occupancy of all species in all patches (called mean species landscape-scale incidences per patch, or MSLIP). This value was regressed on patch size and other covariates to determine whether the representation of widespread (or narrowly distributed) species changes with patch size. In summary, the workflow proceeded in three steps. 1. Pre-processing: calculating a standardized effect size (SES) for the MSLIP metric for every patch and extracting important covariates (taxon, patch type, total number of patches, total number of species, patch-level deviations from fitted island species-area relationships, data quality) to be used in model building. 2. Model building: MSLIP SES was modelled against patch area and other covariates using a multilevel Bayesian (meta-)regression model using Stan and brms in the statistical programming language R (version 4.3.0). 3. Model analysis: the final model was analysed by running different scenarios, interpreting the patterns in light of the hypotheses under test, and creating figures to illustrate these.
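A minimal sketch of the MSLIP standardized effect size as described above, under my reading of the metric (geometric mean of the landscape-scale incidences of a patch's species, compared against a resampling-based null); the presence/absence matrix here is synthetic, not from the database:

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic presence/absence matrix: rows = patches, columns = species
pa = (rng.random((30, 40)) < 0.3).astype(int)
incidence = pa.sum(axis=0)          # patches occupied by each species
pool = incidence[incidence > 0]     # landscape-scale incidences of observed species

def geo_mean(x):
    return float(np.exp(np.log(x).mean()))

def mslip_ses(patch_row, n_rand=999):
    occ = incidence[patch_row == 1]  # incidences of species present in this patch
    obs = geo_mean(occ)
    null = np.array([
        geo_mean(rng.choice(pool, size=occ.size, replace=False))
        for _ in range(n_rand)
    ])
    return (obs - null.mean()) / null.std()  # standardized effect size

print(mslip_ses(pa[0]))
```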
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
**Rows:** 50,570 **Columns:** 13
This dataset contains raw time-series telemetry from a small hydroponics setup. It includes water chemistry and environment sensors (pH, TDS, temperature, humidity, water level) alongside actuator states (pH reducer pump, water add pump, nutrient adder, humidifier, exhaust fan). The goal is to support research on forecasting, control, anomaly detection, and resource-efficient sensing in indoor agriculture.
Data were collected from an Arduino/ESP-class IoT node connected to pH, TDS, water-level, and DHT sensors, with relay-controlled actuators for maintaining optimal growth conditions. Timestamps are device-recorded at variable intervals during daily operation.
IoTData --Raw--.csv — one row per timestamped reading/action.

- id (int) — Row identifier.
- timestamp (ISO 8601 string) — Local device timestamp of the record.
- pH (float) — Acidity/alkalinity of the nutrient solution (typical 5.5–6.8 for leafy greens).
- TDS (float, ppm) — Total Dissolved Solids of the solution (proxy for nutrient concentration).
- water_level (int, 0–3) — Discrete level indicator (0 = empty/low … 3 = high).
- DHT_temp (°C) — Ambient/air temperature measured near the reservoir.
- DHT_humidity (% RH) — Ambient relative humidity.
- water_temp (°C) — Water temperature of the reservoir.
- pH_reducer (ON/OFF) — Acid dosing pump state.
- add_water (ON/OFF) — Top-up pump state.
- nutrients_adder (ON/OFF) — Nutrient dosing pump state.
- humidifier (ON/OFF) — Ultrasonic/cool-mist actuator state.
- ex_fan (ON/OFF) — Exhaust fan state.

Heads-up: the raw export contains some anomalies; see "Known Issues."
- pH: mean ≈ 6.00, min = 0.27, max = 11.57
- TDS (ppm): mean ≈ 1154.1, min = −283.91, max = 2278.35
- DHT_temp (°C): mean ≈ 24.32, range 12.3–70.0
- DHT_humidity (%): mean ≈ 71.71, range 25.0–3312.6

Actuator sparsity (ON counts):
- pH_reducer: 652
- add_water: 2,260
- nutrients_adder: 3,057
- humidifier: 2,458
- ex_fan: 52
Known issues: the raw export contains physically implausible values (e.g., negative TDS, extremely high DHT_humidity). Please clip/outlier-filter as appropriate. Actuator states are heavily imbalanced toward OFF, which matters for classification or event prediction tasks. Units: TDS in ppm; temperatures in °C; humidity in %RH; water level is discrete (0–3).
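A minimal cleaning sketch along those lines; the physical bounds below are my assumptions, not part of the dataset:

```python
import pandas as pd

df = pd.read_csv("IoTData --Raw--.csv", parse_dates=["timestamp"])

# Assumed plausible ranges; out-of-range readings are masked to NaN
bounds = {
    "pH": (0, 14),             # pH scale limits
    "TDS": (0, 3000),          # negative TDS is impossible
    "DHT_humidity": (0, 100),  # relative humidity is a percentage
    "DHT_temp": (0, 60),       # plausible indoor air temperature range
}
for col, (lo, hi) in bounds.items():
    df[col] = df[col].where(df[col].between(lo, hi))

# Actuator columns are mostly OFF; encode as 0/1 for modelling
for col in ["pH_reducer", "add_water", "nutrients_adder", "humidifier", "ex_fan"]:
    df[col] = (df[col] == "ON").astype(int)

print(df.describe(include="all"))
```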
A monthly rollup of the discrete and daily-aligned groundwater levels was created from the Robinson, Asquith, and Seanor (2020) data products, with removal of the paired groundwater and surface-water sites listed by Robinson, Killian, and Asquith (2020). The monthly rollup is composed of (1) computed monthly "mean" values, regardless of whether a well had one measurement in the month or up to about 30 days of daily-mean values; (2) the standard deviation of the water levels within the month (the sample size is generally just one day, but for recorder sites it could be up to about 30 days); (3) the last water level in the month; and (4) monthly counts of water levels. The algorithm is available within the sources of visGWDBmrva (Asquith and others, 2019). Note that the string 1980-01-01_2019-12-31 is retained in the file naming to parallel that of the Robinson, Asquith, and Seanor (2020) files, although the day of the month has no meaning for a monthly rollup. There are 18,736 unique wells with statistics; 18,736 wells in the metadata; and 107,568 year-month entries in the monthly rollup product. References: Asquith, W.H., Seanor, R.C., McGuire, V.L. (contributor), and Kress, W.H. (contributor), 2019, Source code in R to quality assure, plot, summarize, interpolate, and extend groundwater-level information, visGWDB—Groundwater-level informatics with demonstration for the Mississippi River Valley alluvial aquifer: U.S. Geological Survey software release, Reston, Va., https://doi.org/10.5066/P9W004O6.
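A minimal sketch of the four monthly-rollup statistics described above; the table layout and column names are illustrative, not the USGS schema:

```python
import pandas as pd

levels = pd.DataFrame({
    "site": ["A"] * 5,
    "date": pd.to_datetime(["2019-01-03", "2019-01-10", "2019-01-28",
                            "2019-02-14", "2019-02-20"]),
    "water_level": [10.2, 10.4, 10.1, 9.8, 9.9],
}).sort_values("date")

levels["month"] = levels["date"].dt.to_period("M")
# mean, within-month standard deviation, last value, and count per site-month
rollup = levels.groupby(["site", "month"])["water_level"].agg(
    ["mean", "std", "last", "count"]
)
print(rollup)
```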
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
[Version 1.2] This version of the dataset fixes a bug found in the previous versions (see below for more information).
The dataset has been generated from Monte Carlo simulations of lightning flashovers on medium voltage (MV) distribution lines. It is suitable for training machine learning models for classifying lightning flashovers on distribution lines, as well as for line insulation coordination studies. The dataset is hierarchical in nature (see below for more information) and class imbalanced.
The following five types of lightning interaction with the MV distribution line have been simulated: (1) direct strike to a phase conductor (when there is no shield wire present on the line), (2) direct strike to a phase conductor with shield wire(s) present on the line (i.e. shielding failure), (3) direct strike to a shield wire with a backflashover event, (4) indirect near-by lightning strike to ground where a shield wire is not present, and (5) indirect near-by lightning strike to ground where a shield wire is present on the line. The last two types of lightning interaction induce overvoltages on the phase conductors through EM fields radiated from the strike channel that couple to the line conductors. Shield wire(s) provide shielding effects against direct, as well as screening effects against indirect, lightning strikes.
The dataset consists of the following variables:
'dist': perpendicular distance of the lightning strike location from the distribution line axis (m), generated from the Uniform distribution [0, 500] m,
'ampl': lightning current amplitude of the strike (kA), generated from the Log-Normal distribution (see IEC 60071 for additional information),
'veloc': velocity of the lightning return stroke current (m/us), generated from the Uniform distribution [50, 500] m/us,
'shield': binary indicator that signals presence or absence of the shield wire(s) on the line (0/1), generated from the Bernoulli distribution with a 50% probability,
'Ri': average value of the impulse impedance of the tower's grounding (Ohm), generated from the Normal distribution (clipped at zero on the left side) with a median value of 50 Ohm and a standard deviation of 12.5 Ohm; it should be mentioned that the impulse impedance is often much larger than the associated grounding resistance value, which is why a rather high value of 50 Ohm has been used here,
'EGM': electrogeometric model used for analyzing striking distances of the distribution line's tower; the following options are available: 'Wagner', 'Young', 'AW', 'BW', 'Love', and 'Anderson', where 'AW' stands for the Armstrong & Whitehead model and 'BW' for the Brown & Whitehead model; the statistical distribution of EGM models follows a user-defined discrete categorical distribution with respective probabilities p = [0.1, 0.2, 0.1, 0.1, 0.3, 0.2],
'CFO': critical flashover voltage level of the distribution line's insulation (kV); the following three levels have been used: 150, 150, and 160 kV, respectively, for the three distribution lines of height 10, 12, and 14 m,
'height': height of the phase conductors of the distribution line (m); each distribution line has a flat configuration of phase conductors with heights of 10, 12, and 14 m; twin shield wires, if present, are 1.5 m above the phase conductors and 3 m apart; the dataset consists of 10,000 simulations for each line height,
'flash': binary indicator that signals if the flashover has been recorded (1) or not (0). This variable is the outcome (binary class).
Note: It should be mentioned that the critical flashover voltage (CFO) level of the line is taken at 150 kV for the first two lines (10 m and 12 m) and 160 kV for the third line (14 m), and that the diameters of the phase conductors and shield wires for all treated lines are, respectively, 10 mm and 5 mm. Also, the average grounding resistance of the shield wire is assumed at 10 Ohm for all treated cases (it has no discernible influence on the flashover rate). The dataset is class imbalanced and consists of 30,000 simulations in total, with 10,000 simulations for each of the three different MV distribution line heights (geometries) and CFO levels.
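As a hedged illustration of the feature-sampling scheme described above (the log-normal parameters for 'ampl' are illustrative assumptions, since the text defers to IEC 60071 for them):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

dist = rng.uniform(0, 500, n)                     # strike distance (m)
ampl = rng.lognormal(np.log(31.1), 0.48, n)       # current (kA); assumed parameters
veloc = rng.uniform(50, 500, n)                   # return-stroke velocity (m/us)
shield = rng.integers(0, 2, n)                    # shield wire present? Bernoulli(0.5)
Ri = np.clip(rng.normal(50, 12.5, n), 0, None)    # impulse grounding impedance (Ohm)

egm_models = ["Wagner", "Young", "AW", "BW", "Love", "Anderson"]
egm = rng.choice(egm_models, size=n, p=[0.1, 0.2, 0.1, 0.1, 0.3, 0.2])

print(dist[:3], ampl[:3], egm[:3])
```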
Important: Version 1.2 of the dataset fixes an important bug found in the previous data sets, where the column 'Ri' contained duplicate data from the column 'veloc'. This issue is now resolved.
Mathematical background used for the analysis of lightning interaction with the MV distribution line can be found in the references below.
References:
J. A. Martinez and F. Gonzalez-Molina, "Statistical evaluation of lightning overvoltages on overhead distribution lines using neural networks," in IEEE Transactions on Power Delivery, vol. 20, no. 3, pp. 2219-2226, July 2005, doi: 10.1109/TPWRD.2005.848734.
A. R. Hileman, "Insulation Coordination for Power Systems", CRC Press, Boca Raton, FL, 1999.
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
Discover the limitless power of paragraphs with the DROP dataset! This adversary-crafted, 96k-question benchmark is a text-based exploration into complex discrete reasoning tasks. With its wide range of natural language understanding tasks, DROP allows you to take your NLP models and techniques to the next level by building powerful systems that can tackle more intricate challenges. Unprecedented in its complexity, DROP is an invaluable tool for redefining what's possible with natural language processing and creating a brighter future for our connected world. Unlock the potential within your paragraphs with DROP today!
How to Use the DROP Dataset

The DROP dataset is an excellent resource for natural language understanding tasks, allowing users to explore the possibilities of discrete reasoning using text. This guide provides an overview of how to get started and take full advantage of this powerful dataset.
Step 1: Explore the Dataset Structure
The DROP dataset contains two CSV files: train.csv and validation.csv. The train file contains 96k questions and answers related to natural language understanding tasks, while the validation file consists of questions and answers designed to evaluate a model's performance on the task at hand. Each row contains four columns: passage, passage_sourceidx, answers_texts, and answers_spans.
The 'passage' column holds the text that corresponds to a given question; 'passage_sourceidx' indicates the source document from which each passage was extracted; 'answers_texts' provides strings containing part or all of a given answer; and 'answers_spans' gives the part (or parts) of each passage relevant for answering the question correctly, as integer indices of starting positions within the passage, ordered from the first position (0) through the last (length - 1).
Step 2: Pre-Processing Your Data with pandas/Python Libraries
After exploring both CSV files to determine your context, it is time to pre-process your data. You can use an existing Python library such as pandas to cleanse noisy data by deleting empty cells, rows, or values, depending on what your problem requires, before training your model on it. You can also decide whether it makes more sense to split the passages into smaller chunks with fewer words so that they are easier to work with directly in code; a minimal loading sketch is shown below.
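A minimal loading sketch with pandas, assuming train.csv and validation.csv sit in the working directory and use the column names described in Step 1:

```python
import pandas as pd

train = pd.read_csv("train.csv")
valid = pd.read_csv("validation.csv")

# Drop rows with empty passages or answers before training
train = train.dropna(subset=["passage", "answers_texts"])

print(train[["passage", "passage_sourceidx", "answers_texts", "answers_spans"]].head())
```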
Step 3: Utilize Natural Language Processing Toolsets For Efficiency
Once you have taken the necessary preprocessing steps described above, you can further improve efficiency by utilizing existing NLP toolsets such as spaCy, which fit a wide range of needs when dealing with vast amounts of real-world data. Their capabilities range from quick implementations of complex tasks, such as extracting relevant entities using pre-trained tokenization models and identifying part-of-speech tags, to customizable pipelines and word embeddings that capture underlying semantic meaning.
- Developing natural language processing algorithms with the ability to detect complex patterns in text and context.
- Applying advanced logical operators to understand the relationship between individual concepts in a text passage.
- Creating models and systems capable of understanding multiple tasks simultaneously using a single dataset.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No Copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.
File: validation.csv

| Column name | Description |
|:------------|:------------|
| passage ... |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This synthetic dataset was generated from Monte Carlo simulations of lightning flashovers on medium voltage (MV) distribution lines. It is suitable for training machine learning models for classifying lightning flashovers on distribution lines. The dataset is hierarchical in nature (see below for more information) and class imbalanced.
The following five types of lightning interaction with the MV distribution line have been simulated: (1) direct strike to a phase conductor (when there is no shield wire present on the line), (2) direct strike to a phase conductor with shield wire(s) present on the line (i.e. shielding failure), (3) direct strike to a shield wire with a backflashover event, (4) indirect near-by lightning strike to ground where a shield wire is not present, and (5) indirect near-by lightning strike to ground where a shield wire is present on the line. The last two types of lightning interaction induce overvoltages on the phase conductors through EM fields radiated from the strike channel that couple to the line conductors. Three different methods of indirect strike analysis have been implemented: Rusck's model, the Chowdhuri-Gross model, and the Liew-Mar model. Shield wire(s) provide shielding effects against direct, as well as screening effects against indirect, lightning strikes.
The dataset consists of two independent distribution lines, with heights of 12 m and 15 m, each with a flat configuration of phase conductors. Twin shield wires, if present, are 1.5 m above the phase conductors and 3 m apart [2]. The CFO level of the 12 m distribution line is 150 kV and that of the 15 m distribution line is 160 kV. The dataset consists of 10,000 simulations for each of the distribution lines.
The dataset contains the following variables (features):
'dist': perpendicular distance of the lightning strike location from the distribution line axis (m), generated from the Uniform distribution [0, 500] m,
'ampl': lightning current amplitude of the strike (kA), generated from the Log-Normal distribution (see IEC 60071 for additional information),
'front': lightning current wave-front time (us), generated from the Log-Normal distribution; it needs to be emphasized that amplitudes (ampl) and wave-front times (front), as random variables, have been generated from the appropriate bivariate probability distribution which includes statistical correlation between these variates,
'veloc': velocity of the lightning return-stroke current, defined indirectly through the parameter "w", which is generated from the Uniform distribution [50, 500] m/us and then used to compute the velocity from the relation v = c/sqrt(1 + w/I), where "c" is the speed of light in free space (300 m/us) and "I" is the lightning-current amplitude (see the sketch after this list),
'shield': binary indicator that signals presence or absence of the shield wire(s) on the line (0/1), generated from the Bernoulli distribution with a 50% probability,
'Ri': average value of the impulse impedance of the tower's grounding (Ohm), generated from the Normal distribution (clipped at zero on the left side) with a median value of 50 Ohm and a standard deviation of 12.5 Ohm; it should be mentioned that the impulse impedance is often much larger than the associated grounding resistance value, which is why a rather high value of 50 Ohm has been used here,
'EGM': electrogeometric model used for analyzing striking distances of the distribution line's tower; the following options are available: 'Wagner', 'Young', 'AW', 'BW', 'Love', and 'Anderson', where 'AW' stands for the Armstrong & Whitehead model and 'BW' for the Brown & Whitehead model; the statistical distribution of EGM models follows a user-defined discrete categorical distribution with respective probabilities p = [0.1, 0.2, 0.1, 0.1, 0.3, 0.2],
'ind': indirect stroke model used for analyzing near-by indirect lightning strikes; the following options were implemented: 'rusk' for Rusck's model, 'chow' for the Chowdhuri-Gross model (with the Jakubowski modification), and 'liew' for the Liew-Mar model; the statistical distribution of these three models follows a user-defined discrete categorical distribution with respective probabilities p = [0.6, 0.2, 0.2],
'CFO': critical flashover voltage level of the distribution line's insulation (kV),
'height': height of the phase conductors of the distribution line (m),
'flash': binary indicator that signals if the flashover has been recorded (1) or not (0). This variable is the outcome/label (i.e. binary class).
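A minimal sketch of the 'veloc' computation described in the list above; the log-normal parameters for the amplitudes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
C = 300.0                                   # speed of light in free space (m/us)

I = rng.lognormal(np.log(31.1), 0.48, 10_000)   # amplitudes (kA); assumed parameters
w = rng.uniform(50, 500, 10_000)                # parameter "w" from Uniform[50, 500]
v = C / np.sqrt(1.0 + w / I)                    # return-stroke velocity (m/us)

print(v.min(), v.mean(), v.max())
```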
Mathematical background used for the analysis of lightning interaction with the MV distribution line can be found in the references cited below.
References:
A. R. Hileman, "Insulation Coordination for Power Systems", CRC Press, Boca Raton, FL, 1999.
J. A. Martinez and F. Gonzalez-Molina, "Statistical evaluation of lightning overvoltages on overhead distribution lines using neural networks," in IEEE Transactions on Power Delivery, vol. 20, no. 3, pp. 2219-2226, July 2005.
A. Borghetti, C. A. Nucci and M. Paolone, "An Improved Procedure for the Assessment of Overhead Line Indirect Lightning Performance and Its Comparison with the IEEE Std. 1410 Method," IEEE Transactions on Power Delivery, vol. 22, no. 1, pp. 684-692, 2007.
The dataset documents the spatial and temporal variability of nutrients and related water quality parameters at high spatial resolution in the North Delta, Central Delta, and the Western Delta out to Suisun Bay in the Sacramento-San Joaquin River Delta of California, USA. The dataset includes nitrate, ammonium, phosphate, dissolved organic carbon, temperature, conductivity, dissolved oxygen, and chlorophyll, as well as information about phytoplankton community composition. Data-collection cruises were conducted under three different environmental/flow conditions in May, July, and October of 2018. The data release consists of an XML document, 13 text/csv documents, and a zip file. Descriptions for each document and file are listed below:

1. METADATA: (2018_High_resolution_mapping_surveys_v2.0.xml) – Metadata for child item "Assessing spatial variability of nutrients and related water quality constituents in the California Sacramento-San Joaquin Delta at the landscape scale: 2018 High resolution mapping surveys."
2. QA DATA: (Delta Water Quality Mapping 2018 Data Corrections and Offsets.csv) – Data corrections and offset calculations applied to the datasets.
3. DATA DICTIONARY: (Delta Water Quality Mapping 2018 Data Dictionary v2.0.csv) – Data dictionary for tables and attributes.
4. DISCRETE SAMPLING DATA: (Delta Water Quality Mapping 2018 Discrete Sampling v2.0.csv) – Discrete sampling, water chemistry data from the Sacramento-San Joaquin Delta study area.
5. PHYTOPLANKTON ENUMERATION DATA FROM DISCRETE SAMPLES: (Delta Water Quality Mapping 2018 Discrete Sampling Phytoenumeration v2.0.csv) – Discrete sampling, phytoenumeration data from the Sacramento-San Joaquin Delta study area.
6. May 2018 FIELD MAPPING DATA (High resolution, timestamped and 20-second median filtered): (Delta Water Quality Mapping May 2018 High Resolution.csv) – High resolution, in situ, unfiltered, in vivo fluorescence (IVF), water chemistry data from the Sacramento-San Joaquin Delta study area. All measurements were collected in the field.
7. May 2018 MAPPING DATA ON COMMON SPATIAL FRAMEWORK: (Delta Water Quality Mapping May 2018 Spatially Aligned.csv) – Spatially aligned (centerline extracted) and interpolated water chemistry data from the Sacramento-San Joaquin Delta study area.
8. July 2018 FIELD MAPPING DATA (High resolution, timestamped and 20-second median filtered): (Delta Water Quality Mapping July 2018 High Resolution.csv) – High resolution, in situ, unfiltered, in vivo fluorescence (IVF), water chemistry data from the Sacramento-San Joaquin Delta study area. All measurements were collected in the field.
9. July 2018 MAPPING DATA ON COMMON SPATIAL FRAMEWORK: (Delta Water Quality Mapping July 2018 Spatially Aligned.csv) – Spatially aligned (centerline extracted) and interpolated water chemistry data from the Sacramento-San Joaquin Delta study area.
10. October 2018 FIELD MAPPING DATA (High resolution, timestamped and 20-second median filtered): (Delta Water Quality Mapping October 2018 High Resolution.csv) – High resolution, in situ, unfiltered, in vivo fluorescence (IVF), water chemistry data from the Sacramento-San Joaquin Delta study area. All measurements were collected in the field.
11. October 2018 MAPPING DATA ON COMMON SPATIAL FRAMEWORK: (Delta Water Quality Mapping October 2018 Spatially Aligned.csv) – Spatially aligned (centerline extracted) and interpolated water chemistry data from the Sacramento-San Joaquin Delta study area.
12. CALCULATED DIFFERENCES BETWEEN MAY AND JULY 2018: (Delta Water Quality Mapping May to July 2018 Differences.csv) – Calculated differences between months for water chemistry data from the Sacramento-San Joaquin Delta study area.
13. CALCULATED DIFFERENCES BETWEEN MAY AND OCTOBER 2018: (Delta Water Quality Mapping May to October 2018 Differences.csv) – Calculated differences between months for water chemistry data from the Sacramento-San Joaquin Delta study area.
14. CALCULATED DIFFERENCES BETWEEN JULY AND OCTOBER 2018: (Delta Water Quality Mapping July to October 2018 Differences.csv) – Calculated differences between months for water chemistry data from the Sacramento-San Joaquin Delta study area.
15. SHAPEFILES FOR ALL RESULTS: (Maps_Shapefiles.zip) – Zip file containing 117 shapefiles for spatially aligned water quality constituents from the Sacramento-San Joaquin Delta study area.
16. METHODS: (Methods_ChildItem_v2.0.pdf) – Text document describing methods used for data and sample collection and analysis.
Customer segmentation is the subdivision of a market into discrete customer groups that share similar characteristics. Customer segmentation can be a powerful means to identify unsatisfied customer needs; using such data, companies can outperform the competition by developing uniquely appealing products and services. Suppose you own a supermarket mall and, through membership cards, have some basic data about your customers, such as customer ID, age, gender, annual income, and spending score. You want to understand who the target customers are, so that this insight can be passed to the marketing team and the strategy planned accordingly.
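As a hedged illustration (not part of the dataset itself), one common way to find target segments in such data is k-means clustering on income and spending score; the file and column names below are assumptions about the CSV layout:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("Mall_Customers.csv")   # assumed file name

# Standardize the two behavioural features before clustering
X = StandardScaler().fit_transform(
    df[["Annual Income (k$)", "Spending Score (1-100)"]]
)

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
df["segment"] = km.labels_

# Segment profiles, e.g. high-income/high-spend customers are prime targets
print(df.groupby("segment")[["Annual Income (k$)", "Spending Score (1-100)"]].mean())
```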