13 datasets found
  1. Data from: Error and anomaly detection for intra-participant time-series data

    • tandf.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    David R. Mullineaux; Gareth Irwin (2023). Error and anomaly detection for intra-participant time-series data [Dataset]. http://doi.org/10.6084/m9.figshare.5189002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    David R. Mullineaux; Gareth Irwin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identification of errors or anomalous values, collectively considered outliers, assists in exploring data, and removing outliers improves statistical analysis. In biomechanics, outlier detection methods have explored the ‘shape’ of the entire cycles, although exploring fewer points using a ‘moving-window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected through two stages for the strides (mean 38 cycles) from treadmill running. Cycles were removed in stage 1 for one-dimensional (spatial) outliers at each time point using the median absolute deviation, and in stage 2 for two-dimensional (spatial–temporal) outliers using a moving window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time series data.
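    As a rough Python sketch of the two-stage approach described above (this is not the authors' supplied MATLAB code, and fixed multipliers stand in for the t-statistic-based scaling), the stages could look like this:

```python
import numpy as np

def flag_outlier_cycles(cycles, mad_k=3.0, sd_k=3.0, window=1):
    """Flag outlier cycles in a (n_cycles, n_timepoints) array, e.g. time-normalised strides.

    Stage 1: one-dimensional (spatial) outliers per time point via the median
    absolute deviation.  Stage 2: spatial-temporal outliers via a moving-window
    standard deviation.  mad_k / sd_k are simple multipliers standing in for the
    t-statistic-based scaling used by the authors."""
    cycles = np.asarray(cycles, dtype=float)

    # Stage 1: per-time-point median and MAD across cycles
    med = np.median(cycles, axis=0)
    mad = np.median(np.abs(cycles - med), axis=0)
    safe_mad = np.where(mad > 0, mad, np.inf)          # never flag constant columns
    stage1 = np.any(np.abs(cycles - med) > mad_k * safe_mad, axis=1)

    # Stage 2: moving window of +/- `window` time points over the surviving cycles
    kept = cycles[~stage1]
    stage2 = np.zeros(kept.shape[0], dtype=bool)
    for t in range(kept.shape[1]):
        lo, hi = max(0, t - window), min(kept.shape[1], t + window + 1)
        win = kept[:, lo:hi]
        mu, sd = win.mean(), win.std()
        if sd > 0:
            stage2 |= np.any(np.abs(win - mu) > sd_k * sd, axis=1)
    return stage1, stage2   # stage1 indexes all cycles, stage2 indexes the kept cycles
```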

  2. Algorithms for Speeding up Distance-Based Outlier Detection

    • catalog.data.gov
    • data.nasa.gov
    • +1more
    Updated Apr 10, 2025
    + more versions
    Cite
    Dashlink (2025). Algorithms for Speeding up Distance-Based Outlier Detection [Dataset]. https://catalog.data.gov/dataset/algorithms-for-speeding-up-distance-based-outlier-detection
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    The problem of distance-based outlier detection is difficult to solve efficiently in very large datasets because of potential quadratic time complexity. We address this problem and develop sequential and distributed algorithms that are significantly more efficient than state-of-the-art methods while still guaranteeing the same outliers. By combining simple but effective indexing and disk block accessing techniques, we have developed a sequential algorithm iOrca that is up to an order-of-magnitude faster than the state-of-the-art. The indexing scheme is based on sorting the data points in order of increasing distance from a fixed reference point and then accessing those points based on this sorted order. To speed up the basic outlier detection technique, we develop two distributed algorithms (DOoR and iDOoR) for modern distributed multi-core clusters of machines, connected on a ring topology. The first algorithm passes data blocks from each machine around the ring, incrementally updating the nearest neighbors of the points passed. By maintaining a cutoff threshold, it is able to prune a large number of points in a distributed fashion. The second distributed algorithm extends this basic idea with the indexing scheme discussed earlier. In our experiments, both distributed algorithms exhibit significant improvements compared to the state-of-the-art distributed methods.
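    The reference-point indexing and cutoff pruning described above can be illustrated with a simplified, single-machine ORCA-style sketch in Python; the function name and parameters are hypothetical, and the distributed DOoR/iDOoR algorithms are not reproduced here.

```python
import numpy as np

def knn_distance_outliers(X, k=5, n_outliers=10, block=256):
    """Score each point by its k-th nearest-neighbour distance and keep the
    n_outliers highest-scoring points.  Points are processed in order of
    distance from a fixed reference point, and a running cutoff (the weakest
    score among the current top outliers) prunes most candidates early."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0)                                   # fixed reference point
    order = np.argsort(np.linalg.norm(X - ref, axis=1))
    X = X[order]

    top = []        # list of (score, original index), kept sorted ascending by score
    cutoff = 0.0
    for i, x in enumerate(X):
        knn = np.full(k, np.inf)                           # running k smallest distances
        pruned = False
        for start in range(0, len(X), block):
            d = np.linalg.norm(X[start:start + block] - x, axis=1)
            if start <= i < start + block:
                d = np.delete(d, i - start)                # drop self-distance
            knn = np.sort(np.concatenate([knn, d]))[:k]
            if len(top) == n_outliers and knn[-1] < cutoff:
                pruned = True                              # cannot enter the top list
                break
        if not pruned:
            top.append((knn[-1], order[i]))
            top.sort()
            top = top[-n_outliers:]
            cutoff = top[0][0]
    return top      # (k-NN distance, original index) for the strongest outliers
```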

  3. R code

    • figshare.com
    txt
    Updated Jun 5, 2017
    Cite
    Christine Dodge (2017). R code [Dataset]. http://doi.org/10.6084/m9.figshare.5021297.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 5, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Christine Dodge
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code used for each data set to perform negative binomial regression, calculate the overdispersion statistic, generate summary statistics, and remove outliers.
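    The dataset itself contains R code; as a hedged illustration of the same workflow, a minimal Python equivalent using statsmodels might look like the following (the formula and column names are placeholders):

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

def nb_fit_with_overdispersion(df, formula="count ~ treatment"):
    """Fit a Poisson GLM, compute an overdispersion statistic, then fit a
    negative binomial GLM.  Column names in the default formula are placeholders."""
    poisson = smf.glm(formula, data=df, family=sm.families.Poisson()).fit()
    # Pearson chi-square / residual df substantially above 1 suggests overdispersion
    overdispersion = poisson.pearson_chi2 / poisson.df_resid
    negbin = smf.glm(formula, data=df, family=sm.families.NegativeBinomial()).fit()
    return overdispersion, negbin
```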

  4. COVID-19 High Frequency Phone Survey of Households 2020, Round 2 - Viet Nam

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Oct 26, 2023
    + more versions
    Cite
    World Bank (2023). COVID-19 High Frequency Phone Survey of Households 2020, Round 2 - Viet Nam [Dataset]. https://microdata.worldbank.org/index.php/catalog/4061
    Explore at:
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    Time period covered
    2020
    Area covered
    Vietnam
    Description

    Geographic coverage

    National, regional

    Analysis unit

    Households

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The 2020 Vietnam COVID-19 High Frequency Phone Survey of Households (VHFPS) uses a nationally representative household survey from 2018 as the sampling frame. The 2018 baseline survey includes 46,980 households from 3,132 communes (about 25% of all communes in Vietnam). In each commune, one enumeration area (EA) is randomly selected, and 15 households are then randomly selected in that EA for interview. The large module was used to select the households for the official VHFPS interviews, with the small-module households held in reserve for replacement. After data processing, the final sample size for Round 2 is 3,935 households.

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The questionnaire for Round 2 consisted of the following sections:

    Section 2. Behavior
    Section 3. Health
    Section 5. Employment (main respondent)
    Section 6. Coping
    Section 7. Safety Nets
    Section 8. FIES

    Cleaning operations

    Data cleaning began during the data collection process. Inputs for the cleaning process included interviewers' notes following each question item, interviewers' notes at the end of the tablet form, and supervisors' notes made during monitoring. The data cleaning process was conducted in the following steps:

    • Append households interviewed in ethnic minority languages to the main dataset interviewed in Vietnamese.
    • Remove unnecessary variables that were automatically calculated by SurveyCTO.
    • Remove household duplicates where the same form was submitted more than once.
    • Remove observations of households that were not supposed to be interviewed under the identified replacement procedure.
    • Format variables according to their object type (string, integer, decimal, etc.).
    • Read through interviewers' notes and make adjustments accordingly. During interviews, whenever interviewers found it difficult to choose a correct code, they were advised to choose the most appropriate one and record the respondent's answer in detail so that the survey management team could decide which code best suited that answer.
    • Correct data based on supervisors' notes where enumerators entered the wrong code.
    • Recode the answer option "Other, please specify". This option is usually followed by a blank line allowing enumerators to type or write text specifying the answer. The data cleaning team checked these answers thoroughly to decide whether each needed recoding into one of the available categories or should be kept as originally recorded. In some cases an answer was assigned a completely new code if it appeared many times in the survey dataset.
    • Examine the accuracy of outlier values, defined as values lying below the 5th or above the 95th percentile, by listening to interview recordings (see the sketch after this list).
    • Perform a final check on matching the main dataset with the separate section files; information asked at the individual level is kept in separate data files in long form.
    • Label variables using the full question text.
    • Label variable values where necessary.
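    The percentile-based screening rule above can be expressed in a few lines of Python; this is a minimal sketch for illustration only, and the function name and example column are ours, not the survey's.

```python
import pandas as pd

def flag_percentile_outliers(series: pd.Series, lower=0.05, upper=0.95) -> pd.Series:
    """Flag values below the 5th or above the 95th percentile for manual review,
    mirroring the outlier screening rule described in the cleaning steps above."""
    lo, hi = series.quantile(lower), series.quantile(upper)
    return (series < lo) | (series > hi)

# Hypothetical usage: rows flagged for re-checking against interview recordings.
# flagged_rows = df[flag_percentile_outliers(df["household_income"])]
```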

  5. Heidelberg Tributary Loading Program (HTLP) Dataset

    • zenodo.org
    • explore.openaire.eu
    • +1more
    bin, png
    Updated Jul 16, 2024
    Cite
    NCWQR; NCWQR (2024). Heidelberg Tributary Loading Program (HTLP) Dataset [Dataset]. http://doi.org/10.5281/zenodo.6606950
    Explore at:
    Available download formats: bin, png
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    NCWQR; NCWQR
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is updated more frequently and can be visualized on NCWQR's data portal.

    If you have any questions, please contact Dr. Laura Johnson or Dr. Nathan Manning.

    The National Center for Water Quality Research (NCWQR) is a research laboratory at Heidelberg University in Tiffin, Ohio, USA. Our primary research program is the Heidelberg Tributary Loading Program (HTLP), where we currently monitor water quality at 22 river locations throughout Ohio and Michigan, effectively covering ~half of the land area of Ohio. The goal of the program is to accurately measure the total amounts (loads) of pollutants exported from watersheds by rivers and streams. Thus these data are used to assess different sources (nonpoint vs point), forms, and timing of pollutant export from watersheds. The HTLP officially began with high-frequency monitoring for sediment and nutrients from the Sandusky and Maumee rivers in 1974, and has continually expanded since then.

    Each station where samples are collected for water quality is paired with a US Geological Survey gage for quantifying discharge (http://waterdata.usgs.gov/usa/nwis/rt). Our stations cover a wide range of watershed areas upstream of the sampling point from 11.0 km2 for the unnamed tributary to Lost Creek to 19,215 km2 for the Muskingum River. These rivers also drain a variety of land uses, though a majority of the stations drain over 50% row-crop agriculture.

    At most sampling stations, submersible pumps located on the stream bottom continuously pump water into sampling wells inside heated buildings where automatic samplers collect discrete samples (4 unrefrigerated samples/d at 6-h intervals, 1974–1987; 3 refrigerated samples/d at 8-h intervals, 1988-current). At weekly intervals the samples are returned to the NCWQR laboratories for analysis. When samples either have high turbidity from suspended solids or are collected during high flow conditions, all samples for each day are analyzed. As stream flows and/or turbidity decreases, analysis frequency shifts to one sample per day. At the River Raisin and Muskingum River, a cooperator collects a grab sample from a bridge at or near the USGS station approximately daily and all samples are analyzed. Each sample bottle contains sufficient volume to support analyses of total phosphorus (TP), dissolved reactive phosphorus (DRP), suspended solids (SS), total Kjeldahl nitrogen (TKN), ammonium-N (NH4), nitrate-N and nitrite-N (NO2+3), chloride, fluoride, and sulfate. Nitrate and nitrite are commonly added together when presented; henceforth we refer to the sum as nitrate.

    Upon return to the laboratory, all water samples are analyzed within 72h for the nutrients listed below using standard EPA methods. For dissolved nutrients, samples are filtered through a 0.45 um membrane filter prior to analysis. We currently use a Seal AutoAnalyzer 3 for DRP, silica, NH4, TP, and TKN colorimetry, and a DIONEX Ion Chromatograph with AG18 and AS18 columns for anions. Prior to 2014, we used a Seal TRAACs for all colorimetry.

    2017 Ohio EPA Project Study Plan and Quality Assurance Plan

    Project Study Plan

    Quality Assurance Plan

    Data quality control and data screening

    The data provided in the River Data files have all been screened by NCWQR staff. The purpose of the screening is to remove outliers that staff deem likely to reflect sampling or analytical errors rather than outliers that reflect the real variability in stream chemistry. Often, in the screening process, the causes of the outlier values can be determined and appropriate corrective actions taken. These may involve correction of sample concentrations or deletion of those data points.

    This micro-site contains data for approximately 126,000 water samples collected beginning in 1974. We cannot guarantee that each data point is free from sampling bias/error, analytical errors, or transcription errors. However, since its beginnings, the NCWQR has operated a substantial internal quality control program and has participated in numerous external quality control reviews and sample exchange programs. These programs have consistently demonstrated that data produced by the NCWQR is of high quality.

    A note on detection limits and zero and negative concentrations

    It is routine practice in analytical chemistry to determine method detection limits and/or limits of quantitation, below which analytical results are considered less reliable or unreliable. This is something that we also do as part of our standard procedures. Many laboratories, especially those associated with agencies such as the U.S. EPA, do not report individual values that are less than the detection limit, even if the analytical equipment returns such values. This is in part because as individual measurements they may not be considered valid under litigation.

    The measured concentration consists of the true but unknown concentration plus random instrument error, which is usually small compared to the range of expected environmental values. In a sample for which the true concentration is very small, perhaps even essentially zero, it is possible to obtain an analytical result of 0 or even a small negative concentration. Results of this sort are often “censored” and replaced with the statement “

    Censoring these low values creates a number of problems for data analysis. How do you take an average? If you leave out these numbers, you get a biased result because you did not toss out any other (higher) values. Even if you replace negative concentrations with 0, a bias ensues, because you’ve chopped off some portion of the lower end of the distribution of random instrument error.

    For these reasons, we do not censor our data. Values of -9 and -1 are used as missing value codes, but all other negative and zero concentrations are actual, valid results. Negative concentrations make no physical sense, but they make analytical and statistical sense. Users should be aware of this, and if necessary make their own decisions about how to use these values. Particularly if log transformations are to be used, some decision on the part of the user will be required.
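    As a hedged illustration of the guidance above, a short pandas helper can replace the -9 and -1 missing-value codes with NaN while retaining zero and negative concentrations; the function name and example column are ours, not part of the dataset:

```python
import pandas as pd

MISSING_CODES = [-9, -1]   # missing-value codes used in the River Data files

def clean_concentration(series: pd.Series) -> pd.Series:
    """Replace the -9 / -1 missing-value codes with NaN but keep zero and small
    negative concentrations, as the description above recommends."""
    s = series.astype(float)
    return s.mask(s.isin(MISSING_CODES))

# e.g. an average that excludes only true missing values:
# mean_tp = clean_concentration(df["TP"]).mean()
```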

    Analyte Detection Limits

    https://ncwqr.files.wordpress.com/2021/12/mdl-june-2019-epa-methods.jpg?w=1024

    For more information, please visit https://ncwqr.org/

  6. Dataset for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023"

    • zenodo.org
    Updated Mar 26, 2025
    Cite
    Yide Qian; Yide Qian (2025). Dataset for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023 " [Dataset]. http://doi.org/10.5281/zenodo.15022854
    Explore at:
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yide Qian; Yide Qian
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Pine Island Glacier
    Description

    Dataset and codes for "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023 "

    • Description of the data and file structure

    The MATLAB codes and related datasets are used for generating the figures for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023".

    Files and variables

    File 1: Data_and_Code.zip

    Directory: Main_function

    Description: Includes MATLAB scripts and functions. Each script includes a description that guides the user in how to use it and where to find the dataset used for processing.

    MATLAB main scripts: include the complete steps to process the data and to output figures and videos.

    Script_1_Ice_velocity_process_flow.m

    Script_2_strain_rate_process_flow.m

    Script_3_DROT_grounding_line_extraction.m

    Script_4_Read_ICESat2_h5_files.m

    Script_5_Extraction_results.m

    MATLAB functions: five files of MATLAB functions that support the main scripts:

    1_Ice_velocity_code: MATLAB functions for ice velocity post-processing, including outlier removal, filtering, correction for atmospheric and tidal effects, inverse-weighted averaging, and error estimation.

    2_strain_rate: MATLAB functions for strain rate calculation.

    3_DROT_extract_grounding_line_code: MATLAB functions that convert the range offset results output from GAMMA into differential vertical displacement and use the result to extract the grounding line.

    4_Extract_data_from_2D_result: MATLAB functions used to extract profiles from 2D data.

    5_NeRD_Damage_detection: modified code from Izeboud et al. (2023). When applying this code, please also cite Izeboud et al. 2023 (https://www.sciencedirect.com/science/article/pii/S0034425722004655).

    6_Figure_plotting_code: MATLAB functions for the figures in the paper and supporting information.

    Directory: data_and_result

    Description: Includes directories that store the results output from MATLAB. Users only need to modify the paths in the MATLAB scripts to their own paths.

    1_origin: Sample data ("PS-20180323-20180329", "PS-20180329-20180404", "PS-20180404-20180410") output from the GAMMA software in GeoTIFF format that can be used to calculate DROT and velocity. Includes displacement, theta, phi, and ccp.

    2_maskccpN: Remove outliers where ccp < 0.05 and convert displacement to velocity (m/day).

    3_rockpoint: Extract velocities at non-moving regions.

    4_constant_detrend: Remove orbit error.

    5_Tidal_correction: Remove atmospheric and tidally induced error.

    6_rockpoint: Extract non-aggregated velocities at non-moving regions.

    6_vx_vy_v: Transform velocities from va/vr to vx/vy.

    7_rockpoint: Extract aggregated velocities at non-moving regions.

    7_vx_vy_v_aggregate_and_error_estimate: Inverse-weighted averaging of the three ice velocity maps and calculation of the error maps (see the sketch after this list).

    8_strain_rate: Strain rate calculated from the aggregated ice velocity.

    9_compare: Store the results before and after tidal correction and aggregation.

    10_Block_result: Time series results extracted from 2D data.

    11_MALAB_output_png_result: Store .png files and time series results.

    12_DROT: Differential Range Offset Tracking results.

    13_ICESat_2: ICESat-2 .h5 and .mat files can be put here (only the samples from tracks 0965 and 1094 are included).

    14_MODIS_images: you can store MODIS images here

    shp: grounding line, rock region, ice front, and other shape files.
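    As referenced in item 7_vx_vy_v_aggregate_and_error_estimate above, the aggregation step can be sketched as an inverse-variance weighted average with a propagated error map. This is a minimal Python illustration under our own assumptions; the dataset's actual implementation is the supplied MATLAB code.

```python
import numpy as np

def inverse_variance_average(velocity_maps, error_maps):
    """Combine several ice-velocity maps by inverse-variance weighting and
    propagate an error map.

    velocity_maps, error_maps : lists of 2-D arrays of equal shape, with NaN
    where a map has no valid estimate."""
    v = np.stack(velocity_maps)                 # (n_maps, ny, nx)
    w = 1.0 / np.square(np.stack(error_maps))   # inverse-variance weights
    w = np.where(np.isfinite(v) & np.isfinite(w), w, 0.0)
    v = np.where(w > 0, v, 0.0)
    wsum = w.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        v_avg = np.where(wsum > 0, (w * v).sum(axis=0) / wsum, np.nan)
        v_err = np.where(wsum > 0, np.sqrt(1.0 / wsum), np.nan)   # propagated error
    return v_avg, v_err
```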

    File 2: PIG_front_1947_2023.zip

    Includes ice front position shapefiles from 1947 to 2023, used for plotting Figure 1 in the paper.

    File 3: PIG_DROT_GL_2016_2021.zip

    Includes grounding line position shapefiles from 1947 to 2023, used for plotting Figure 1 in the paper.

    Data were derived from the following sources: the links can be found in the MATLAB scripts or in the "Open Research" section of the paper.

  7. Data from: Interference in the shared-stroop task: a comparison of self- and other-monitoring

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jul 14, 2021
    Cite
    Martin Pickering; Janet F. McLean; Chiara Gambi (2021). Interference in the shared-stroop task: a comparison of self- and other-monitoring [Dataset]. http://doi.org/10.5061/dryad.jwstqjq91
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 14, 2021
    Dataset provided by
    Dryad
    Authors
    Martin Pickering; Janet F. McLean; Chiara Gambi
    Time period covered
    2021
    Description

    The dataset includes two columns: RT reports the original onset latencies before any exclusion and before the trimming procedure. RTTrimmed reports the trimmed RT and it shows NA for values that were removed as outliers. See README for more details.

  8. Data from: Equivalent black carbon concentration in 10 minutes time resolution, measured in the Swiss container during MOSAiC 2019/2020

    • service.tib.eu
    Updated Nov 30, 2024
    + more versions
    Cite
    (2024). Equivalent black carbon concentration in 10 minutes time resolution, measured in the Swiss container during MOSAiC 2019/2020 [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-952251
    Explore at:
    Dataset updated
    Nov 30, 2024
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains equivalent black carbon (eBC) concentrations, averaged to 10 min time resolution, measured during the year-long MOSAiC expedition from October 2019 to September 2020. The measurements were performed in the Swiss container on the D-deck of Research Vessel Polarstern, using a commercial aethalometer (model AE33, Magee Scientific, Berkeley, USA). The instrument was located behind an automated valve, which switched hourly between a total and an interstitial air inlet, with upper cutoff sizes of 40 and 1 µm respectively. The inlet flow of 2 liters per minute was verified biweekly. The dual-spot technology of the instrument allowed for real-time compensation of what is known as the loading effect (Drinovec et al., 2015). Optical absorption was measured at 7 different wavelengths simultaneously, with a 1 second time resolution. We used the absorption at 880 nm (channel 6) to derive eBC, using a mass absorption cross-section value of 7.77 m² g⁻¹.

    The switching valve caused concentration spikes at the full hours; hence data points within ±2 minutes of the full hours were removed. The dataset was averaged to 1 minute time resolution (the original time resolution is 1 second) to reduce the largest part of the instrument's noise, and outliers of more than 3 times the standard deviation of an hourly moving window were removed from the 1 minute averaged dataset. During some periods when the switching valve mechanism was on, varying patterns of increased mean and standard deviation of the measurements were observed, due to a pressure drop in the inlet lines. We corrected this by taking the arithmetic means of the data points during interstitial inlet measurements and of the two adjacent hours of total inlet measurements, subtracting these two values, and adding this difference to the data points of the interstitial inlet measurements. Finally, the data were averaged to 10 minutes time resolution.

    Based on a visual inspection of the entire dataset, we removed periods of strong noise and intense negative spikes. These artifacts may have emerged from the averaging of the initially noisy 1 second time resolution dataset and/or from the dual-spot compensation, which may lead to strong negative outliers right after a large positive outlier. Data collected between June 3rd and June 9th were discarded as Polarstern was within Svalbard's 12 nautical mile zone. The aethalometer dataset was further cleaned of pollution emissions from local research activities (e.g., exhaust from Polarstern's engine and vents, skidoos, on-ice diesel generators) using a preexisting pollution mask developed by Beck et al. (2022a), where a multi-step pollution detection algorithm was applied to the interstitial CPC dataset at 1 minute time resolution (Beck et al., 2022b). This pollution mask was converted to 10 minutes time resolution by setting a condition where, if more than 1 data point is polluted in a 10 minutes moving window, the entire 10 minutes period is defined as polluted. The resulting flag “Flag_pollution” should be equal to 0 to retain un-polluted data points only.
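    Two of the cleaning steps described above, the hourly moving-window 3-sigma outlier removal and the conversion of the 1-minute pollution mask to 10-minute resolution, can be sketched with pandas as follows. The function names are ours, the series are assumed to carry a datetime index, and fixed 10-minute bins are used as a simplification of the moving window described in the text.

```python
import pandas as pd

def remove_rolling_outliers(ebc_1min: pd.Series, n_sigma=3, window=60) -> pd.Series:
    """Mask 1-minute eBC values deviating from the hourly (60-point) moving mean
    by more than n_sigma moving standard deviations; masked values become NaN."""
    mean = ebc_1min.rolling(window, center=True, min_periods=10).mean()
    std = ebc_1min.rolling(window, center=True, min_periods=10).std()
    return ebc_1min.where((ebc_1min - mean).abs() <= n_sigma * std)

def pollution_flag_10min(flag_1min: pd.Series) -> pd.Series:
    """Convert the 1-minute pollution mask to 10-minute resolution: if more than
    one point in a 10-minute bin is polluted, the whole bin is flagged."""
    return (flag_1min.resample("10min").sum() > 1).astype(int)
```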

  9. Vehicle insurance data

    • kaggle.com
    Updated Jun 17, 2020
    Cite
    Himanshu Bhatt (2020). Vehicle insurance data [Dataset]. https://www.kaggle.com/junglisher/vehicle-insurance-data/tasks
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 17, 2020
    Dataset provided by
    Kaggle
    Authors
    Himanshu Bhatt
    Description

    Vehicle-insurance

    Vehicle insurance data: this dataset contains multiple features describing the customer's vehicle and insurance type.

    OBJECTIVE: The business requirement is to increase CLV (customer lifetime value), which makes CLV the target variable.

    Data Cleansing:

    This dataset is already fairly clean, but it contains a few outliers, which should be removed.

    Why remove outliers? Outliers are unusual values in the dataset that can distort statistical analyses and violate their assumptions.

    Feature selection:

    This step is required to remove unwanted features.

    VIF and Correlation Coefficient can be used to find important features.

    VIF (variance inflation factor): a measure of collinearity among predictor variables in a multiple regression. It is calculated as the ratio of the variance of a coefficient in the full model to the variance of that coefficient if it were fit alone.
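    A small sketch of a VIF computation with statsmodels (the predictor column names in the usage comment are placeholders, not the dataset's actual fields):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.Series:
    """Compute the VIF for each numeric predictor column in X."""
    X = sm.add_constant(X)                      # include an intercept term
    vifs = {col: variance_inflation_factor(X.values, i)
            for i, col in enumerate(X.columns) if col != "const"}
    return pd.Series(vifs).sort_values(ascending=False)

# Hypothetical usage: drop predictors with VIF well above ~5-10 before regression.
# print(vif_table(df[["monthly_premium", "income", "months_since_policy"]]))
```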

    Correlation coefficient: a positive Pearson coefficient means that one variable increases as the other increases, while a negative coefficient means that one variable decreases as the other increases. Correlation coefficients of -1 or +1 indicate an exactly linear relationship.

    Log transformation and Normalisation: Many ML algorithms perform better or converge faster when features are on a relatively similar scale and/or close to normally distributed.

    Different ML algorithms were applied to the dataset for prediction; their accuracies are reported in the notebook.

    Please see my work; I am open to suggestions.

  10. Data set of heat wave and drought indicators in the middle and lower reaches of the Yangtze River from 1961 to 2020

    • scidb.cn
    Updated Jun 7, 2024
    Cite
    Bai Qinmian; He shanfeng; Feng Aiqing; Li Zheng; Chen Chaobing; Yan Junhui (2024). Data set of heat wave and drought indicators in the middle and lower reaches of the Yangtze River from 1961 to 2020 [Dataset]. http://doi.org/10.57760/sciencedb.08151
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Bai Qinmian; He shanfeng; Feng Aiqing; Li Zheng; Chen Chaobing; Yan Junhui
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Yangtze River
    Description

    Based on daily temperature and precipitation observations from 1961 to 2020 in the middle and lower reaches of the Yangtze River, and after data screening, quality testing, and outlier removal, the heat wave index and the number of consecutive days without effective precipitation were used as discrimination criteria to calculate the number of days, frequency, and longest duration of heat waves and drought at each station. The data values of each indicator were sorted and statistically analyzed by time period and province to obtain the heat wave and drought indicator dataset for the middle and lower reaches of the Yangtze River. It consists of 6 data files and 1 related drawing. For the convenience of further processing and application, the data results are stored in .xls format files, all of which are named after the data category. The data file for heat wave and drought event indicators consists of 64 rows and 74 columns, with each row displaying the indicator values for each year, for the periods before and after, and for the entire study period. Each column represents the indicator values for each meteorological station, province, and the entire study area.
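    As a hedged illustration of one of the drought indicators described above, the longest run of days without effective precipitation can be computed per station as follows; the 1 mm threshold and the column names are assumptions, not taken from the dataset.

```python
import pandas as pd

def longest_dry_spell(precip_mm: pd.Series, effective_threshold=1.0) -> int:
    """Longest run of consecutive days with precipitation below the assumed
    'effective precipitation' threshold."""
    dry = (precip_mm < effective_threshold).to_numpy()
    longest = run = 0
    for is_dry in dry:
        run = run + 1 if is_dry else 0
        longest = max(longest, run)
    return longest

# Hypothetical usage, per station and year (column names are placeholders):
# spells = df.groupby(["station", df["date"].dt.year])["precip"].apply(longest_dry_spell)
```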

  11. Cut-off points for LL under different concentration parameters and sample sizes, q = 0.99

    • plos.figshare.com
    xls
    Updated Jun 12, 2023
    + more versions
    Cite
    Sümeyra Sert; Filiz Kardiyen (2023). Cut-off points for LL under different concentration parameters and sample sizes, q = 0.99. [Dataset]. http://doi.org/10.1371/journal.pone.0286448.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Sümeyra Sert; Filiz Kardiyen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cut-off points for LL under different concentration parameters and sample sizes, q = 0.99.

  12. Cut-off points for NW under different concentration parameters and sample sizes, q = 0.95

    • plos.figshare.com
    xls
    Updated Jun 12, 2023
    + more versions
    Cite
    Sümeyra Sert; Filiz Kardiyen (2023). Cut-off points for NW under different concentration parameters and sample sizes, q = 0.95. [Dataset]. http://doi.org/10.1371/journal.pone.0286448.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Sümeyra Sert; Filiz Kardiyen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cut-off points for NW under different concentration parameters and sample sizes, q = 0.95.

  13. EEG Data Pre Stimulation/Sham (All Three Tasks)

    • figshare.com
    bin
    Updated May 31, 2023
    + more versions
    Cite
    Tristan Bless; Nathaniel Goodman; Patrick Mulvany; Jessica Cramer; Fiza Singh; Jaime A. Pineda (2023). EEG Data Pre Stimulation/Sham (All Three Tasks) [Dataset]. http://doi.org/10.6084/m9.figshare.7834847.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tristan Bless; Nathaniel Goodman; Patrick Mulvany; Jessica Cramer; Fiza Singh; Jaime A. Pineda
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EEG signal data (20 channels) was acquired using a Quick-30 EEG headset (Cognionics Inc.) with data recorded using proprietary acquisition software. Impedance was set to below 10 kOhm. All electrodes were amplified by a factor of 1,000x and sampled at 500 Hz. Online bandpass filtering was set at 0.3–100 Hz (half-amplitude, 3 dB/octave roll-off). A cloth EEG cap with pre-set electrode locations was placed on the subject and measured to position the Cz site over the subject's actual Cz location. This was used to compute measurements for the placement of stimulation electrodes using the 10-20 EEG system, following Meinzer et al. (2014).

    EEG data were collected from twenty scalp electrodes (reference at earlobes) and processed using the EEGLAB toolbox (Delorme & Makeig, 2004) within the MATLAB processing platform (The Mathworks, Inc.). Data from the twenty relevant channels were selected and channel locations identified using the Cognionics-supplied location file. EEG channel data were first filtered using a linear finite impulse response (FIR) filter with a bandpass of 0.5 Hz to 40 Hz. For baseline and DAP task data, noisy outlier channel values were rejected using the manual data-scrolling review function until the data reached a 50 (±10) microvolt vertical scale limit. For face recognition data, epochs (-1000 to 2000 ms) were extracted following the time-locking event. Mean baseline values (-1000 ms to 0 ms) were subtracted from each epoch to remove DC drifts. Following visual inspection, all data sets were re-referenced to the average. Next, an Independent Component Analysis (ICA) was applied. RUNICA uses the infomax ICA algorithm of Bell & Sejnowski (1995), with the natural gradient feature of Amari, Cichocki & Yang (1999), and the extended-ICA algorithm of Lee, Girolami & Sejnowski (1999); it was used to separate the preprocessed EEG signals into independent components and has been found effective in removing EOG and EMG artifacts and noise, as well as separating EEG cerebral sources, including mu components (Romero, Mananas, & Barbanoj, 2008; Ng & Raveendran, 2009; Gómez-Herrero et al., 2006). The RUNICA algorithm was applied at least twice during the processing of each dataset. In the first step, RUNICA was used similarly to the procedures described by Ng and Raveendran (2009) to extract and remove components that contained a large portion of noise and artifacts, including ocular artifacts. Next, “cleaned” EEG data were reconstructed from the remaining non-artifactual components. RUNICA was applied a second time to the cleaned EEG data so that an equal number of IC components comprised all the datasets. Sequential application of RUNICA yielded data with improved signal-to-noise ratio by unmixing and removing standard EEG artifacts from the brain activity data. This overall approach was found to yield the most robust results.

    Event-related potentials and event-related spectral perturbations: ERPs and ERSPs, deviations in amplitude and spectral power relative to a baseline, respectively, were calculated for faces in the STIM and SHAM conditions, using built-in EEGLAB procedures (Delorme & Makeig, 2004). A time-frequency decomposition was computed for each individual condition using wavelets with Morlet tapers, and the deviations in log spectral power in each time-frequency bin were then computed, relative to the mean of the log spectral power of the 1000 ms pre-stimulus baseline.
To compare responses for specific experimental conditions, the common baseline was calculated across those test conditions, and the component ERSP values were adjusted for the common baseline for each test. To assess statistical differences, nonparametric resampling methods were used (Manly, 2007). A bootstrap resampling method was used to test whether ERP and ERSP deviations in the post-stimulus interval were significantly larger relative to the pre-stimulus period for each subject and each separate condition.
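    The dataset was processed in EEGLAB/MATLAB; as an illustrative sketch only, an equivalent band-pass filter, average reference, and extended-infomax ICA step in MNE-Python might look roughly like this:

```python
import mne

def preprocess_eeg(raw: mne.io.BaseRaw) -> mne.io.BaseRaw:
    """Band-pass FIR filter (0.5-40 Hz), average reference, and extended-infomax
    ICA, roughly mirroring the EEGLAB steps described above."""
    raw = raw.copy().filter(l_freq=0.5, h_freq=40.0, fir_design="firwin")
    raw.set_eeg_reference("average")
    ica = mne.preprocessing.ICA(method="infomax",
                                fit_params=dict(extended=True), random_state=0)
    ica.fit(raw)
    # Components judged to reflect ocular/muscle artifacts would be listed in
    # ica.exclude (e.g. after visual inspection) before reconstructing the data.
    return ica.apply(raw)
```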
