22 datasets found
  1. Data from: Error and anomaly detection for intra-participant time-series...

    • tandf.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    David R. Mullineaux; Gareth Irwin (2023). Error and anomaly detection for intra-participant time-series data [Dataset]. http://doi.org/10.6084/m9.figshare.5189002
    Explore at:
    xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    David R. Mullineaux; Gareth Irwin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identification of errors or anomalous values, collectively considered outliers, assists in exploring data or, through removing outliers, improves statistical analysis. In biomechanics, outlier detection methods have explored the ‘shape’ of entire cycles, although exploring fewer points using a ‘moving window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected in two stages for the strides (mean 38 cycles) from treadmill running. Cycles were removed in stage 1 for one-dimensional (spatial) outliers at each time point using the median absolute deviation, and in stage 2 for two-dimensional (spatial–temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
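    The two-stage screen lends itself to a compact sketch. Below is a minimal numpy re-expression of the idea (the dataset itself supplies Matlab code); the thresholds thresh and k are illustrative stand-ins for the paper's t-statistic-based scaling, and the cycles-by-time-points layout is an assumption.

    import numpy as np

    def stage1_mad(cycles, thresh=3.5):
        # cycles: array of shape (n_cycles, n_timepoints)
        med = np.median(cycles, axis=0)
        mad = np.median(np.abs(cycles - med), axis=0)
        mad = np.where(mad == 0, np.finfo(float).eps, mad)
        # flag a cycle that is a one-dimensional (spatial) outlier at any time point
        return np.any(np.abs(cycles - med) > thresh * mad, axis=1)

    def stage2_moving_sd(cycles, window=1, k=3.0):
        # flag spatial-temporal outliers via a moving-window standard deviation
        flags = np.zeros(cycles.shape[0], dtype=bool)
        for t in range(cycles.shape[1] - window + 1):
            seg = cycles[:, t:t + window].mean(axis=1)
            flags |= np.abs(seg - seg.mean()) > k * seg.std(ddof=1)
        return flags

    strides = np.random.default_rng(0).normal(size=(38, 101))  # 38 cycles, 101 time points
    outlier = stage1_mad(strides)
    outlier[~outlier] = stage2_moving_sd(strides[~outlier])
    clean = strides[~outlier]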

  2. Data from: Methodology to filter out outliers in high spatial density data...

    • scielo.figshare.com
    jpeg
    Updated Jun 4, 2023
    Cite
    Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken (2023). Methodology to filter out outliers in high spatial density data to improve maps reliability [Dataset]. http://doi.org/10.6084/m9.figshare.14305658.v1
    Explore at:
    jpeg
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SciELO journals
    Authors
    Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT: The considerable volume of data generated by sensors in the field presents systematic errors; thus, it is extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets, and to determine whether the developed filter process could help decrease the nugget effect and improve the characterization of spatial variability in high-sampling data. We created a filter composed of a global analysis and an anisotropic local analysis of the data, which considered the respective neighborhood values. For that purpose, we used the median as the main statistical parameter to classify a given spatial point in the data set, taking into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil electrical conductivity (ECa), and a sensor vegetation index (SVI) in sugarcane. The results showed an improvement in the accuracy of spatial variability within the data sets. The methodology reduced RMSE by 85 %, 97 %, and 79 % in corn yield, soil ECa, and SVI, respectively, compared to interpolation errors of the raw data sets. The filter excluded the local outliers, which considerably reduced the nugget effects, reducing the estimation error of the interpolated data. The methodology proposed in this work performed better at removing outlier data than two other methodologies from the literature.
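    A minimal sketch of this kind of neighborhood filter, assuming scipy is available; the search radius and the allowed fractional deviation from the neighborhood median are illustrative parameters, not the calibrated values from the paper.

    import numpy as np
    from scipy.spatial import cKDTree

    def local_median_filter(xy, values, radius=10.0, max_dev=0.25):
        # keep a point only if its value stays within max_dev (fractional)
        # of the median of its neighbours inside `radius`
        tree = cKDTree(xy)
        keep = np.ones(len(values), dtype=bool)
        for i, neighbours in enumerate(tree.query_ball_point(xy, r=radius)):
            others = [j for j in neighbours if j != i]
            if not others:
                continue
            med = np.median(values[others])
            if med != 0 and abs(values[i] - med) / abs(med) > max_dev:
                keep[i] = False
        return keep

    rng = np.random.default_rng(0)
    xy = rng.uniform(0, 100, size=(500, 2))      # point coordinates (m)
    yield_data = rng.normal(10, 1, size=500)     # e.g. corn yield values
    mask = local_median_filter(xy, yield_data)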

  3. COVID-19 High Frequency Phone Survey of Households 2020, Round 2 - Viet Nam

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Oct 26, 2023
    Cite
    World Bank (2023). COVID-19 High Frequency Phone Survey of Households 2020, Round 2 - Viet Nam [Dataset]. https://microdata.worldbank.org/index.php/catalog/4061
    Explore at:
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    World Bank (https://www.worldbank.org/)
    Time period covered
    2020
    Area covered
    Vietnam
    Description

    Geographic coverage

    National, regional

    Analysis unit

    Households

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The 2020 Vietnam COVID-19 High Frequency Phone Survey of Households (VHFPS) uses a nationally representative household survey from 2018 as the sampling frame. The 2018 baseline survey includes 46,980 households from 3,132 communes (about 25% of all communes in Vietnam). In each commune, one enumeration area (EA) is randomly selected, and then 15 households are randomly selected in each EA for interview. We use the large module to select the households for official interview in the VHFPS survey, and the small-module households as a reserve for replacement. After data processing, the final sample size for Round 2 is 3,935 households.

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The questionnaire for Round 2 consisted of the following sections:

    Section 2. Behavior
    Section 3. Health
    Section 5. Employment (main respondent)
    Section 6. Coping
    Section 7. Safety Nets
    Section 8. FIES

    Cleaning operations

    Data cleaning began during the data collection process. Inputs for the cleaning process included interviewers’ notes following each question item, interviewers’ notes at the end of the tablet form, and supervisors’ notes made during monitoring. The data cleaning process was conducted in the following steps:
    • Append households interviewed in ethnic minority languages to the main dataset of interviews conducted in Vietnamese.
    • Remove unnecessary variables that were automatically calculated by SurveyCTO.
    • Remove household duplicates where the same form was submitted more than once.
    • Remove observations of households that were not supposed to be interviewed according to the identified replacement procedure.
    • Format variables by their object type (string, integer, decimal, etc.).
    • Read through interviewers’ notes and make adjustments accordingly. During interviews, whenever interviewers found it difficult to choose a correct code, they were advised to choose the most appropriate one and write down the respondent’s answer in detail so that the survey management team could decide which code best suited the answer.
    • Correct data based on supervisors’ notes where enumerators entered a wrong code.
    • Recode the answer option “Other, please specify”. This option is usually followed by a blank line allowing enumerators to type or write text specifying the answer. The data cleaning team checked this type of answer thoroughly to decide whether it needed recoding into one of the available categories or should be kept as originally recorded. In some cases, an answer was assigned a completely new code if it appeared many times in the survey dataset.
    • Examine the accuracy of outlier values, defined as values lying outside the 5th and 95th percentiles, by listening to interview recordings (a sketch follows this list).
    • Final check on matching the main dataset with the section files; sections where information is asked at the individual level are kept in separate data files in long form.
    • Label variables using the full question text.
    • Label variable values where necessary.
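    The review rule for outliers here (values outside the 5th-95th percentile band are flagged, then checked against the interview recordings rather than deleted automatically) is simple to express; a minimal pandas sketch with an illustrative column name:

    import pandas as pd

    def flag_percentile_outliers(s: pd.Series, lo: float = 0.05, hi: float = 0.95) -> pd.Series:
        # True where a value falls outside the [5th, 95th] percentile band
        lo_v, hi_v = s.quantile(lo), s.quantile(hi)
        return (s < lo_v) | (s > hi_v)

    df = pd.DataFrame({"income": [10, 12, 11, 300, 9, 13, -50, 10]})
    to_review = df[flag_percentile_outliers(df["income"])]   # flagged for manual review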

  4. Malaria disease and grading system dataset from public hospitals reflecting...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Nov 10, 2023
    Cite
    Temitope Olufunmi Atoyebi; Rashidah Funke Olanrewaju; N. V. Blamah; Emmanuel Chinanu Uwazie (2023). Malaria disease and grading system dataset from public hospitals reflecting complicated and uncomplicated conditions [Dataset]. http://doi.org/10.5061/dryad.4xgxd25gn
    Explore at:
    zip
    Dataset updated
    Nov 10, 2023
    Dataset provided by
    Nasarawa State University
    Authors
    Temitope Olufunmi Atoyebi; Rashidah Funke Olanrewaju; N. V. Blamah; Emmanuel Chinanu Uwazie
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Malaria is the leading cause of death in the African region. Data mining can help extract valuable knowledge from available data in the healthcare sector. This makes it possible to train models to predict patient health faster than in clinical trials. Implementations of various machine learning algorithms such as K-Nearest Neighbors, Bayes Theorem, Logistic Regression, Support Vector Machines, and Multinomial Naïve Bayes (MNB) have been applied to malaria datasets in public hospitals, but there are still limitations in modeling using the multinomial Naive Bayes algorithm. This study applies the MNB model to explore the relationship between 15 relevant attributes of public hospital data. The goal is to examine how the dependency between attributes affects the performance of the classifier. MNB creates a transparent and reliable graphical representation between attributes with the ability to predict new situations. The model (MNB) has 97% accuracy. It is concluded that this model outperforms the GNB classifier, which has 100% accuracy, and the RF, which also has 100% accuracy.

    Methods: Prior to data collection, the researcher was guided by ethical training certification on data collection and the rights to confidentiality and privacy, under Institutional Review Board (IRB) oversight. Data was collected from the manual archives of hospitals purposively selected using a stratified sampling technique, transformed to electronic form, and stored in a MySQL database called malaria. Each patient file was extracted and reviewed for signs and symptoms of malaria, then checked for the laboratory confirmation result from diagnosis. The data was divided into two tables: the first table, data1, contains data for use in phase 1 of the classification, while the second table, data2, contains data for use in phase 2 of the classification.

    Data source collection: The malaria incidence data set was obtained from public hospitals from 2017 to 2021. These are the data used for modeling and analysis, bearing in mind the geographical location and socio-economic factors available for patients inhabiting those areas. Naive Bayes (Multinomial) is the model used to analyze the collected data for malaria disease prediction and grading.

    Data preprocessing: preprocessing was done to remove noise and outliers. Transformation: the data was transformed from analog to electronic records.

    Data partitioning: The collected data was divided into two portions; one portion was extracted as a training set, while the other portion was used for testing. The training portion taken from one database table is called training set 1, while the training portion taken from the other table is called training set 2. The dataset was split into two parts: a sample containing 70% of the data for training and the remaining 30% for testing. Then, using MNB classification algorithms implemented in Python, the models were trained on the training sample. The resulting models were tested on the remaining 30% of the data, and the results were compared with the other machine learning models using the standard metrics.

    Classification and prediction: Based on the nature of the variables in the dataset, this study uses Naïve Bayes (Multinomial) classification in two phases. The operation of the framework is as follows: i. Data collection and preprocessing are done. ii. Preprocessed data are stored in training set 1 and training set 2; these datasets are used during classification. iii. The test data set is stored in the database. iv. Part of the test data set is classified using classifier 1 and the remaining part is classified with classifier 2, as follows. Classifier phase 1: classifies records into positive or negative classes. If the patient has malaria, the patient is classified as positive (P); otherwise the patient is classified as negative (N).
    Classifier phase 2: classifies only records that classifier 1 labeled positive, further classifying them into complicated and uncomplicated class labels. The classifier also captures data on environmental factors, genetics, gender and age, and cultural and socio-economic variables. The system is designed so that the core determining parameters supply their values.
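    A minimal sketch of the two-phase design with scikit-learn's MultinomialNB on synthetic count features; the 15-attribute layout and the label names are placeholders, not the study's actual variables.

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    rng = np.random.default_rng(0)
    X1 = rng.integers(0, 5, size=(200, 15))                      # phase-1 training data (15 attributes)
    y1 = rng.choice(["P", "N"], size=200)                        # malaria positive / negative
    X2 = rng.integers(0, 5, size=(120, 15))                      # phase-2 training data (positives only)
    y2 = rng.choice(["complicated", "uncomplicated"], size=120)

    clf1 = MultinomialNB().fit(X1, y1)                           # classifier phase 1
    clf2 = MultinomialNB().fit(X2, y2)                           # classifier phase 2

    X_new = rng.integers(0, 5, size=(10, 15))
    phase1 = clf1.predict(X_new)
    grading = np.full(len(X_new), "N", dtype=object)
    positive = phase1 == "P"
    grading[positive] = clf2.predict(X_new[positive])            # grade only the positives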

  5. Dataset for: FAG in buckwheat flowers and its possible biological relevance

    • zenodo.org
    Updated Oct 28, 2024
    Cite
    Marta Hornyák; Marta Hornyák (2024). Dataset for: FAG in buckwheat flowers and its possible biological relevance [Dataset]. http://doi.org/10.5281/zenodo.14000367
    Explore at:
    Dataset updated
    Oct 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marta Hornyák; Marta Hornyák
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The file " PFAG and FAG raw data" contains raw data as well as data that has been processed to remove outliers, used for analyzing PFAG and FAG content in different parts of buckwheat flowers and in response to applied LED light.

    The file "Data for Correlations" contains data used for correlation analyses; it includes log10-transformed data without outliers, as well as raw flower count data.

    The file "Correlation Analyses PFAG, FAG & Flowers" contains completed correlation analyses.

  6. Hydrochemistry analysis of the Galilee subregion

    • cloud.csiss.gmu.edu
    • researchdata.edu.au
    zip
    Updated Dec 13, 2019
    Cite
    Australia (2019). Hydrochemistry analysis of the Galilee subregion [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/fd944f9f-14f6-4e20-bb8a-61d1116412ec
    Explore at:
    zip (26887349)
    Dataset updated
    Dec 13, 2019
    Dataset provided by
    Australia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Galilee
    Description

    Abstract

    This dataset was derived by the Bioregional Assessment Programme. The parent datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    This dataset contains analyses and summaries of hydrochemistry data for the Galilee subregion, and includes an additional quality assurance of the source hydrochemistry and waterlevel data to remove anomalous and outlier values.

    Dataset History

    1. Several bores were removed from the 'chem master sheet' in the QLD Hydrochemistry QA QC GAL v02 (GUID: e3fb6c9b-e224-4d2e-ad11-4bcba882b0af) dataset based on their TDS values. Bores with high or unrealistic TDS that were removed are found at the bottom of the 'updated data' sheet.

    2. Outlier water level values from the JK GAL Bore Waterlevels v01 (GUID: 2f8fe7e6-021f-4070-9f63-aa996b77469d) dataset were identified and removed. Those bores are identified in the 'outliers not used' sheet.

    3. Pivot tables were created to summarise data and create various histograms for analysis and interpretation. These are found in the 'chemistry histogram', 'Pivot tables', and 'summaries' sheets.

    Dataset Citation

    Bioregional Assessment Programme (2016) Hydrochemistry analysis of the Galilee subregion. Bioregional Assessment Derived Dataset. Viewed 07 December 2018, http://data.bioregionalassessments.gov.au/dataset/fd944f9f-14f6-4e20-bb8a-61d1116412ec.

    Dataset Ancestors

  7. Data from: Classification of Heart Failure Using Machine Learning: A...

    • data.mendeley.com
    Updated Oct 29, 2024
    Cite
    Bryan Chulde (2024). Classification of Heart Failure Using Machine Learning: A Comparative Study [Dataset]. http://doi.org/10.17632/959dxmgj8d.1
    Explore at:
    Dataset updated
    Oct 29, 2024
    Authors
    Bryan Chulde
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our research demonstrates that machine learning algorithms can effectively predict heart failure, highlighting high-accuracy models that improve detection and treatment. The Kaggle “Heart Failure” dataset, with 918 instances and 12 key features, was preprocessed to remove outliers; it contains 508 cases with heart disease and 410 without. Five models were evaluated: the random forest achieved the highest accuracy (92%) and was the most effective at classifying cases. Logistic regression and the multilayer perceptron were also quite accurate (89%), while the decision tree and k-nearest neighbors performed less well, showing that k-nearest neighbors is less suitable for this data. F1 scores confirmed the random forest as the optimal model, benefiting from preprocessing and hyperparameter tuning. The data analysis revealed that age, blood pressure, and cholesterol correlate with disease risk, suggesting that these models may help prioritize patients at risk and improve their preventive management. The research underscores the potential of these models in clinical practice to improve diagnostic accuracy and reduce costs, supporting informed medical decisions and improving health outcomes.
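    As an illustration of the comparison workflow (not the authors' exact pipeline), a scikit-learn sketch on stand-in data with the same shape as the Kaggle set:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    # synthetic stand-in for the 918-row, 12-feature Heart Failure table
    X, y = make_classification(n_samples=918, n_features=12, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    for model in (RandomForestClassifier(random_state=0),
                  LogisticRegression(max_iter=1000)):
        f1 = f1_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
        print(type(model).__name__, round(f1, 3))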

  8. Tsetse fly wing landmark data for morphometrics (Vol 20, 21)

    • datadryad.org
    • search.dataone.org
    zip
    Updated Dec 2, 2022
    Cite
    Dylan Geldenhuys (2022). Tsetse fly wing landmark data for morphometrics (Vol 20, 21) [Dataset]. http://doi.org/10.5061/dryad.qz612jmh1
    Explore at:
    zip
    Dataset updated
    Dec 2, 2022
    Dataset provided by
    Dryad
    Authors
    Dylan Geldenhuys
    Time period covered
    2022
    Description

    Single-wing images were captured from 14,354 pairs of field-collected tsetse wings of the species Glossina pallidipes and G. m. morsitans and analysed together with relevant biological data. To answer research questions regarding these flies, we need to locate 11 anatomical landmark coordinates on each wing. The manual location of landmarks is time-consuming, prone to error, and simply infeasible given the number of images. Automatic landmark detection has been proposed to locate these landmark coordinates. We developed a two-tier method using deep learning architectures to classify images and make accurate landmark predictions. The first tier used a classification convolutional neural network to remove most wings that were missing landmarks. The second tier provided landmark coordinates for the remaining wings. For the second tier, we compared direct coordinate regression using a convolutional neural network with segmentation using a fully convolutional network. For the resulting landmark pred...

  9. Data from: Interplay of physical and social drivers of movement in male...

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Nov 26, 2024
    Cite
    Maggie Wisniewska; Caitlin E. O'Connell-Rodwell; J. Werner Kilian; Simon Garnier; Gareth J. Russell (2024). Interplay of physical and social drivers of movement in male African savanna elephants [Dataset]. http://doi.org/10.5061/dryad.4qrfj6qm3
    Explore at:
    zip
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Etosha National Park
    Harvard University
    New Jersey Institute of Technology
    Authors
    Maggie Wisniewska; Caitlin E. O'Connell-Rodwell; J. Werner Kilian; Simon Garnier; Gareth J. Russell
    Description

    Despite extensive research into the behavioral ecology of free-ranging animal groups, questions remain about how group members integrate information about their physical and social surroundings. This is because a) tracking of multiple group members is limited to a few easily manageable species; and b) the tools to simultaneously quantify physical and social influences on an individual’s movement remain challenging, especially across large geographic scales. A relevant example of a widely-ranging species with a complex social structure and of conservation concern is the African savanna elephant. We evaluate highly synchronized GPS tracks from five male elephants in Etosha National Park in Namibia by incorporating their dynamic social landscape into an established resource selection model. The fitted model predicts movement patterns based simultaneously on the physical landscape (e.g., repeated visitation of waterholes) and the social landscape (e.g., avoidance of a dominant male). Combining the fitted models for multiple focal individuals produces landscape-dependent social networks that vary over space (e.g., with distance from a waterhole) and time (e.g., as the seasons change). The networks, especially around waterholes, are consistent with dominance patterns determined from previous behavioral studies. Models that combine physical landscape and social effects based on remote tracking can augment traditional methods for determining social structure from intensive behavioral observations. More broadly, these models will be essential to effective, in-situ conservation and management of wide-ranging social species in the face of anthropogenic disruptions to their physical surroundings and social connections.

    Methods

    Study subjects and the social landscape: The five individuals considered in this study belong to a large elephant subpopulation residing in the northeastern region of Etosha National Park, Namibia. As part of a different research effort, these individuals were classified into several age, dominance, social, and reproductive categories (O’Connell-Rodwell et al. 2011; O’Connell et al. 2024a). The age structure in this population was determined on the basis of several morphological features and can be found in the original publications (O’Connell-Rodwell et al. 2011; O’Connell et al. 2022). The dominance categories are reported from a population-level, ordinal dominance hierarchy based on the frequency of agonistic dyadic interactions (i.e., displacement) observed during all-occurrence sampling, over multiple field seasons (O’Connell-Rodwell et al. 2024a). The social categories were approximated using social network analysis (i.e., eigenvector centrality—an index expressing how influential an individual is based on the frequency of associating with other influential conspecifics) (O’Connell-Rodwell et al. 2024a; O’Connell-Rodwell et al. 2024b). The reproductive category expresses whether an elephant was in musth at the time of behavioral data collection.

    Tracking data: In September 2009, ENP personnel fitted five elephants with Global Positioning System (GPS) and satellite Global System for Mobile Communication (GSM) tracking devices. The trackers recorded positional data (i.e., longitude, latitude) every 15 minutes over approximately 24 months. Prior to analysis, we converted the tracking data to Cartesian units (i.e., meters) using the Universal Transverse Mercator (UTM) coordinate system projection.
    We also filtered the data to remove outlier movements as follows: we kept only movements (pairs of GPS fixes) in which 1) the interval was 15 minutes, 2) the focal individual moved ≤ 300 m in that time, and 3) all four of the other tracked elephants were within 20 km. Criterion 1 eliminates missed fixes; criterion 2 eliminates a small number of unusually fast movements which could represent startle responses to rare stimuli; and criterion 3 ensures that there is at least the potential for social interactions between all five elephants. The resulting datasets (one for each focal individual) had between 27,397 and 30,584 movements (a pandas sketch of this filter appears at the end of this entry).

    The physical landscape: To evaluate tracking data in the context of the physical landscape, we constructed a map of vegetation productivity using data from the 16-day 250 m Normalized Difference Vegetation Index (NDVI) MODIS imagery. We also created a map of the perennial waterholes by extracting coordinate information from existing geospatial records generated by ENP personnel. Finally, we compiled a map of ‘frequently visited areas’ (FVAs) as the centroids of the top 20 clusters of large turning angles (>90 degrees) in the movement data. These locations broadly correlate with the presence of shade and proximity to fruiting trees (Kilian, W., personal communication), which in other populations affect elephant movement.

    The Social Resource Selection Function (SRSF) model: Our approach extends the existing Resource Selection Function framework in which an individual’s location, when fixed (by a GPS device or other tracking methodology), is considered a choice made from a set of possible locations. This set of locations is bounded in space by how far from its previously known location the individual could reasonably be expected to move in the time between the two fixes. The relative probability of ending up at different destinations, relative to one’s current location, is modeled using conditional logistic regression (CLR) as a function of various environmental parameters that differ between locations (e.g., ‘vegetation density’, ‘distance to water’, distance to the previous location). The SRSF model adds to the RSF framework by considering the locations of other individuals in a moving group as time-varying point features of the landscape. One individual (the focal individual) is modeled, and the locations of the others (nonfocal individuals) are incorporated as ‘distance to neighbor’ values that can be calculated for all the possible locations in the CLR. Assuming that each elephant responds differently to different conspecifics, we calculate a set of social predictors by determining the distance to each neighbor separately. For any given movement m, the ‘choice’ is a binary response, where a potential location i is either the endpoint at which the individual was recorded (yi = 1) or one of the alternatives (yi = 0). For convenience, we have labeled the chosen location with the subscript j (j ∈ i). The probability of a movement is modeled by

    p_m = s/c + (1 − s) · exp(x_j β) / Σ_{i=1}^{c} exp(x_i β),

    where X is a matrix of k predictors derived from the landscape data (with rows x_i); β is a k by 1 matrix of parameters to be estimated; c is the total number of locations considered (1 being the actual endpoint and c − 1 being randomly sampled within a circle of fixed radius); and s is the probability of a stochastic, ‘non-choice-type movement’ for which the endpoint is independent of any of the included predictors. One example might be a sudden scare that causes a flight response.
In this case, we assign all possible endpoints the same probability 1/c. Including the possibility of non-choice movements is a novel addition to the standard CLR model; we found that for these data it stabilized the parameter estimates (meaning that we obtained similar results with different random subsets of the data when it was included, and disparate estimates when it was not). Overall, pm is the predicted ‘preference value’ for the chosen location divided by (and therefore conditional on) the sum of the preference values for the random sample of possible locations. In practice, depending on the resolution of the landscape and the boundary of possible distances reached, the denominator could include hundreds or even thousands of random locations. This can make computation of the expression, which is repeated for every movement in a dataset, time-consuming—a challenge that then translates into the model fitting. It is thus standard practice to randomly select a fixed number of non-chosen alternative locations on the assumption that they will comprise a representative sample of the landscape variation available to the individual. Given that our landscape features — various distance measures plus an interpolated array of NDVI values — vary smoothly and continuously within our sampling radius, we used 30 random locations (so c = 31). We fit the CLR by maximizing L, the log-likelihood of the entire set of n movements, using quasi-Newton nonlinear maximization. We performed variable selection by first fitting models with all possible subsets of physical and social landscape variables in their quadratic forms, except for distance to the previous location, which was always included as a linear function as an established proxy for the effort required to move to a new location. We ranked the models using Akaike’s Information Criterion (AIC) and calculated importance scores for each variable as the cumulative Akaike weight of the models in which it appeared. Interpretation of the SRSF model outputs depends on the functional form of each variable over the range of its values and its importance score. Because a linear cost-of-movement function is in every model by design, we exclude it from further reporting and discussion. The functional forms of the remaining variables can be divided into five categories: 1) monotonically increasing or 2) decreasing (indicating a preference for large or smaller values of the variable in question); 3) convex with the maximum within the data range (a preference for intermediate values); 4) concave with the minimum within the data range (a preference for large and small values indicating a back-and-forth movement between the locations containing the variable in question); or 5) constant over the data range (lack of preference for a specific value) (Mashintonio et al. 2014). The SRSF model outputs are expressed as the relative preference for movement towards locations defined by the
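    The three movement-filter criteria translate directly into a short pandas routine; a minimal sketch assuming columns t (timestamp) and x, y (UTM metres), with the other elephants' tracks time-aligned to the focal track:

    import numpy as np
    import pandas as pd

    def filter_movements(focal: pd.DataFrame, others: list) -> pd.DataFrame:
        step = focal[["x", "y"]].diff()
        ok = focal["t"].diff().eq(pd.Timedelta(minutes=15))      # criterion 1: 15-min interval
        ok &= np.hypot(step["x"], step["y"]).le(300)             # criterion 2: moved <= 300 m
        for o in others:                                         # criterion 3: all others within 20 km
            ok &= np.hypot(focal["x"] - o["x"], focal["y"] - o["y"]).le(20_000)
        return focal[ok]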

  10. MOOJa catalogs for Solar System Objects - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Aug 30, 2022
    Cite
    (2022). MOOJa catalogs for Solar System Objects - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/66a02175-6b39-5794-ac02-1282651d5431
    Explore at:
    Dataset updated
    Aug 30, 2022
    Description

    The Javalambre Photometric Local Universe Survey (J-PLUS) is an observational campaign that aims to obtain photometry in 12 ultraviolet-visible filters (0.3-1 µm) over ~8500 deg^2 of the sky observable from Javalambre (Teruel, Spain). Due to its characteristics and observation strategy, this survey will allow a great number of Solar System small bodies to be analyzed, and with improved spectrophotometric resolution with respect to previous large-area photometric surveys in optical wavelengths. The main goal of the present work is to present the first catalog of magnitudes and colors of minor bodies of the Solar System compiled using the first data release (DR1) of the J-PLUS observational campaign: the Moving Objects Observed from Javalambre (MOOJa) catalog. Using the compiled photometric data we obtained very-low-resolution reflectance (photo)spectra of the asteroids. We first used a σ-clipping algorithm in order to remove outliers and clean the data. We then devised a method to select the optimal solar colors in the J-PLUS photometric system. These solar colors were computed using two different approaches: on one hand, we used different spectra of the Sun convolved with the filter transmissions of the J-PLUS system, and on the other, we selected a group of solar-type stars in the J-PLUS DR1 according to their computed stellar parameters. Finally, we used the solar colors to obtain the reflectance spectra of the asteroids. We present photometric data in the J-PLUS filters for a total of 3122 minor bodies (3666 before outlier removal), and we discuss the main issues with the data, as well as some guidelines to solve them.
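    A minimal numpy sketch of the σ-clipping step (the clipping threshold and iteration count are assumptions; the abstract does not state them):

    import numpy as np

    def sigma_clip(values, n_sigma=3.0, max_iter=5):
        # iteratively drop points more than n_sigma standard deviations from the mean
        vals = np.asarray(values, dtype=float)
        mask = np.ones(vals.size, dtype=bool)
        for _ in range(max_iter):
            mu, sd = vals[mask].mean(), vals[mask].std(ddof=1)
            new_mask = np.abs(vals - mu) <= n_sigma * sd
            if np.array_equal(new_mask, mask):
                break
            mask = new_mask
        return mask            # True for points kept after clipping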

  11. Dataset for the paper "Observation of Acceleration and Deceleration Periods...

    • zenodo.org
    Updated Mar 26, 2025
    Cite
    Yide Qian; Yide Qian (2025). Dataset for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023 " [Dataset]. http://doi.org/10.5281/zenodo.15022854
    Explore at:
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yide Qian; Yide Qian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Pine Island Glacier
    Description

    Dataset and codes for "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023 "

    • Description of the data and file structure

    The MATLAB codes and related datasets are used for generating the figures for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023".

    Files and variables

    File 1: Data_and_Code.zip

    Directory: Main_function

    Description: Includes MATLAB scripts and functions. Each script includes a description that guides the user on how to use it and how to find the dataset used for processing.

    MATLAB main scripts: include all the steps to process the data, output figures, and output videos.

    Script_1_Ice_velocity_process_flow.m

    Script_2_strain_rate_process_flow.m

    Script_3_DROT_grounding_line_extraction.m

    Script_4_Read_ICESat2_h5_files.m

    Script_5_Extraction_results.m

    MATLAB functions: files of MATLAB functions that support the main scripts:

    1_Ice_velocity_code: MATLAB functions for ice velocity post-processing, including outlier removal, filtering, correction for atmospheric and tidal effects, inverse-weighted averaging, and error estimation.

    2_strain_rate: MATLAB functions for strain rate calculation.

    3_DROT_extract_grounding_line_code: MATLAB functions for converting range-offset results output from GAMMA to differential vertical displacement and using the result to extract the grounding line.

    4_Extract_data_from_2D_result: MATLAB functions used to extract profiles from 2D data.

    5_NeRD_Damage_detection: modified code from Izeboud et al. 2023. When applying this code, please also cite Izeboud et al. 2023 (https://www.sciencedirect.com/science/article/pii/S0034425722004655).

    6_Figure_plotting_code: MATLAB functions for the figures in the paper and supporting information.

    Directory: data_and_result

    Description: includes directories that store the results output from MATLAB. Users only need to change the paths in the MATLAB scripts to their own paths.

    1_origin: sample data ("PS-20180323-20180329", "PS-20180329-20180404", "PS-20180404-20180410") output from the GAMMA software in GeoTIFF format that can be used to calculate DROT and velocity. Includes displacement, theta, phi, and ccp.

    2_maskccpN: Remove outliers by ccp < 0.05 and change displacement to velocity (m/day).

    3_rockpoint: Extract velocities at non-moving region

    4_constant_detrend: removed orbit error

    5_Tidal_correction: remove atmospheric and tidal induced error

    6_rockpoint: Extract non-aggregated velocities at non-moving region

    6_vx_vy_v: transform velocities from va/vr to vx/vy

    7_rockpoint: Extract aggregated velocities at non-moving region

    7_vx_vy_v_aggregate_and_error_estimate: inverse weighted average of three ice velocity maps and calculate the error maps

    8_strain_rate: calculate strain rate from the aggregated ice velocity

    9_compare: store the results before and after tidal correction and aggregation.

    10_Block_result: time series results extracted from 2D data

    11_MALAB_output_png_result: store .png files and time series results

    12_DROT: Differential Range Offset Tracking results

    13_ICESat_2: ICESat-2 .h5 files and .mat files can be put here (this folder only includes the samples from tracks 0965 and 1094)

    14_MODIS_images: you can store MODIS images here

    shp: grounding line, rock region, ice front, and other shape files.

    File 2 : PIG_front_1947_2023.zip

    Includes ice front position shapefiles from 1947 to 2023, which are used for plotting Figure 1 in the paper.

    File 3 : PIG_DROT_GL_2016_2021.zip

    Includes grounding line position shapefiles from 1947 to 2023, which are used for plotting Figure 1 in the paper.

    Data was derived from the following sources; the links can be found in the MATLAB scripts or in the paper's "Open Research" section.

  12. Wenau, Stefan, Spieß, Volkhard, Zabel, Matthias (2021). Dataset: Multibeam...

    • service.tib.eu
    Updated Nov 29, 2024
    Cite
    (2024). Wenau, Stefan, Spieß, Volkhard, Zabel, Matthias (2021). Dataset: Multibeam bathymetry processed data (EM 120 echosounder dataset compilation) of RV METEOR & RV MARIA S. MERIAN during cruise M76/1 & MSM19/1c, Namibian continental slope. https://doi.org/10.1594/PANGAEA.932434 [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-932434
    Explore at:
    Dataset updated
    Nov 29, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data contain bathymetry from the Namibian continental slope, acquired on R/V Meteor research expedition M76/1 in 2008 and R/V Maria S. Merian expedition MSM19/1c in 2011. The purpose of the data was the exploration of the Namibian continental slope and especially the investigation of large seafloor depressions. The bathymetric data were acquired with the 191-beam 12 kHz Kongsberg EM120 system and processed using the public software package MB-System. The loaded data were cleaned semi-automatically and manually, removing outliers and other erroneous data. Initial velocity fields were adjusted to remove artifacts from the data. Gridding was done in 10x10 m grid cells for the MSM19-1c dataset and 50x50 m for the M76 dataset using the Gaussian Weighted Mean algorithm.
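    For orientation, a heavily simplified Gaussian-weighted-mean gridder in numpy; MB-System's gridding also draws weights from neighbouring cells, so this is a sketch of the averaging idea only, with the cell size and kernel width as assumptions:

    import numpy as np

    def gaussian_weighted_grid(x, y, z, cell=10.0, sigma=None):
        # weight each sounding by a Gaussian of its distance to its cell centre
        sigma = sigma or cell / 2.0
        xi = ((x - x.min()) // cell).astype(int)
        yi = ((y - y.min()) // cell).astype(int)
        num = np.zeros((yi.max() + 1, xi.max() + 1))
        den = np.zeros_like(num)
        cx = x.min() + (xi + 0.5) * cell                  # cell-centre coordinates
        cy = y.min() + (yi + 0.5) * cell
        w = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        np.add.at(num, (yi, xi), w * z)
        np.add.at(den, (yi, xi), w)
        with np.errstate(invalid="ignore"):
            return num / den                              # NaN where a cell has no data

    rng = np.random.default_rng(0)
    x, y = rng.uniform(0, 500, 2000), rng.uniform(0, 500, 2000)
    grid = gaussian_weighted_grid(x, y, z=-200 + 0.1 * x, cell=10.0)  # fake depths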

  13. GGWS-PCNN: A global gridded wind speed dataset (1973/01-2021/12; Ongoing...

    • zenodo.org
    nc
    Updated Oct 31, 2022
    Cite
    Lihong Zhou; Haofeng Liu; Zhenzhong Zeng; Lihong Zhou; Haofeng Liu; Zhenzhong Zeng (2022). GGWS-PCNN: A global gridded wind speed dataset (1973/01-2021/12; Ongoing Update) [Dataset]. http://doi.org/10.1016/j.scib.2022.09.022
    Explore at:
    nc
    Dataset updated
    Oct 31, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lihong Zhou; Haofeng Liu; Zhenzhong Zeng; Lihong Zhou; Haofeng Liu; Zhenzhong Zeng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Profile of the dataset

    • The GGWS-PCNN is a global gridded monthly dataset of 10-m wind speed based on an artificial intelligence algorithm (the partial convolutional neural network), observations from weather stations (the HadISD dataset), and 34 climate models from CMIP6.
    • It has a resolution of 1.25° × 2.5° (latitude × longitude). We will update this dataset as soon as the new HadISD version is accessible.
    • For more details about the dataset and its reconstruction process, please see our paper "An artificial intelligence reconstruction of global gridded surface winds" published in the Science Bulletin.

    Notice

    • The HadISD team discovered an issue in the wind data after 2013 and fixed it in version 3.3.0.202201p and later. See the Met Office Hadley Centre observations datasets website for more details.
    • Due to the limitations of existing AI algorithms in reconstructing data with many missing values, our product has a small number of outliers (e.g. wind speeds less than zero or very high), most of which are located in the Antarctic region. We recommend you remove these outliers before using this dataset.
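    Following the authors' recommendation, implausible values can be masked before analysis. A sketch assuming xarray, with hypothetical file and variable names (and an arbitrary 60 m/s upper cap):

    import xarray as xr

    ds = xr.open_dataset("GGWS-PCNN.nc")          # hypothetical file name
    ws = ds["wind_speed"]                         # hypothetical variable name
    ws_clean = ws.where((ws >= 0) & (ws <= 60))   # mask negative and extreme values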

    Reference

    Lihong Zhou, Haofeng Liu, Xin Jiang, et al. (2022). An artificial intelligence reconstruction of global gridded surface winds. Science Bulletin.

  14. High frequency dataset for event-scale concentration-discharge analysis in a...

    • search.dataone.org
    • hydroshare.org
    Updated Sep 21, 2024
    Cite
    Andreas Musolff (2024). High frequency dataset for event-scale concentration-discharge analysis in a forested headwater 01/2018-08/2023 [Dataset]. http://doi.org/10.4211/hs.9be43573ba754ec1b3650ce233fc99de
    Explore at:
    Dataset updated
    Sep 21, 2024
    Dataset provided by
    Hydroshare
    Authors
    Andreas Musolff
    Time period covered
    Jan 1, 2018 - Aug 23, 2023
    Area covered
    Description

    This composite repository contains high-frequency data of discharge, electrical conductivity, nitrate-N, DOC, and water temperature obtained in the Rappbode headwater catchment in the Harz mountains, Germany. This catchment was affected by a bark-beetle infestation and forest dieback from 2018 onwards. The data extend previous observations from the same catchment (RB) published as part of Musolff (2020). Details on the catchment can be found in Werner et al. (2019, 2021) and Musolff et al. (2021). The file RB_HF_data_2018_2023.txt states measurements for each timestep using the following columns: "index" (number of observation), "Date.Time" (timestamp in YYYY-MM-DD HH:MM:SS), "WT" (water temperature in degree Celsius), "Q.smooth" (discharge in mm/d smoothed using a moving average), "NO3.smooth" (nitrate concentration in mg N/L smoothed using a moving average), "DOC.smooth" (dissolved organic carbon concentration in mg/L smoothed using a moving average), "EC.smooth" (electrical conductivity in µS/cm smoothed using a moving average); NA - no data.

    Water quality data and discharge was measured at a high-frequency interval of 15 min in the time period between January 2018 and August 2023. Both, NO3-N and DOC were measured using an in-situ UV-VIS probe (s::can spectrolyser, scan Austria). EC was measured using an in-situ probe (CTD Diver, Van Essen Canada). Discharge measurements relied on an established stage-discharge relationship based on water level observations (CTD Diver, Van Essen Canada, see Werner et al. [2019]). Data loggers were maintained every two weeks, including manual cleaning of the UV-VIS probes and grab sampling for subsequent lab analysis, calibration and validation.

    Data preparation included five steps: drift correction, outlier detection, gap filling, calibration, and moving averaging:
    - Drift was corrected by distributing the offset between the mean values one hour before and after cleaning across the two-week maintenance interval as exponential growth.
    - Outliers were detected with a two-step procedure. First, values outside a physically plausible range were removed. Second, the Grubbs test was applied to a moving window of 100 values to detect and remove outliers.
    - Data gaps smaller than two hours were filled using cubic spline interpolation.
    - The resulting time series were globally calibrated against the lab-measured concentrations of NO3-N and DOC. EC was calibrated against field values obtained with a handheld WTW probe (WTW Multi 430, Xylem Analytics Germany).
    - Noise in the signal of both discharge and water quality was reduced by a moving average with a window length of 2.5 hours.
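    A sketch of the moving-window Grubbs step in Python/scipy. Note that the textbook Grubbs test removes one extreme value at a time and re-tests; this simplified version flags every value exceeding the critical ratio in each 100-value window:

    import numpy as np
    from scipy import stats

    def grubbs_flags(x, window=100, alpha=0.05):
        x = np.asarray(x, dtype=float)
        flags = np.zeros(x.size, dtype=bool)
        for s in range(0, x.size - window + 1, window):   # non-overlapping windows; tail ignored
            seg = x[s:s + window]
            n = seg.size
            t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
            g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
            g = np.abs(seg - seg.mean()) / seg.std(ddof=1)
            flags[s:s + window] = g > g_crit
        return flags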

    References:
    Musolff, A. (2020). High frequency dataset for event-scale concentration-discharge analysis. http://www.hydroshare.org/resource/27c93a3f4ee2467691a1671442e047b8
    Musolff, A., Zhan, Q., Dupas, R., Minaudo, C., Fleckenstein, J. H., Rode, M., Dehaspe, J., & Rinke, K. (2021). Spatial and Temporal Variability in Concentration-Discharge Relationships at the Event Scale. Water Resources Research, 57(10).
    Werner, B. J., A. Musolff, O. J. Lechtenfeld, G. H. de Rooij, M. R. Oosterwoud, and J. H. Fleckenstein (2019), High-frequency measurements explain quantity and quality of dissolved organic carbon mobilization in a headwater catchment, Biogeosciences, 16(22), 4497-4516.
    Werner, B. J., Lechtenfeld, O. J., Musolff, A., de Rooij, G. H., Yang, J., Grundling, R., Werban, U., & Fleckenstein, J. H. (2021). Small-scale topography explains patterns and dynamics of dissolved organic carbon exports from the riparian zone of a temperate, forested catchment. Hydrology and Earth System Sciences, 25(12), 6067-6086.

  15. R code

    • figshare.com
    txt
    Updated Jun 5, 2017
    Cite
    Christine Dodge (2017). R code [Dataset]. http://doi.org/10.6084/m9.figshare.5021297.v1
    Explore at:
    txt
    Dataset updated
    Jun 5, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Christine Dodge
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code used for each data set to perform negative binomial regression, calculate the overdispersion statistic, generate summary statistics, and remove outliers.
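    The dataset itself ships R code; for readers working in Python, a rough statsmodels analogue of the described steps (a negative binomial fit plus a Pearson-based overdispersion check) on synthetic counts:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    X = sm.add_constant(rng.normal(size=(100, 2)))       # two synthetic predictors
    y = rng.negative_binomial(5, 0.5, size=100)          # synthetic count response

    fit = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
    print(fit.summary())
    print(fit.pearson_chi2 / fit.df_resid)               # overdispersion statistic (~1 indicates no overdispersion)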

  16. Data from: Urbanev: An open benchmark dataset for urban electric vehicle...

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Apr 25, 2025
    Cite
    Han Li; Haohao Qu; Xiaojun Tan; Linlin You; Rui Zhu; Wenqi Fan (2025). Urbanev: An open benchmark dataset for urban electric vehicle charging demand prediction [Dataset]. http://doi.org/10.5061/dryad.np5hqc04z
    Explore at:
    zip
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Sun Yat-sen University
    Hong Kong Polytechnic University
    Institute of High Performance Computing
    Authors
    Han Li; Haohao Qu; Xiaojun Tan; Linlin You; Rui Zhu; Wenqi Fan
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    The recent surge in electric vehicles (EVs), driven by a collective push to enhance global environmental sustainability, has underscored the significance of exploring EV charging prediction. To catalyze further research in this domain, we introduce UrbanEV—an open dataset showcasing EV charging space availability and electricity consumption in a pioneering city for vehicle electrification, namely Shenzhen, China. UrbanEV offers a rich repository of charging data (i.e., charging occupancy, duration, volume, and price) captured at hourly intervals across an extensive six-month span for over 20,000 individual charging stations. Beyond these core attributes, the dataset also encompasses diverse influencing factors like weather conditions and spatial proximity. These factors are thoroughly analyzed qualitatively and quantitatively to reveal their correlations and causal impacts on charging behaviors. Furthermore, comprehensive experiments have been conducted to showcase the predictive capabilities of various models, including statistical, deep learning, and transformer-based approaches, using the UrbanEV dataset. This dataset is poised to propel advancements in EV charging prediction and management, positioning itself as a benchmark resource within this burgeoning field.

    Methods: To build a comprehensive and reliable benchmark dataset, we conducted a series of rigorous processes from data collection to dataset evaluation. The overall workflow sequentially includes data acquisition, data processing, statistical analysis, and prediction assessment, as described in detail below.

    Study area and data acquisition

    Shenzhen, a pioneering city in global vehicle electrification, has been selected for this study with the objective of offering valuable insights into electric vehicle (EV) development that can serve as a reference for other urban centers. This study encompasses the entire expanse of Shenzhen, where data on public EV charging stations distributed around the city have been meticulously gathered. Specifically, EV charging data was automatically collected from a mobile platform used by EV drivers to locate public charging stations. Through this platform, users could access real-time information on each charging pile, including its availability (e.g., busy or idle), charging price, and geographic coordinates. Accordingly, we recorded the charging-related data at five-minute intervals from September 1, 2022, to February 28, 2023. This data collection process was fully digital and did not require manual readings. Furthermore, to delve into the correlation between EV charging patterns and environmental elements, weather data for Shenzhen city were acquired from two meteorological observatories situated in the airport and central regions, respectively. These meteorological data are publicly available on the Shenzhen Government Data Open Platform. Thirdly, point of interest (POI) data was extracted through the Application Programming Interface Platform of AMap.com, along with three primary types: food and beverage services, business and residential, and lifestyle services. Lastly, the spatial and static data were organized based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen. The collected data contains detailed spatiotemporal information that can be analyzed to provide valuable insights about urban EV charging patterns and their correlations with meteorological conditions.

    Processing raw information into well-structured data

    To streamline the utilization of the UrbanEV dataset, we harmonize heterogeneous data from various sources into well-structured data with aligned temporal and spatial resolutions. This process can be segmented into two parts: the reorganization of EV charging data and the preparation of other influential factors.

    EV charging data

    The raw charging data, obtained from publicly available EV charging services, pertains to charging stations and predominantly comprises string-type records at a 5-minute interval. To transform this raw data into a structured time series tailored for prediction tasks, we implement the following three key measures:

    Initial Extraction. From the string-type records, we extract vital information for each charging pile, such as availability (designated as "busy" or "idle"), rated power, and the corresponding charging and service fees applicable during the observed time periods. First, a charging pile is categorized as "active charging" if its states at two consecutive timestamps are both "busy". Consequently, the occupancy within a charging station can be defined as the count of in-use charging piles, while the charging duration is calculated as the product of the count of in-use piles and the time between the two timestamps (in our case, 5 minutes). Moreover, the charging volume in a station can correspondingly be estimated by multiplying the duration by the piles' rated power. Finally, the average electricity price and service price are calculated for each station in alignment with the same temporal resolution as the three charging variables.

    Error Detection and Imputation. Ensuring data quality is paramount when utilizing charging data for decision-making, advanced analytics, and machine-learning applications. It is crucial to address concerns around data cleanliness, as the presence of inaccuracies and inconsistencies, often referred to as dirty data, can significantly compromise the reliability and validity of any subsequent analysis or modeling efforts. To improve data quality of our charging data, several errors are identified, particularly the negative values for charging fees and the inconsistencies between the counts of occupied, idle, and total charging piles. We remove the records containing these anomalies and treat them as missing data. Besides that, a two-step imputation process was implemented to address missing values. First, forward filling replaced missing values using data from preceding timestamps. Then, backward filling was applied to fill gaps at the start of each time series. Moreover, a certain number of outliers were identified in the dataset, which could significantly impact prediction performance. To address this, the interquartile range (IQR) method was used to detect outliers for metrics including charging volume (v), charging duration (d), and the rate of active charging piles at the charging station (o). To retain more original data and minimize the impact of outlier correction on the overall data distribution, we set the coefficient to 4 instead of the default 1.5. Finally, each outlier was replaced by the mean of its adjacent valid values. This preprocessing pipeline transformed the raw data into a structured and analyzable dataset.
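    A compact pandas sketch of the imputation-plus-IQR step described above (forward/backward fill, IQR fences with coefficient 4, and replacement of outliers by the mean of the adjacent valid values):

    import pandas as pd

    def clean_series(s: pd.Series, k: float = 4.0) -> pd.Series:
        s = s.ffill().bfill()                             # two-step gap imputation
        q1, q3 = s.quantile([0.25, 0.75])
        iqr = q3 - q1
        out = (s < q1 - k * iqr) | (s > q3 + k * iqr)     # wide fences keep more raw data
        s = s.mask(out)                                   # blank the outliers...
        return s.fillna((s.ffill() + s.bfill()) / 2)      # ...refill with the mean of adjacent valid values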

    Aggregation and Filtration. Building upon the station-level charging data that has been extracted and cleansed, we further organize the data into a region-level dataset at an hourly interval, providing a new perspective for EV charging behavior analysis. This is achieved by two major processes: aggregation and filtration. First, we aggregate all the charging data from both temporal and spatial views: a. Temporally, we standardize all time-series data to a common time resolution of one hour, as it serves as the least common denominator among the various resolutions. This establishes a unified temporal resolution for all time-series data, including pricing schemes, weather records, and charging data, thereby creating a well-structured dataset. Aggregation rules specify that the five-minute charging volume (v) and duration (d) are summed within each interval (i.e., one hour), whereas the occupancy (o), electricity price (p_e), and service price (p_s) are assigned their instantaneous values at each hour for each charging pile. This distinction arises from the inherent nature of these data types: volume v and duration d are cumulative, while o, p_e, and p_s are instantaneous variables. Compared to using the mean or median values within each interval, selecting the instantaneous values of o, p_e, and p_s as representatives preserves the original data patterns more effectively and minimizes the influence of human interpretation. b. Spatially, stations are aggregated based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen. After aggregation, our aggregated dataset comprises 331 regions (also called traffic zones) with 4344 timestamps. Second, variance tests and zero-value filtering functions were employed to filter out traffic zones with zero or no change in charging data. Specifically, it means that

  17. MODIS/Terra Land Surface Temperature/3-Band Emissivity 8-Day L3 Global 1km...

    • data.nasa.gov
    Cite
    nasa.gov (2025). MODIS/Terra Land Surface Temperature/3-Band Emissivity 8-Day L3 Global 1km SIN Grid V061 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/modis-terra-land-surface-temperature-3-band-emissivity-8-day-l3-global-1km-sin-grid-v061-8d074
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    A suite of Moderate Resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature and Emissivity (LST&E) products are available in Collection 6.1. The MOD21 Land Surface Temperature (LST) algorithm differs from the algorithm of the MOD11 LST products, in that the MOD21 algorithm is based on the ASTER Temperature/Emissivity Separation (TES) technique, whereas the MOD11 uses the split-window technique. The MOD21 TES algorithm uses a physics-based algorithm to dynamically retrieve both the LST and spectral emissivity simultaneously from the MODIS thermal infrared bands 29, 31, and 32. The TES algorithm is combined with an improved Water Vapor Scaling (WVS) atmospheric correction scheme to stabilize the retrieval during very warm and humid conditions. The MOD21A2 dataset is an 8-day composite LST product at 1,000 meter spatial resolution that uses an algorithm based on a simple averaging method. The algorithm calculates the average from all the cloud-free MOD21A1D and MOD21A1N daily acquisitions from the 8-day period. Unlike the MOD21A1 data sets, where the daytime and nighttime acquisitions are separate products, the MOD21A2 contains both daytime and nighttime acquisitions as separate Science Dataset (SDS) layers within a single Hierarchical Data Format (HDF) file. The LST, Quality Control (QC), view zenith angle, and viewing time have separate day and night SDS layers, while the values for the MODIS emissivity bands 29, 31, and 32 are the average of both the nighttime and daytime acquisitions. Additional details regarding the method used to create this Level 3 (L3) product are available in the Algorithm Theoretical Basis Document (ATBD).

    Known Issues: Users of MODIS LST products may notice an increase in occurrences of extreme high-temperature outliers in the unfiltered MxD21 Version 6 and 6.1 products compared to the heritage MxD11 LST products. This can occur especially over desert regions like the Sahara, where undetected cloud and dust can negatively impact both the MxD21 and MxD11 retrieval algorithms.
    * In the MxD11 LST products, these contaminated pixels are flagged in the algorithm and set to fill values in the output products, based on differences in the band 32 and band 31 radiances used in the generalized split-window algorithm. In the MxD21 LST products, values for the contaminated pixels are retained in the output products (and may result in overestimated temperatures), and users need to apply Quality Control (QC) filtering and other error analyses to filter out bad values. High-temperature outlier thresholds are not employed in MxD21 since they would potentially remove naturally occurring hot surface targets such as fires and lava flows. High atmospheric aerosol optical depth (AOD) caused by vast dust outbreaks in the Sahara and other deserts, highlighted in the example documentation, is the primary reason for high outlier surface temperature values (and corresponding low emissivity values) in the MxD21 LST products. Future versions of the MxD21 product will include a dust flag from the MODIS aerosol product and/or brightness-temperature look-up tables to filter out contaminated dust pixels. It should be noted that in the MxD11B day/night algorithm products, more advanced cloud filtering is employed in the multi-day products, based on a temporal analysis of historical LST over cloudy areas. This may result in more stringent filtering of dust-contaminated pixels in these products.
    * In order to mitigate the impact of dust in the MxD21 V6 and 6.1 products, the science team recommends using a combination of the existing QC bits, emissivity values, and estimated product errors to confidently remove bad pixels from analysis. For more details, refer to the dust and cloud contamination example documentation. For complete information about known issues, please refer to the MODIS/VIIRS Land Quality Assessment website.

    Improvements/Changes from Previous Versions:
    * The Version 6.1 Level-1B (L1B) products have been improved by undergoing various calibration changes, including changes to the response-versus-scan angle (RVS) approach that affects reflectance bands for Aqua and Terra MODIS, corrections to adjust for the optical crosstalk in Terra MODIS infrared (IR) bands, and corrections to the Terra MODIS forward look-up table (LUT) update for the period 2012 - 2017.
    * A polarization correction has been applied to the L1B Reflective Solar Bands (RSB).
    * The product utilizes GEOS data, replacing MERRA2.
    * Three new CMG products are available in the MxD21 suite (MxD21C1/C2/C3).
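    Since MxD21 retains contaminated pixels, the recommended QC filtering happens on the user's side. The sketch below is one minimal way to do it in R, assuming the common MODIS convention that the two least-significant bits of the QC word are the mandatory QA flags (00 = pixel produced, good quality) and that 0 is the LST fill value; confirm both against the MOD21 ATBD and file metadata before use.

    # Mask MxD21 LST pixels using the QC layer (assumptions noted above).
    filter_lst_by_qc <- function(lst, qc, fill = 0) {
      good <- bitwAnd(as.integer(qc), 3L) == 0L  # mandatory QA bits 0-1 == 00
      lst[!good | lst == fill] <- NA             # drop flagged pixels and fill values
      lst
    }

    # Example: three pixels; the second carries a non-zero mandatory QA flag.
    filter_lst_by_qc(lst = c(301.2, 315.8, 0), qc = c(0L, 2L, 0L))
    # [1] 301.2    NA    NA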

  18. Intermediate data for TE calculation

    • zenodo.org
    Cite
    Yue Liu; Yue Liu (2025). Intermediate data for TE calculation [Dataset]. http://doi.org/10.5281/zenodo.10373032
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    May 9, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yue Liu; Yue Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes intermediate data from RiboBase that generates translation efficiency (TE). The code to generate the files can be found at https://github.com/CenikLab/TE_model.

    We uploaded demo HeLa .ribo files, but due to the large storage requirements of the full dataset, we recommend contacting Dr. Can Cenik directly to request access to the complete version of RiboBase if you need the original data.

    The detailed explanation for each file:

    human_flatten_ribo_clr.rda: ribosome profiling clr normalized data with GEO GSM ids in columns and genes in rows in human.

    human_flatten_rna_clr.rda: matched RNA-seq clr normalized data with GEO GSM ids in columns and genes in rows in human.

    human_flatten_te_clr.rda: TE clr data with GEO GSM ids in columns and genes in rows in human.

    human_TE_cellline_all_plain.csv: TE clr data with genes in rows and cell lines in columns in human.

    human_RNA_rho_new.rda: matched RNA-seq proportional similarity data as genes by genes matrix in human.

    human_TE_rho.rda: TE proportional similarity data as genes by genes matrix in human.

    mouse_flatten_ribo_clr.rda: ribosome profiling clr normalized data with GEO GSM ids in columns and genes in rows in mouse.

    mouse_flatten_rna_clr.rda: matched RNA-seq clr normalized data with GEO GSM ids in columns and genes in rows in mouse.

    mouse_flatten_te_clr.rda: TE clr data with GEO GSM ids in columns and genes in rows in mouse.

    mouse_TE_cellline_all_plain.csv: TE clr data with genes in rows and cell lines in columns in mouse.

    mouse_RNA_rho_new.rda: matched RNA-seq proportional similarity data as genes by genes matrix in mouse.

    mouse_TE_rho.rda: TE proportional similarity data as genes by genes matrix in mouse.
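    For reference, here is a minimal sketch of the centered log-ratio (clr) transform that the file names above refer to, with genes in rows and samples (GEO GSM ids) in columns as described. It is illustrative only, not necessarily the exact RiboBase recipe.

    # clr(x) = log(x) minus the per-sample mean of log(x); requires strictly
    # positive values (e.g., counts with a pseudocount added).
    clr <- function(mat) {
      logx <- log(mat)
      sweep(logx, 2, colMeans(logx))  # centre each column (sample)
    }

    toy <- matrix(c(10, 20, 70, 5, 45, 50), nrow = 3,
                  dimnames = list(paste0("gene", 1:3), c("GSM1", "GSM2")))
    clr(toy)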

    All the data passed quality control. There are 1054 human samples and 835 mouse samples meeting the following criteria (a filtering sketch follows the list):
    * coverage > 0.1 X
    * CDS percentage > 70%
    * R2 between RNA and RIBO >= 0.188 (to remove outliers)
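    As referenced above, a minimal sketch of applying these sample-level thresholds, assuming a hypothetical metadata table with columns coverage, cds_pct, and r2 (the names are ours):

    samples <- data.frame(
      gsm      = c("GSM1", "GSM2", "GSM3"),
      coverage = c(0.25, 0.05, 1.10),  # X coverage
      cds_pct  = c(82, 91, 65),        # percent of reads in CDS
      r2       = c(0.42, 0.30, 0.55)   # R^2 between RNA and RIBO
    )
    subset(samples, coverage > 0.1 & cds_pct > 70 & r2 >= 0.188)  # only GSM1 passes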

    All ribosome profiling data here are non-deduplicated and winsorized, paired with RNA-seq data that are deduplicated and not winsorized (despite the "flatten" in the file names, which is just a naming convention).

    #### Code
    To read the .rda files, use load("rdaname.rda") in R.

    To calculate proportional similarity (rho) from the clr data:

    library(propr)
    # lr2rho() converts a log-ratio-transformed (clr) matrix into a
    # genes-by-genes proportionality (rho) matrix; it is an internal
    # function of the propr package, hence the ::: accessor.
    human_TE_homo_rho <- propr:::lr2rho(as.matrix(clr_data))
    # Carry the gene identifiers over to the rho matrix.
    rownames(human_TE_homo_rho) <- colnames(human_TE_homo_rho) <- rownames(clr_data)

  19. MODIS/Terra Land Surface Temperature/3-Band Emissivity Daily L3 Global...

    • data.nasa.gov
    Cite
    nasa.gov (2025). MODIS/Terra Land Surface Temperature/3-Band Emissivity Daily L3 Global 0.05Deg CMG V061 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/modis-terra-land-surface-temperature-3-band-emissivity-daily-l3-global-0-05deg-cmg-v061-a4c2b
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    A suite of Moderate Resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature and Emissivity (LST&E) products are available in Collection 6.1. The MOD21 Land Surface Temperature (LST) algorithm differs from the algorithm of the MOD11 LST products, in that the MOD21 algorithm is based on the ASTER Temperature/Emissivity Separation (TES) technique, whereas the MOD11 uses the split-window technique. The MOD21 TES algorithm uses a physics-based algorithm to dynamically retrieve both the LST and spectral emissivity simultaneously from the MODIS thermal infrared bands 29, 31, and 32. The TES algorithm is combined with an improved Water Vapor Scaling (WVS) atmospheric correction scheme to stabilize the retrieval during very warm and humid conditions. The MOD21C1 dataset is produced daily in a 0.05 degree (5,600 meters at the equator) Climate Modeling Grid (CMG) from daytime Level 2 Gridded (L2G) intermediate LST products. The L2G process maps the daily MOD21 swath granules onto a sinusoidal MODIS grid and stores all observations falling over a gridded cell for a given day. The MOD21C1 algorithm sorts through these observations for each cell and estimates the final LST value as an average from all observations that are cloud free and have good LST&E accuracies. The daytime average is weighted by the observation coverage for that cell. Only observations having an observation coverage greater than a 15% threshold are considered. The MOD21C1 product contains seven Science Datasets (SDS), which include the calculated LST as well as quality control, the three emissivity bands, view zenith angle, and time of observation. Additional details regarding the methodology used to create this Level 3 (L3) product are available in the Algorithm Theoretical Basis Document (ATBD).

    Known Issues: Users of MODIS LST products may notice an increase in occurrences of extreme high temperature outliers in the unfiltered MxD21 Version 6 and 6.1 products compared to the heritage MxD11 LST products. This can occur especially over desert regions like the Sahara where undetected cloud and dust can negatively impact both the MxD21 and MxD11 retrieval algorithms.
    * In the MxD11 LST products, these contaminated pixels are flagged in the algorithm and set to fill values in the output products based on differences in the band 32 and band 31 radiances used in the generalized split window algorithm. In the MxD21 LST products, values for the contaminated pixels are retained in the output products (and may result in overestimated temperatures), and users need to apply Quality Control (QC) filtering and other error analyses for filtering out bad values. High temperature outlier thresholds are not employed in MxD21 since it would potentially remove naturally occurring hot surface targets such as fires and lava flows. High atmospheric aerosol optical depth (AOD) caused by vast dust outbreaks in the Sahara and other deserts highlighted in the example documentation are the primary reason for high outlier surface temperature values (and corresponding low emissivity values) in the MxD21 LST products. Future versions of the MxD21 product will include a dust flag from the MODIS aerosol product and/or brightness temperature look up tables to filter out contaminated dust pixels. It should be noted that in the MxD11B day/night algorithm products, more advanced cloud filtering is employed in the multi-day products based on a temporal analysis of historical LST over cloudy areas. This may result in more stringent filtering of dust contaminated pixels in these products.
    * In order to mitigate the impact of dust in the MxD21 V6 and 6.1 products, the science team recommends using a combination of the existing QC bits, emissivity values, and estimated product errors, to confidently remove bad pixels from analysis. For more details, refer to the dust and cloud contamination example documentation. For complete information about known issues please refer to the MODIS/VIIRS Land Quality Assessment website.

    Improvements/Changes from Previous Versions:
    * The Version 6.1 Level-1B (L1B) products have been improved by undergoing various calibration changes that include: changes to the response-versus-scan angle (RVS) approach that affects reflectance bands for Aqua and Terra MODIS, corrections to adjust for the optical crosstalk in Terra MODIS infrared (IR) bands, and corrections to the Terra MODIS forward look-up table (LUT) update for the period 2012 - 2017.
    * A polarization correction has been applied to the L1B Reflective Solar Bands (RSB).
    * The product utilizes GEOS data replacing MERRA2.
    * Three new CMG products are available in the MxD21 suite (MxD21C1/C2/C3).

  20. MODIS/Terra Land Surface Temperature/3-Band Emissivity 8-Day L3 Global 1km...

    • data.nasa.gov
    Cite
    nasa.gov (2025). MODIS/Terra Land Surface Temperature/3-Band Emissivity 8-Day L3 Global 1km SIN Grid V006 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/modis-terra-land-surface-temperature-3-band-emissivity-8-day-l3-global-1km-sin-grid-v006
    Explore at:
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The MOD21A2 Version 6 data product was decommissioned on July 31, 2023. Users are encouraged to use the MOD21A2 Version 6.1 data product.

    A new suite of Moderate Resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature and Emissivity (LST&E) products are available in Collection 6. The MOD21 Land Surface Temperature (LST) algorithm differs from the algorithm of the MOD11 LST products, in that the MOD21 algorithm is based on the ASTER Temperature/Emissivity Separation (TES) technique, whereas the MOD11 uses the split-window technique. The MOD21 TES algorithm uses a physics-based algorithm to dynamically retrieve both the LST and spectral emissivity simultaneously from the MODIS thermal infrared bands 29, 31, and 32. The TES algorithm is combined with an improved Water Vapor Scaling (WVS) atmospheric correction scheme to stabilize the retrieval during very warm and humid conditions. The MOD21A2 dataset is an 8-day composite LST product that uses an algorithm based on a simple averaging method. The algorithm calculates the average from all the cloud-free MOD21A1D and MOD21A1N daily acquisitions from the 8-day period. Unlike the MOD21A1 data sets, where the daytime and nighttime acquisitions are separate products, the MOD21A2 contains both daytime and nighttime acquisitions as separate Science Dataset (SDS) layers within a single Hierarchical Data Format (HDF) file. The LST, Quality Control (QC), view zenith angle, and viewing time have separate day and night SDS layers, while the values for the MODIS emissivity bands 29, 31, and 32 are the average of both the nighttime and daytime acquisitions. MOD21A2 products are available two months after acquisition due to latency of data inputs. Additional details regarding the method used to create this Level 3 (L3) product are available in the Algorithm Theoretical Basis Document (ATBD).

    Known Issues: Forward processing of Terra MODIS LST&E Version 6 data products was discontinued on December 31, 2005. Users are encouraged to use the MOD21A1D Version 6.1 data product. Users of MODIS LST products may notice an increase in occurrences of extreme high-temperature outliers in the unfiltered MxD21 Version 6 and 6.1 products compared to the heritage MxD11 LST products. This can occur especially over desert regions like the Sahara, where undetected cloud and dust can negatively impact both the MxD21 and MxD11 retrieval algorithms.
    * In the MxD11 LST products, these contaminated pixels are flagged in the algorithm and set to fill values in the output products, based on differences in the band 32 and band 31 radiances used in the generalized split-window algorithm. In the MxD21 LST products, values for the contaminated pixels are retained in the output products (and may result in overestimated temperatures), and users need to apply Quality Control (QC) filtering and other error analyses to filter out bad values. High-temperature outlier thresholds are not employed in MxD21 since they would potentially remove naturally occurring hot surface targets such as fires and lava flows. High atmospheric aerosol optical depth (AOD) caused by vast dust outbreaks in the Sahara and other deserts, highlighted in the example documentation, is the primary reason for high outlier surface temperature values (and corresponding low emissivity values) in the MxD21 LST products. Future versions of the MxD21 product will include a dust flag from the MODIS aerosol product and/or brightness-temperature look-up tables to filter out contaminated dust pixels. It should be noted that in the MxD11B day/night algorithm products, more advanced cloud filtering is employed in the multi-day products, based on a temporal analysis of historical LST over cloudy areas. This may result in more stringent filtering of dust-contaminated pixels in these products.
    * In order to mitigate the impact of dust in the MxD21 V6 and 6.1 products, the science team recommends using a combination of the existing QC bits, emissivity values, and estimated product errors to confidently remove bad pixels from analysis. For more details, refer to the dust and cloud contamination example documentation. For complete information about known issues, please refer to the MODIS/VIIRS Land Quality Assessment website.

    Improvements/Changes from Previous Versions:
    * New product for MODIS Version 6.
