Facebook
TwitterThis paper introduces the sparse functional boxplot and the intensity sparse functional boxplot as practical exploratory tools. Besides being available for complete functional data, they can be used in sparse univariate and multivariate functional data. The sparse functional boxplot, based on the functional boxplot, displays sparseness proportions within the 50% central region. The intensity sparse functional boxplot indicates the relative intensity of fitted sparse point patterns in the central region. The two-stage functional boxplot, which derives from the functional boxplot to detect outliers, is furthermore extended to its sparse form. We also contribute to sparse data fitting improvement and sparse multivariate functional data depth. In a simulation study, we evaluate the goodness of data fitting, several depth proposals for sparse multivariate functional data, and compare the results of outlier detection between the sparse functional boxplot and its two-stage version. The practical applications of the sparse functional boxplot and intensity sparse functional boxplot are illustrated with two public health datasets. Supplementary materials and codes are available for readers to apply our visualization tools and replicate the analysis.
Facebook
TwitterU.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
The Florida Flood Hub for Applied Research and Innovation and the U.S. Geological Survey have developed projected future change factors for precipitation depth-duration-frequency (DDF) curves at 242 National Oceanic and Atmospheric Administration (NOAA) Atlas 14 stations in Florida. The change factors were computed as the ratio of projected future to historical extreme-precipitation depths fitted to extreme-precipitation data from downscaled climate datasets using a constrained maximum likelihood (CML) approach as described in https://doi.org/10.3133/sir20225093. The change factors correspond to the periods 2020-59 (centered in the year 2040) and 2050-89 (centered in the year 2070) as compared to the 1966-2005 historical period.
An R script (create_boxplot.R) is provided which generates boxplots of change factors for a NOAA Atlas 14 station, or for all NOAA Atlas 14 stations in a Florida HUC-8 basin or county for durations of interest (1, 3, and 7 days, or combinations thereof) ...
Facebook
TwitterThe South Florida Water Management District (SFWMD) and the U.S. Geological Survey have developed projected future change factors for precipitation depth-duration-frequency (DDF) curves at 174 National Oceanic and Atmospheric Administration (NOAA) Atlas 14 stations in central and south Florida. The change factors were computed as the ratio of projected future to historical extreme precipitation depths fitted to extreme precipitation data from various downscaled climate datasets using a constrained maximum likelihood (CML) approach. The change factors correspond to the period 2050-2089 (centered in the year 2070) as compared to the 1966-2005 historical period. An R script (create_boxplot.R) is provided which generates boxplots of change factors for a NOAA Atlas 14 station, or for all NOAA Atlas 14 stations in an ArcHydro Enhanced Database (AHED) basin or county for durations of interest (1, 3, and 7 days, or combinations thereof) and return periods of interest (5, 10, 25, 50, 100, and 200 years, or combinations thereof). The user also has the option of requesting that the script save the raw change factor data used to generate the boxplots, as well as the processed quantile and outlier data shown in the figure. The script allows the user to modify the percentiles used in generating the boxplots. A Microsoft Word file documenting code usage and available options is also provided within this data release (Documentation_R_script_create_boxplot.docx). As described in the documentation, the R script relies on some of the Microsoft Excel spreadsheets published as part of this data release.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the prioritization provided by a panel of 15 experts to a set of 28 barriers categories for 8 different roles of the future energy system. A Delphi method was followed and the scores provided in the three rounds carried out are included. The dataset also contains the scripts used to assess the results and the output of this assessment. A list of the information contained in this file is: data folder: this folders includes the scores given by the 15 experts in the 3 rounds. Every round is in an individual folder. There is a file per expert that has the scores between -5 (not relevant at all) to 5 (completely relevant) per barrier (rows) and actor (columns). There is also a file with the description of the experts in terms of their position in the company, the type of company and the country. fig folder: this folder includes the figures created to assess the information provided by the experts. For each round, the following figures are created (in each respective folder): Boxplot with the distribution of scores per barriers and roles. Heatmap with the mean scores per barriers and roles. Boxplots with the comparison of the different distributions provided by the experts of each group (depending on the keywords) per barrier and role. Heatmap with the mean score per barrier and use case and with the prioritization per barrier and use case. Finally, bar plots with the mean scores differences between rounds and boxplot with comparisons of the scores distributions are also provided. stat folder: this folder includes the files with the results of the different statistical assessment carried out. For each round, the following figures are created (in each respective folder): The statistics used to assess the scores (Intraclass correlation coefficient, Inter-rater agreement, Inter-rater agreement p-value, Homogeneity of Variances, Average interquartile range, Standard Deviation of interquartile ranges, Friedman test p-value Average power post hoc) per barrier and per role. The results of the post hoc of the Friedman Test per berries and per roles. The average score per barrier and per role. The mean value of the scores provided by the experts grouped by the keywords per barrier and role. P-value of the comparison of these two values. The end prioritization of the barrier for the use case (averaging the scores or merging the critical sets) Finally, the differences between the mean and standard deviations of the scores between two consecutive rounds are provided.
Facebook
TwitterA map containing comparisons of the predicted biodiversity among the three assemblages, using 13 boxplots for each of the bioregions within the study region.
Facebook
TwitterThe U.S. Geological Survey (USGS), in cooperation with the Missouri Department of Natural Resources (MDNR), collects data pertaining to the surface-water resources of Missouri. These data are collected as part of the Missouri Ambient Water-Quality Monitoring Network (AWQMN) and are stored and maintained by the USGS National Water Information System (NWIS) database. These data constitute a valuable source of reliable, impartial, and timely information for developing an improved understanding of the water resources of the State. Water-quality data collected between 1993 and 2017 were analyzed for long term trends and the network was investigated to identify data gaps or redundant data to assist MDNR on how to optimize the network in the future. This is a companion data release product to the Scientific Investigation Report: Richards, J.M., and Barr, M.N., 2021, General water-quality conditions, long-term trends, and network analysis at selected sites within the Ambient Water-Quality Monitoring Network in Missouri, water years 1993–2017: U.S. Geological Survey Scientific Investigations Report 2021–5079, 75 p., https://doi.org/10.3133/sir20215079. The following selected graphics are included in this data release in .pdf format. Also included in this data release are web pages accessible for people with disabilities provided in compressed .zip format. The web pages present the same information as the .pdf files: Annual and seasonal discharge trends.pdf -- Graphics of discharge trends produced from the EGRET software for selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Annual_and_seasonal_discharge_trends_htm.zip -- Compressed web page presenting graphics of discharge trends produced from the EGRET software for selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Graphics of simulated quarterly sampling frequency trends.pdf -- Graphics of results of simulated quarterly sampling frequency trends produced by the R-QWTREND software at selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Graphics_of_simulated_quarterly_sampling_frequency_trends_htm.zip -- Compressed web page presenting graphics of results of simulated quarterly sampling frequency trends produced by the R-QWTREND software at selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Graphics of median parameter values.pdf -- Graphics of median values for selected parameters at selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Graphics_of_median_parameter_values_htm.zip -- Compressed web page presenting graphics of median values for selected parameters at selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Parameter value versus time.pdf -- Scatter plots of the value of selected parameters versus time at selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Parameter_value_versus_time_htm.zip -- Compressed web page presenting scatter plots of the value of selected parameters versus time at selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Parameter value versus discharge.pdf -- Scatter plots of the value of selected parameters versus discharge at selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Parameter_value_versus_discharge_htm.zip -- Compressed web page presenting scatter plots of the value of selected parameters versus discharge at selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Boxplot of parameter value distribution by season.pdf -- Seasonal boxplots of selected parameters from selected sites in the Missouri Ambient Water-Quality Monitoring Network. Seasons defined as Winter (December, January, and February), Spring (March, April, and May), Summer (June, July, and August), and Fall (September, October, and November). Graphics provided to support the interpretations in the Scientific Investigations Report. Boxplot_of_parameter_value_distribution_by_season_htm.zip -- Compressed web page presenting seasonal boxplots of selected parameters from selected sites in the Missouri Ambient Water-Quality Monitoring Network. Seasons defined as Winter (December, January, and February), Spring (March, April, and May), Summer (June, July, and August), and Fall (September, October, and November). Graphics provided to support the interpretations in the Scientific Investigations Report. Boxplot of sampled discharge compared with mean daily discharge.pdf -- Boxplots of the distribution of discharge collected at the time of sampling of selected parameters compared with the period of record discharge distribution from selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Boxplot_of_sampled_discharge_compared_with_mean_daily_discharge_htm.zip -- Compressed web page presenting boxplots of the distribution of discharge collected at the time of sampling of selected parameters compared with the period of record discharge distribution from selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Boxplot of parameter value distribution by month.pdf -- Monthly boxplots of selected parameters from selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report. Boxplot_of_parameter_value_distribution_by_month_htm.zip -- Compressed web page presenting monthly boxplots of selected parameters from selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Three boxplots comparing phenotypic trait measures between populations; comparisons correspond to tests 1–3 as shown in Fig. 1 in text.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For each locus, the plots illustrate the distributions of (from top to bottom) per-position entropy, per-position gap score [4], per position conservation score [4], sequence length and GC content. 1. Kawahara AY, Breinholt JW. Phylogenomics provides strong evidence for relationships of butterflies and moths. Proc R Soc B. 2014;281: 20140970. 2. Robinson DF, Foulds LR. Comparison of phylogenetic trees. Math Biosci. 1981;53: 131–147. 3. Kuhner MK, Felsenstein J. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol Biol Evol. 1994;11: 459–468. 4. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25: 1972–1973.
Facebook
TwitterBank has multiple banking products that it sells to customer such as saving account, credit cards, investments etc. It wants to which customer will purchase its credit cards. For the same it has various kind of information regarding the demographic details of the customer, their banking behavior etc. Once it can predict the chances that customer will purchase a product, it wants to use the same to make pre-payment to the authors.
In this part I will demonstrate how to build a model, to predict which clients will subscribing to a term deposit, with inception of machine learning. In the first part we will deal with the description and visualization of the analysed data, and in the second we will go to data classification models.
-Desire target -Data Understanding -Preprocessing Data -Machine learning Model -Prediction -Comparing Results
Predict if a client will subscribe (yes/no) to a term deposit — this is defined as a classification problem.
The dataset (Assignment-2_data.csv) used in this assignment contains bank customers’ data. File name: Assignment-2_Data File format: . csv Numbers of Row: 45212 Numbers of Attributes: 17 non- empty conditional attributes attributes and one decision attribute.
https://user-images.githubusercontent.com/91852182/143783430-eafd25b0-6d40-40b8-ac5b-1c4f67ca9e02.png">
https://user-images.githubusercontent.com/91852182/143783451-3e49b817-29a6-4108-b597-ce35897dda4a.png">
Data pre-processing is a main step in Machine Learning as the useful information which can be derived it from data set directly affects the model quality so it is extremely important to do at least necessary preprocess for our data before feeding it into our model.
In this assignment, we are going to utilize python to develop a predictive machine learning model. First, we will import some important and necessary libraries.
Below we are can see that there are various numerical and categorical columns. The most important column here is y, which is the output variable (desired target): this will tell us if the client subscribed to a term deposit(binary: ‘yes’,’no’).
https://user-images.githubusercontent.com/91852182/143783456-78c22016-149b-4218-a4a5-765ca348f069.png">
We must to check missing values in our dataset if we do have any and do, we have any duplicated values or not.
https://user-images.githubusercontent.com/91852182/143783471-a8656640-ec57-4f38-8905-35ef6f3e7f30.png">
We can see that in 'age' 9 missing values and 'balance' as well 3 values missed. In this case based that our dataset it has around 45k row I will remove them from dataset. on Pic 1 and 2 you will see before and after.
https://user-images.githubusercontent.com/91852182/143783474-b3898011-98e3-43c8-bd06-2cfcde714694.png">
From the above analysis we can see that only 5289 people out of 45200 have subscribed which is roughly 12%. We can see that our dataset highly unbalanced. we need to take it as a note.
https://user-images.githubusercontent.com/91852182/143783534-a05020a8-611d-4da1-98cf-4fec811cb5d8.png">
Our list of categorical variables.
https://user-images.githubusercontent.com/91852182/143783542-d40006cd-4086-4707-a683-f654a8cb2205.png">
Our list of numerical variables.
https://user-images.githubusercontent.com/91852182/143783551-6b220f99-2c4d-47d0-90ab-18ede42a4ae5.png">
In above boxplot we can see that some point in very young age and as well impossible age. So,
https://user-images.githubusercontent.com/91852182/143783564-ad0e2a27-5df5-4e04-b5d7-6d218cabd405.png">
https://user-images.githubusercontent.com/91852182/143783589-5abf0a0b-8bab-4192-98c8-d2e04f32a5c5.png">
Now, we don’t have issues on this feature so we can use it
https://user-images.githubusercontent.com/91852182/143783599-5205eddb-a0f5-446d-9f45-cc1adbfcce67.png">
https://user-images.githubusercontent.com/91852182/143783601-e520d59c-3b21-4627-a9bb-cac06f415a1e.png">
https://user-images.githubusercontent.com/91852182/143783634-03e5a584-a6fb-4bcb-8dc5-1f3cc50f9507.png">
https://user-images.githubusercontent.com/91852182/143783640-f6e71323-abbe-49c1-9935-35ffb2d10569.png">
This attribute highly affects the output target (e.g., if duration=0 then y=’no’). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes...
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Boxplots comparing Bray-Curtis dissimilarity distances for sites sampled in both time periods, presented separately for low-impacted sites and urbanized sites.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Boxplot showing the differences between sexes regarding the logSVL (A) and log Weight (B) across the different types of fire regimes – areas burned in 2016 (BU16), burned in 2022 (BU22) and unburned areas (UN).
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterThis paper introduces the sparse functional boxplot and the intensity sparse functional boxplot as practical exploratory tools. Besides being available for complete functional data, they can be used in sparse univariate and multivariate functional data. The sparse functional boxplot, based on the functional boxplot, displays sparseness proportions within the 50% central region. The intensity sparse functional boxplot indicates the relative intensity of fitted sparse point patterns in the central region. The two-stage functional boxplot, which derives from the functional boxplot to detect outliers, is furthermore extended to its sparse form. We also contribute to sparse data fitting improvement and sparse multivariate functional data depth. In a simulation study, we evaluate the goodness of data fitting, several depth proposals for sparse multivariate functional data, and compare the results of outlier detection between the sparse functional boxplot and its two-stage version. The practical applications of the sparse functional boxplot and intensity sparse functional boxplot are illustrated with two public health datasets. Supplementary materials and codes are available for readers to apply our visualization tools and replicate the analysis.