License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Simulated A/B Testing Data for Web User Engagement
This dataset contains synthetically generated A/B testing data that mimics user behavior on a website with two versions: Control (con) and Experimental (exp). The dataset is designed for practicing data cleaning, statistical testing (e.g., Z-test, T-test), and pipeline development.
Each row represents an individual user session, with attributes capturing click behavior, session duration, access device, referral source, and timestamp.
Features: click — Binary (1 if clicked, 0 if not)
group — A/B group assignment (con or exp, with injected label inconsistencies)
session_time — Time spent in the session (in minutes), including outliers
click_time — Timestamp of user interaction (nullable)
device_type — Device used (mobile or desktop, mixed casing)
referral_source — Where the user came from (e.g., social, email, with some typos/whitespace)
Use Cases: A/B testing analysis (CTR, CVR)
Hypothesis testing (Z-test, T-test)
ETL pipeline design
Data cleaning and standardization practice
Dashboard creation and segmentation analysis
Notes: The dataset includes intentional inconsistencies (nulls, duplicates, casing issues, typos) to reflect real-world challenges.
Fully synthetic — safe for public use.
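For illustration, a minimal pandas/statsmodels sketch of the intended workflow might look like the following (the file name ab_test.csv is an assumption; column names follow the feature list above):

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

df = pd.read_csv("ab_test.csv")  # hypothetical file name

# Standardize the injected inconsistencies described in the notes.
df["group"] = df["group"].str.strip().str.lower()
df["device_type"] = df["device_type"].str.strip().str.lower()
df["referral_source"] = df["referral_source"].str.strip().str.lower()
df = df.drop_duplicates()
df = df[df["group"].isin(["con", "exp"])]

# Two-proportion Z-test on click-through rate (CTR).
counts = df.groupby("group")["click"].agg(["sum", "count"])
stat, pval = proportions_ztest(count=counts["sum"], nobs=counts["count"])
print(f"Z = {stat:.3f}, p = {pval:.4f}")
```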
This dataset was created by Gyan Kumar.
It contains the following files:
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
In many phenomena, data are collected on a large scale and at different frequencies. In this context, functional data analysis (FDA) has become an important statistical methodology for analyzing and modeling such data. The approach of FDA is to assume that data are continuous functions and that each continuous function is considered as a single observation. Thus, FDA deals with large-scale and complex data. However, visualization and exploratory data analysis, which are very important in practice, can be challenging due to the complexity of the continuous functions. Here we introduce a type of record concept for functional data, and we propose some nonparametric tools based on the record concept for functional data observed over time (functional time series). We study the properties of the trajectory of the number of record curves under different scenarios. Also, we propose a unit root test based on the number of records. The trajectory of the number of records over time and the unit root test can be used for visualization and exploratory data analysis. We illustrate the advantages of our proposal through a Monte Carlo simulation study. We also illustrate our method on two different datasets: Daily wind speed curves at Yanbu, Saudi Arabia and annual mortality rates in France. Overall, we can identify the type of functional time series being studied based on the number of record curves observed. Supplementary materials for this article are available online.
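As a toy illustration of counting record curves over time (assuming, purely for this sketch, that a curve is a record when it pointwise dominates every earlier curve; the paper's actual definition may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # 200 curves sampled on a 50-point grid

def record_count_trajectory(curves):
    """Cumulative count of record curves under pointwise dominance."""
    running_max = curves[0].copy()
    counts = [1]  # the first curve is a record by convention
    for curve in curves[1:]:
        is_record = bool(np.all(curve > running_max))
        counts.append(counts[-1] + int(is_record))
        running_max = np.maximum(running_max, curve)
    return np.array(counts)

trajectory = record_count_trajectory(X)
print(trajectory[-1])  # total number of record curves observed
```

Plotting this trajectory against time is the kind of visualization the abstract proposes for distinguishing types of functional time series.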
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This data is publicly available on GitHub. It can be used for EDA, statistical analysis, and visualization.
The data set ifood_df.csv consists of 2206 customers of XYZ company with data on:
- Customer profiles
- Product preferences
- Campaign successes/failures
- Channel performance
I do not own this dataset. I am simply making it accessible on this platform via the public GitHub link.
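A minimal exploratory pass over the file might look like this sketch (only the file name ifood_df.csv is taken from the description above; no column names are assumed):

```python
import pandas as pd

df = pd.read_csv("ifood_df.csv")
print(df.shape)                                        # expected (2206, n_columns)
print(df.dtypes.value_counts())                        # column type overview
print(df.isna().sum().sort_values(ascending=False).head(10))  # worst missingness
print(df.describe().T.head(15))                        # quick numeric summary
```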
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Simulated user-aggregated data from an experiment with webpage-view and button-click attributes. It can be useful for interview preparation and for practicing statistical tests. The data was generated using a chosen set of parameters: success_rate, uplift, beta, and skew.
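Since the exact meaning of those parameters is not documented here, the following sketch shows only one plausible way such user-aggregated data might be generated (beta is omitted because its role is unclear):

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, success_rate, uplift, skew = 10_000, 0.05, 0.10, 1.5

# Per-user view counts with a right-skewed distribution.
views = rng.lognormal(mean=1.0, sigma=skew, size=n_users).astype(int) + 1
group = rng.integers(0, 2, size=n_users)      # 0 = control, 1 = treatment
p = success_rate * (1 + uplift * group)       # per-view click probability
clicks = rng.binomial(views, p)

print("control CTR:  ", clicks[group == 0].sum() / views[group == 0].sum())
print("treatment CTR:", clicks[group == 1].sum() / views[group == 1].sum())
```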
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Discover, test and benchmark 3rd-party AI models for drone-based crowd and traffic detection — accuracy, latency & rare-object performance for enterprise use.
License: Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
In Chapter 8 of the thesis, six sonification models are presented as examples for the framework of Model-Based Sonification developed in Chapter 7. Sonification models determine the rendering of the sonification and the possible interactions. The "model in mind" helps the user interpret the sound with respect to the data.
Data Sonograms use spherical expanding shock waves to excite linear oscillators which are represented by point masses in model space.
File: Data Sonogram sound examples:
- Iris dataset: started in plot (a) (https://pub.uni-bielefeld.de/download/2920448/2920454) at S0, (b) at S1, (c) at S2
- 10d noisy circle dataset: started in plot (c) at S0 (mean) (https://pub.uni-bielefeld.de/download/2920448/2920451), (d) at S1 (edge)
- 10d Gaussian: plot (d) started at S0
- 3 clusters: Example 1
- 3 clusters, invisible columns used as output variables: Example 2 (https://pub.uni-bielefeld.de/download/2920448/2920450)
Description: Data Sonogram sound examples for synthetic datasets and the Iris dataset
Duration: about 5 s
This sonification model explores features of a data distribution by computing the trajectories of test particles which are injected into model space and move according to Newton's laws of motion in a potential given by the dataset.
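A toy sketch of that idea, assuming Gaussian potential wells centered at the data points and simple semi-implicit Euler integration (the audio mapping itself is omitted; all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 2))  # dataset defining the potential landscape
sigma = 0.5                       # kernel width (assumed)

def force(x):
    # Gradient of a sum of Gaussian wells centered at the data points:
    # each point attracts the test particle.
    diff = data - x                                        # (n, 2)
    w = np.exp(-np.sum(diff**2, axis=1) / (2 * sigma**2))  # kernel weights
    return (w[:, None] * diff).sum(axis=0) / sigma**2

x = np.array([3.0, 0.0])  # injected test particle
v = np.zeros(2)
dt = 0.01
trajectory = []
for _ in range(2000):
    v += dt * force(x)    # Newton's second law (unit mass)
    x += dt * v
    trajectory.append(x.copy())
# e.g., map np.linalg.norm(v) over time to amplitude or pitch for sonification
```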
The McMC Sonification Model defines an exploratory process in the domain of a given density p such that the acoustic representation summarizes features of p by sound, particularly its modes.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Eukaryotes test, kingdom level: Distribution of Eukaryotes into four groups for every graph kernel.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This package contains data and code to replicate the findings presented in our paper titled "Influence of the Number of Testers in Exploratory Crowd-Testing of Android Applications".
Abstract
Crowdtesting is an emerging paradigm in which a "crowd" of people independently carries out testing tasks. It has proved especially promising in the mobile-apps domain and in combination with exploratory testing strategies, in which individual testers pursue a creative, experience-based approach to test design.
Managing the crowdtesting process, however, is still a challenging task that can easily result either in wasteful spending or in inadequate software quality, owing to the unpredictability of remote testing activities. A number of works in the literature have investigated the application of crowdtesting in the mobile-apps domain. These works, however, consider crowdtesting tasks whose goal is to find bugs, not to generate a re-executable test suite. Moreover, existing works do not consider the impact of applying different exploratory testing strategies.
As a first step towards filling this gap in the literature, in this work we conduct an empirical evaluation involving four open-source Android apps and twenty master's students, whom we believe to be representative of practitioners partaking in crowdtesting activities. The students were asked to generate test suites for the apps using a Capture and Replay tool and different exploratory testing strategies. We then compare the effectiveness, in terms of aggregate code coverage, that different-sized crowds of students achieve using different exploratory testing strategies. The results provide useful insights to project managers interested in using crowdtesting to produce GUI test suites for mobile apps, helping them make more informed decisions.
Contents and Instructions
This package contains:
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This package contains data and code to replicate the findings presented in our paper titled "GUI Testing of Android Applications: Investigating the Impact of the Number of Testers on Different Exploratory Testing Strategies".
Abstract
Graphical User Interface (GUI) testing plays a pivotal role in ensuring the quality and functionality of mobile apps. In this context, Exploratory Testing (ET), a distinctive methodology in which individual testers pursue a creative, experience-based approach to test design, is often used as an alternative or in addition to traditional scripted testing. Managing the exploratory testing process is a challenging task that can easily result either in wasteful spending or in inadequate software quality, due to the relative unpredictability of exploratory testing activities, which depend on the skills and abilities of individual testers. A number of works have investigated the diversity of testers' performance when using ET strategies, often in a crowdtesting setting. These works, however, investigated ET effectiveness in detecting bugs, but not in scenarios where the goal is also to generate a re-executable test suite. Moreover, less work has been conducted on evaluating the impact of adopting different exploratory testing strategies. As a first step towards filling this gap in the literature, in this work we conduct an empirical evaluation involving four open-source Android apps and twenty master's students, whom we believe to be representative of practitioners partaking in exploratory testing activities. The students were asked to generate test suites for the apps using a Capture and Replay tool and different exploratory testing strategies. We then compare the effectiveness, in terms of aggregate code coverage, that different-sized groups of students using different exploratory testing strategies may achieve. The results provide project managers interested in using exploratory approaches to test simple Android apps with deeper insights into code-coverage dynamics, on which they can base more informed decisions.
Contents and Instructions
This package contains:
apps-under-test.zip A zip archive containing the source code of the four Android applications we considered in our study, namely MunchLife, TippyTipper, Trolly, and SimplyDo.
apps-under-test-instrumented.zip A zip archive containing the instrumented source code of the four Android applications we used to compute branch coverage.
students-test-suites.zip A zip archive containing the test suites developed by the students using Uninformed Exploratory Testing (referred to as "Black Box" in the subdirectories) and Informed Exploratory Testing (referred to as "White Box" in the subdirectories). This also includes coverage reports.
compute-coverage-unions.zip A zip archive containing Python scripts we developed to compute the aggregate LOC coverage of all possible subsets of students. The scripts have been tested on MS Windows. To compute the LOC coverage achieved by all possible subsets of testers using IET and UET strategies, run the analysisAndReport.py script. To compute the LOC coverage achieved by mixed crowds in which some testers use a U+IET approach and others use a UET approach, run the analysisAndReport_UET_IET_combinations_emma.py script.
branch-coverage-computation.zip A zip archive containing Python scripts we developed to compute the aggregate branch coverage of all considered subsets of students. The scripts have been tested on MS Windows. To compute the branch coverage achieved by all possible subsets of testers using UET and I+UET strategies, run the branch_coverage_analysis.py script. To compute the code coverage achieved by mixed crowds in which some testers use a U+IET approach and others use a UET approach, run the mixed_branch_coverage_analysis.py script.
data-analysis-scripts.zip A zip archive containing R scripts to merge and manipulate coverage data, to carry out statistical analysis and draw plots. All data concerning RQ1 and RQ2 is available as a ready-to-use R data frame in the ./data/all_coverage_data.rds file. All data concerning RQ3 is available in the ./data/all_mixed_coverage_data.rds file.
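The core idea behind the coverage-union scripts can be sketched as follows (a hedged re-implementation with made-up coverage sets, not the scripts shipped in compute-coverage-unions.zip): the aggregate coverage of a crowd is the union of the lines its members cover.

```python
from itertools import combinations

# covered[tester] = set of covered line IDs, e.g. parsed from coverage reports
covered = {
    "t1": {1, 2, 3, 10},
    "t2": {2, 3, 4},
    "t3": {8, 9, 10},
}
total_lines = 12  # hypothetical LOC count of the app under test

for size in range(1, len(covered) + 1):
    # Best aggregate coverage achievable by any crowd of this size.
    best = max(
        (set().union(*(covered[t] for t in subset))
         for subset in combinations(covered, size)),
        key=len,
    )
    print(f"crowd size {size}: best aggregate coverage {len(best) / total_lines:.0%}")
```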
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-Fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83, 3.52). Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.
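A sketch of the modeling recipe the abstract describes, a 3-fold cross-validated Random Forest plus a bootstrapped RMSE confidence interval, using synthetic stand-in data rather than the 50-patient dataset:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in: features play the role of Hb, CRP, ESR, age, etc.
X, y = make_regression(n_samples=50, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 3-fold CV over a small hyperparameter grid, mirroring the paper's strategy.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [100, 300], "max_depth": [2, 4, None]},
    cv=3,
    scoring="neg_root_mean_squared_error",
).fit(X_tr, y_tr)
pred = grid.predict(X_te)

# Bootstrapped 95% CI for the holdout RMSE.
rng = np.random.default_rng(0)
rmses = [
    np.sqrt(mean_squared_error(y_te[idx], pred[idx]))
    for idx in (rng.integers(0, len(y_te), len(y_te)) for _ in range(1000))
]
print(f"RMSE = {np.sqrt(mean_squared_error(y_te, pred)):.2f}, "
      f"95% CI: {np.percentile(rmses, 2.5):.2f}-{np.percentile(rmses, 97.5):.2f}")
```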
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Depression is a psychological state of mind that often influences a person in an unfavorable manner. While it can occur in people of all ages, students are especially vulnerable to it throughout their academic careers. Beginning in 2020, the COVID-19 epidemic caused major problems in people’s lives by driving them into quarantine and forcing them to be connected continually with mobile devices, such that mobile connectivity became the new norm during the pandemic and beyond. This situation is further accelerated for students as universities move towards a blended learning mode. In these circumstances, monitoring student mental health in terms of mobile and Internet connectivity is crucial for their wellbeing. This study focuses on students attending an International University of Bangladesh to investigate their mental health due to their continual use of mobile devices (e.g., smartphones, tablets, laptops etc.). A cross-sectional survey method was employed to collect data from 444 participants. Following the exploratory data analysis, eight machine learning (ML) algorithms were used to develop an automated normal-to-extreme severe depression identification and classification system. When feature selection methods such as the Chi-square test and Recursive Feature Elimination (RFE) were incorporated into the automated detection, an accuracy increase of about 3 to 5% was observed. Similarly, a 5 to 15% increase in accuracy was observed when a feature extraction method such as Principal Component Analysis (PCA) was applied. Also, the SparsePCA feature extraction technique in combination with the CatBoost classifier showed the best results in terms of accuracy, F1-score, and ROC-AUC. The data analysis revealed no sign of depression in about 44% of the total participants. About 25% of students showed mild-to-moderate and 31% of students showed severe-to-extreme signs of depression. The results suggest that ML models incorporating a proper feature engineering method can serve adequately in multi-stage depression detection among students. This model might be utilized in other disciplines for detecting early signs of depression among people.
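A sketch of the feature-extraction-plus-classifier pattern the abstract describes, on synthetic stand-in data; the paper pairs SparsePCA with CatBoost, which is replaced here by scikit-learn's GradientBoostingClassifier to keep the sketch dependency-free:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import SparsePCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the 444-participant survey; 4 classes approximate the
# normal-to-extreme severity stages.
X, y = make_classification(n_samples=444, n_features=30, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

pipe = make_pipeline(
    StandardScaler(),
    SparsePCA(n_components=10, random_state=0),  # feature extraction step
    GradientBoostingClassifier(random_state=0),  # classifier step
)
print(cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean())
```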
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Whole dataset test: Evaluation of the SVM results for discerning Eukaryotes/Prokaryotes and Kingdoms.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset includes comprehensive data on students' exam results in three subjects: writing, reading, and math. Each record represents a single student and contains information about the student's gender, race and ethnicity, type of meal (reduced or free), parental education level, and whether or not the student took a test-prep course. These variables help explain potential relationships between socioeconomic and personal characteristics and academic results. Each subject's exam score is included in numerical form, enabling comparison and trend analysis of performance.
The purpose of this dataset is to analyze how different personal and socio-economic factors relate to student academic performance. By comparing test scores with variables like parental education level, test preparation, and lunch type, you can explore potential relationships and performance gaps across groups.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
An underlying assumption of proportional hazards models is that the effect of a change in a covariate on the hazard rate of event occurrence is constant over time. For scholars using the Cox model, a Schoenfeld residual-based test has become the disciplinary standard for detecting violations of this assumption. However, using this test requires researchers to make a choice about a transformation of the time scale. In practice, this choice has largely consisted of arbitrary decisions made without justification. Using replications and simulations, we demonstrate that the decision about time transformations can have profound implications for the conclusions reached. In particular, we show that researchers can make far more informed decisions by paying closer attention to the presence of outlier survival times and levels of censoring in their data. We suggest a new standard for best practices in Cox diagnostics that buttresses the current standard with in-depth exploratory data analysis.
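The diagnostic choice the paper highlights can be demonstrated with the lifelines library, whose Schoenfeld-based test exposes the time transformation directly (shown here on lifelines' bundled Rossi dataset, not the paper's replication data):

```python
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi
from lifelines.statistics import proportional_hazard_test

df = load_rossi()
cph = CoxPHFitter().fit(df, duration_col="week", event_col="arrest")

# The p-values, and hence the verdict on proportional hazards, can change
# with the chosen time transformation.
for transform in ["rank", "km", "log", "identity"]:
    res = proportional_hazard_test(cph, df, time_transform=transform)
    print(transform, res.summary["p"].round(4).to_dict())
```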
Since I started doing research in the field of data science, I have noticed that there are a lot of datasets available for NLP, medicine, images, and other subjects, but I could not find a single adequate dataset for the domain of software testing. The few datasets that do exist are extracted from some piece of code or from historical data that is not publicly available for analysis. The intersection of software testing and data science, especially machine learning, has a lot of potential. While conducting research on test-case prioritization, especially for the initial stages of the software test cycle and the way companies set priorities in the software industry, I found no black-box dataset available in that format. This was the reason I wanted such a dataset to exist, so I collected the necessary attributes, arranged them against their values, and made one.
This data was gathered in August 2020 from a software company that worked on a car-financing lease company's entire software package, from the web front end to the management system. The dataset is in .csv format, with 2000 rows and 6 columns. The six attributes are as follows:
- B_Req --> Business requirement
- R_Prioirty --> Requirement priority of the particular business requirement
- FP --> Function point of each testing task; in our case, test cases against each requirement cover a particular FP
- Complexity --> Complexity of a particular function point or its related modules (the criteria for assigning complexity are listed in the .txt file attached with the new version)
- Time --> Estimated maximum time assigned to each function point of a particular testing task by the QA team lead or senior SQA analyst
- Cost --> Cost for each function point, calculated from complexity and time using the function-point estimation technique with the formula: Cost = (Complexity * Time) * average amount set per task or per function point (note: here it is set at 5$ per FP)
I would like to thank the people in the QA departments of several software companies, especially the team at the company that provided the estimation data and traceability matrix from which I extracted and compiled this dataset. Websites such as www.softwaretestinghelp.com and www.coderus.com, among other sources, helped me understand the testing process and the phases in which priorities are usually assigned.
My inspiration for collecting this data was the shortage of datasets showing the priority of test cases together with their requirements and estimated metrics, for research on automating test-case prioritization using machine learning.
- The dataset can be used to analyze and apply classification or any other machine learning algorithm to prioritize test cases, as in the sketch below.
- It can be used to reduce, select, or automate testing based on priority, cost and time, or complexity and requirements.
- It can be used to build recommendation systems for software testing that help testing teams with task-based estimation and recommendation.
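A minimal sketch of that first use case (the file name testcase_priority.csv is an assumption; column names follow the attribute list above, including the R_Prioirty spelling):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("testcase_priority.csv")  # hypothetical file name

# One-hot encode any categorical estimation attributes and predict priority.
X = pd.get_dummies(df[["FP", "Complexity", "Time", "Cost"]], drop_first=True)
y = df["R_Prioirty"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("holdout accuracy:", clf.score(X_te, y_te))
```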
License: U.S. Government Works (https://www.usa.gov/government-works)
The Shell Buckling Knockdown Factor (SBKF) Project, NASA Engineering and Safety Center (NESC) Assessment #: 07-010-E, was established in March of 2007 by the NESC in collaboration with the former NASA Constellation Program (CxP) and now the Space Launch System (SLS) Program. The SBKF Project has the goal of developing and experimentally validating improved (i.e., less-conservative, more robust) analysis-based shell buckling design factors (a.k.a., knockdown factors (KDFs)) and developing design recommendations for launch vehicle structures.
Shell buckling knockdown factors have been historically based on test data from laboratory-scale test articles obtained from the 1930s through the 1960s. The knockdown factors are used to account for the differences observed between the theoretical buckling load and the buckling load obtained from test. However, these test-based KDFs may not be relevant for modern launch- vehicle designs, and are likely overly conservative for many designs. Significant advances in structural stability theory, high-fidelity analysis methods, manufacturing, and testing are enabling the development of new, less conservative, robust analysis-based knockdown factors for modern structural concepts. Preliminary design studies indicate that implementation of new knockdown factors can enable significant weight savings in these vehicles and will help mitigate some of NASA’s launch-vehicle development and performance risks, by reducing reliance on large-scale testing, and providing high-fidelity estimates of as-built structural performance, increased payload capability, and improved structural reliability.
To achieve its KDF development and implementation goals, the SBKF Project is engaged in several work areas, including launch-vehicle design trade studies; subcomponent- and component-level design, analysis, and structural testing; and shell buckling design technology development, including analysis-method development, analysis benchmarking and standardization, and analysis-based KDF development. Finite-element analysis is used extensively in all these work areas. In particular, SBKF conducts four main categories of analysis: 1) high-fidelity structural simulations, 2) imperfection sensitivity studies, 3) test article design and analysis, and 4) exploratory studies. Each of these types of analysis may have different objectives and utilize different modeling approaches, depending on the results required to meet the Project's needs. A description of the four main categories follows.
High-fidelity structural simulations
High-fidelity structural simulations are defined as simulations that can predict accurately the complex behavior of a structural component or an assembly of components (e.g., virtual structural test) and often require a significant level of modeling detail and knowledge of the structural system (e.g., its physical behavior and expected variability). Models are considered high-fidelity if results predicted with these models correlate with test data to within a small range of variance and represent accurately the true physical behavior of the structure. The permissible amount of variance is determined based on the analysis requirements defined by the Project in accordance with the intended end use of the predicted data. High-fidelity shell buckling analysis objectives considered by the SBKF Project often require the accurate prediction of stiffnesses, local and global deformations, strains, load paths and buckling-induced load redistribution, and buckling and failure loads and modes. To achieve these analysis goals, the models typically must accurately represent loading and boundary conditions, and expected or measured geometric and material variations (imperfections). It is expected that high-fidelity models developed by SBKF will predict effective axial stiffness (slope of the load versus end-shortening curve) within ±2%, buckling loads and point displacements (displacement measured at a point) within ±5%, and point strains within ±10%. However, if the displacements or strains of interest are in a high-gradient location, then the overall trend will be assessed for correlation.
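The quoted tolerances amount to simple relative-error checks; a toy sketch with hypothetical numbers, not SBKF data:

```python
# Correlation tolerances quoted above: ±2% stiffness, ±5% buckling load and
# point displacement, ±10% point strain. All values below are invented.
def within(predicted: float, measured: float, tol: float) -> bool:
    return abs(predicted - measured) / abs(measured) <= tol

checks = {
    "axial stiffness": (1.02e8, 1.00e8, 0.02),  # (predicted, measured, tolerance)
    "buckling load":   (9.60e5, 1.00e6, 0.05),
    "point strain":    (1.08e-3, 1.00e-3, 0.10),
}
for name, (pred, meas, tol) in checks.items():
    print(f"{name}: {'PASS' if within(pred, meas, tol) else 'FAIL'}")
```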
Imperfection sensitivity studies
Imperfection sensitivity studies are used to assess the sensitivity of a structure's nonlinear response and buckling load to initial imperfections, such as geometric imperfections (imperfections in the shell wall geometry, including out-of-roundness or local dimples) and loading and material non-uniformities. Geometric imperfections included in an analysis model can be based upon the measured geometry of test articles or flight hardware, or they can be defined analytically using eigenmode shapes or other perturbations. The SBKF Project is developing analysis-based knockdown factors (KDFs) that are derived from imperfection sensitivity studies, and several imperfection types are being investigated. First, a single dimple-shaped imperfection is used as a "worst-expected" imperfection shape; it is similar to the initial dimple observed in the shell wall at the onset of buckling. The dimple is created in the shell by applying a radially inward lateral load at the mid-length of the cylinder. The magnitude of the lateral load is held fixed while the active destabilizing load (e.g., axial compression) is applied until buckling occurs in the shell. The magnitude of the lateral load is increased incrementally in subsequent buckling analyses until a minimum, or lower-bound, buckling load is achieved. A second imperfection type uses actual measured geometry data from as-built launch-vehicle-like test articles and flight hardware. These measured geometric imperfections are included in the model by adjusting the nodal coordinates of the original, geometrically perfect finite-element mesh to the perturbed imperfect geometry. Finally, the effects of loading imperfections are investigated by applying localized concentrated loads on the ends of the shell, in combination with the geometric imperfections or separately. Loading imperfections can occur due to manufacturing/machining variability and/or fit-up mismatch at component interfaces.
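The single-perturbation-load procedure can be sketched as a simple loop; buckling_load() below is a hypothetical stand-in for a full nonlinear finite-element buckling analysis, with an invented decay curve so the sketch runs:

```python
import math

def buckling_load(lateral_load: float) -> float:
    """Hypothetical stand-in for an FE buckling analysis: the axial buckling
    load degrades with the dimple-forming lateral load and levels off."""
    return 1.0e6 * (0.65 + 0.35 * math.exp(-lateral_load / 200.0))

def lower_bound_buckling(q0=0.0, dq=50.0, rel_tol=1e-3, max_iter=100):
    prev = buckling_load(q0)
    q = q0 + dq
    for _ in range(max_iter):
        p = buckling_load(q)
        if abs(prev - p) <= rel_tol * prev:  # response has leveled off
            return p                         # lower-bound (knockdown) load
        prev, q = p, q + dq                  # increase the perturbation load
    return prev

print(f"lower-bound buckling load = {lower_bound_buckling():.3e}")
```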
Test article design and analysis
Test article design and analysis encompass unique requirements that differ significantly from those associated with the design of aircraft, spacecraft, or launch-vehicle structures. Aerospace structures are designed and evaluated to ensure that they are able to sustain the required loads, but they are not typically required to exhibit a specific controlling or critical failure mode (i.e., they are not typically designed such that a specific failure mode has the minimum design margin). In contrast, test articles used in the SBKF Project are designed and evaluated to ensure that a particular failure mechanism is exhibited during a test, so that the resulting test data may be used to validate modeling and analysis methods for predicting specific behaviors. In addition, the test articles are typically designed such that they lie within the same design space as the full-scale structures they represent and exhibit similar response characteristics.
Exploratory studies
Exploratory studies are typically quick assessments used to guide future detailed analysis tasks. Data from these exploratory studies are not intended for future use or as decisional data and are often only used by the analyst to make informed decisions on the direction of future work. Thus, rigorous quality control and reporting of these analysis studies is typically not required.
The specific class of analysis and corresponding analysis and data requirements shall be determined by the SBKF team leads and the analyst. The analysis approach shall be based on standard best practices, when possible, and shall be uniform across all related analysis activities to ensure consistency. However, deviations from standard practice may be required and/or new approaches may be necessary to meet the analysis objectives. In such circumstances, the analyst and team lead will work together to develop and validate any new approach required.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Project Documentation: Cucumber Disease Detection
Introduction: As part of the "Cucumber Disease Detection" project, a machine learning model for the automatic detection of diseases in cucumber plants is to be developed. This research is important because it tackles early disease identification in agriculture, which can increase crop yield and cut down on financial losses. To train and test the model, we use a dataset of images of cucumber plants.
Importance: Early disease diagnosis helps minimize crop losses, stop the spread of diseases, and better allocate resources in farming. Agriculture is a real-world application of this concept.
Goals and Objectives: Develop a machine learning model to classify cucumber plant images into healthy and diseased categories. Achieve a high level of accuracy in disease detection. Provide a tool for farmers to detect diseases early and take appropriate action.
Data Collection: Using cameras and smartphones, images from agricultural areas were gathered.
Data Preprocessing: Data cleaning to remove irrelevant or corrupted images. Handling missing values, if any, in the dataset. Removing outliers that may negatively impact model training. Data augmentation techniques applied to increase dataset diversity.
Exploratory Data Analysis (EDA): The dataset was explored using visuals such as scatter plots and histograms, and the data was examined for patterns, trends, and correlations. EDA helped clarify the distribution of photos of healthy and diseased plants.
Methodology
Machine Learning Algorithms: Convolutional Neural Networks (CNNs) were chosen for image classification due to their effectiveness in handling image data. Transfer learning using pre-trained models such as ResNet or MobileNet may be considered.
Train-Test Split: The dataset was split into training and testing sets with a suitable ratio. Cross-validation may be used to assess model performance robustly.
Model Development
The CNN model's architecture consists of layers, units, and activation functions. Hyperparameters, including the learning rate, batch size, and optimizer, were chosen on the basis of experimentation. To avoid overfitting, regularization methods such as dropout and L2 regularization were used.
Model Training
During training, the model was fed the prepared dataset over a number of epochs. The loss function was minimized using an optimization method. Early stopping and model checkpoints were used to ensure convergence.
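A hedged Keras sketch consistent with the description above (the architecture, image size, and the data/ directory are assumptions, not the project's actual code):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Small CNN with dropout and L2 regularization, as described above.
model = tf.keras.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),
    layers.Conv2D(32, 3, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),  # healthy vs. diseased
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# "data/" with one subdirectory per class is an assumption.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/", validation_split=0.2, subset="training", seed=0, image_size=(128, 128))
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/", validation_split=0.2, subset="validation", seed=0, image_size=(128, 128))

# Early stopping and checkpoints, matching the training description.
model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=30,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
        tf.keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True),
    ],
)
```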
Model Evaluation
Evaluation Metrics: Accuracy, precision, recall, F1-score, and the confusion matrix were used to assess model performance. Results were computed for both training and test datasets.
Performance Discussion: The model's performance was analyzed in the context of disease detection in cucumber plants. Strengths and weaknesses of the model were identified.
Results and Discussion
Key project findings include model performance and disease-detection precision; a comparison of the models employed, showing the benefits and drawbacks of each; and the challenges faced throughout the project, along with the methods used to solve them.
Conclusion
A recap of the project's key learnings. The project's importance to early disease detection in agriculture is highlighted, and future enhancements and potential research directions are suggested.
References
Libraries: Pillow, Roboflow, YOLO, scikit-learn, matplotlib
Datasets: https://data.mendeley.com/datasets/y6d3z6f8z9/1
Code Repository: https://universe.roboflow.com/hakuna-matata/cdd-g8a6g
Rafiur Rahman Rafit EWU 2018-3-60-111