This dataset was created by Damini Tiwari
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, use of the most popular imputation methods mainly require scripting skills, and are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often considered as a separate exercise from exploratory data analysis, but should be considered as art of the data exploration process. We have created a new graphical tool, ImputEHR, that is based on a Python base and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
https://www.looper.com/img/gallery/the-ending-of-squid-game-season-1-explained/intro-1632168234.jpg" alt="">
The dataset contains the recent tweets about the record-breaking Netflix show "Squid Game"
The data is collected using tweepy Python package to access Twitter API.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
During the first half of 2020, the COVID-19 pandemic changed the social gathering lifestyle to online business and social interaction. The worldwide imposed travel bans and national lockdown prevented social gatherings, making learning institutions and businesses to adopt an online platform for learning and business transactions. This development led to the incorporation of video conferencing into daily activities. This data article presents broadband data usage measurement collected using Glasswire software on various conference calls made between July and August. The services considered in this work are Google Meet, Zoom, Mixir, and Hangout. The data were recorded in Microsoft Excel 2016, running on a personal computer. The data was cleaned and processed using google colaboratory, which runs Python scripts on the browser. Exploratory data analysis is conducted on the data set using linear regression to model a predictive model to assess the best performance model that offers the best quality of service for online video and voice conferencing. The data is necessary to learning institutions using online programs and to learners accessing online programs in a smart city and developing countries. The data is presented in tables and graphs
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
On October 2nd, 2020, U.S. President Donald Trump tweeted that he and the First Lady of the United States (FLOTUS), Melania Trump, both tested positive for COVID-19 (https://twitter.com/realDonaldTrump/status/1311892190680014849). Within seconds, his tweet received thousands of replies. The dataset consists of 298,172 replies to Donald Trump’s tweet announcing his COVID-19 diagnosis, posted on October 2nd between 6am and 12:30pm (ET). It contains tweet ids, the toxicity scores (as calculated by Google's Perspective API via https://Communalytic.com) and tweet availability status values (via twarc library). Following Twitter’s API policy, we stripped metadata associated with each tweet. So, if you’d like to examine potential relationships between other metadata elements, you would need to recollect original tweets using tools like DocNow’s Hydrator first. The only downside of this approach is that tweets that have been blocked or deleted will not be recollected. To help you get started, we also shared our Exploratory Data Analysis (EDA) Python script at https://github.com/RUSocialMediaLab/toxicityanalysis
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This package contains data and code to replicate the findings presented in our paper titled " An Empirical Study on Exploratory Crowdtesting of Android Applications".
Abstract
Crowdtesting is an emerging paradigm in which a ``crowd'' of people is recruited to perform testing tasks on demand. It proved to be especially promising in the mobile apps domain and in combination with exploratory testing strategies, in which individual testers pursue a creative, experience-based approach to design tests.
Managing the crowdtesting process, however, is still a challenging task, that can easily result either in wasteful spending or in inadequate software quality, due to the unpredictability of remote testing activities.
A number of works in the literature investigated the application of crowdtesting in the mobile apps domain. These works, however, investigated crowdtesting effectiveness in finding bugs, and not in scenarios in which the goal is to generate a re-executable test suite, as well. Moreover, less work has been conducted on to the impact of different exploratory testing strategies in the crowdtesting process.
As a first step towards filling this gap in the literature, in this work we conduct an empirical evaluation involving four open-source Android apps and twenty masters students, that we believe can be representative of practitioners partaking in crowdtesting activities. The students were asked to generate test suites for the apps using a Capture and Replay tool and different exploratory testing strategies. We then compare the effectiveness, in terms of aggregate code coverage, that different-sized crowds of students using different exploratory testing strategies may achieve.
Results suggest that exploratory crowdtesting can be a valuable approach for generating GUI test suites for mobile apps, and provide a deeper insight on code coverage dynamics to project managers interested in using crowdtesting to test simple apps, on which they can make more informed decisions.
Contents and Instructions
This package contains:
This archive contains the model, the data and the analyses of my MSc Thesis. The model and analyses are coded in Python and documented in Jupyter Notebooks.
Thesis Title: Deep Uncertainty in Humanitarian Logistics: Simulation and Analysis of the Interplay between Decisions and Uncertainty for Post-Disaster Facility Location Decisions
Link to Thesis: http://resolver.tudelft.nl/uuid:580c5a9c-73dc-4195-930e-97c524a4b7c7
Not seeing a result you expected?
Learn how you can add new datasets to our index.
This dataset was created by Damini Tiwari