A common problem in clinical trials is the missing data that occurs when patients do not complete the study and drop out without further measurements. Missing data cause the usual statistical analysis of complete or all available data to be subject to bias. There are no universally applicable methods for handling missing data. We recommend the following: (1) Report reasons for dropouts and proportions for each treatment group; (2) Conduct sensitivity analyses to encompass different scenarios of assumptions and discuss consistency or discrepancy among them; (3) Pay attention to minimize the chance of dropouts at the design stage and during trial monitoring; (4) Collect post-dropout data on the primary endpoints, if at all possible; and (5) Consider the dropout event itself an important endpoint in studies with many.
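Recommendations (1) and (2) can be illustrated with a small sketch. It assumes a binary primary endpoint and uses simple best-case and worst-case imputation as the alternative scenarios. That choice, the function name `response_rate_scenarios`, and the sample data are assumptions made for illustration; the abstract does not prescribe a specific method.

```python
def response_rate_scenarios(outcomes):
    """Summarize one treatment group under different dropout assumptions.

    outcomes: list with 1 (responder), 0 (non-responder) or None (dropped out).
    The three scenarios are illustrative, not the authors' procedure.
    """
    observed = [o for o in outcomes if o is not None]
    n_total = len(outcomes)
    n_observed = len(observed)
    n_dropped = n_total - n_observed
    responders = sum(observed)
    return {
        # Recommendation (1): report the proportion of dropouts per group.
        "dropout_proportion": n_dropped / n_total,
        # Scenario A: analyze completers only (biased if dropout is informative).
        "complete_case": responders / n_observed,
        # Scenario B: assume every dropout was a non-responder.
        "worst_case": responders / n_total,
        # Scenario C: assume every dropout was a responder.
        "best_case": (responders + n_dropped) / n_total,
    }


if __name__ == "__main__":
    # Made-up data: 8 patients, 2 of whom dropped out.
    group = [1, 1, 0, None, None, 1, 0, 1]
    for name, value in response_rate_scenarios(group).items():
        print(f"{name}: {value:.3f}")
```

If the conclusion about a treatment changes between the worst-case and best-case rows, the result is sensitive to the missing-data assumption and that discrepancy should be discussed.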
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Problem is a dataset for object detection tasks - it contains Problem annotations for 2,923 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This statistic displays the results of a survey on the share of individuals expressing privacy concerns regarding their personal data on the internet in Italy in 2016. During the survey period, **** percent of the respondents reported that using the internet exposes everyone to being tracked and followed, while **** percent stated that privacy was not a real problem.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains information about a number of workshop participants (participants.csv) who need to be assigned to a number of rooms (rooms.csv).
Restrictions:
1.- The workshop has 5 different activities.
2.- Each participant has indicated their first, second and third preferences for the activities available (Priority1, Priority2 and Priority3 columns in participants.csv).
3.- Participants are part of teams (Team column in participants.csv) and should be assigned together.
4.- Each activity lasts for half a day, and each participant will take part in one activity in the morning and one activity in the afternoon.
5.- Each room must contain the SAME activity in the morning and in the afternoon.
Requirements:
A.- Define the way in which each participant should be assigned through a CSV file in the format Name;ActivityAM;RoomAM, ActivityPM;RoomPM
B.- Maximize the number of people getting their 1st and 2nd preferences.
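The restrictions and requirements above can be turned into a checker for a candidate solution. The following is a minimal sketch, not a solver: the function name `check_assignment`, the in-memory data layout, and the scoring rule (one point for each assigned activity that matches a participant's first or second preference) are assumptions. The column names Name, Team, Priority1 and Priority2 come from the description.

```python
from collections import defaultdict


def check_assignment(participants, assignment):
    """Validate a candidate assignment and score it.

    participants: list of dicts with keys Name, Team, Priority1, Priority2, Priority3.
    assignment: dict mapping Name -> (ActivityAM, RoomAM, ActivityPM, RoomPM).
    Returns (sorted list of violation messages, preference score).
    """
    problems = set()

    # Restriction 5: a room hosts the same activity in the morning and afternoon.
    room_activity = {}
    for act_am, room_am, act_pm, room_pm in assignment.values():
        for room, activity in ((room_am, act_am), (room_pm, act_pm)):
            if room_activity.setdefault(room, activity) != activity:
                problems.add(f"room {room} hosts more than one activity")

    # Restriction 3: all members of a team share the same schedule.
    team_schedules = defaultdict(set)
    for p in participants:
        team_schedules[p["Team"]].add(assignment[p["Name"]])
    for team, schedules in team_schedules.items():
        if len(schedules) > 1:
            problems.add(f"team {team} is split")

    # Requirement B (assumed scoring): count assigned activities that are a
    # participant's first or second preference, 0 to 2 points per person.
    score = 0
    for p in participants:
        act_am, _, act_pm, _ = assignment[p["Name"]]
        score += len({p["Priority1"], p["Priority2"]} & {act_am, act_pm})

    return sorted(problems), score
```

A solver (for example a search over team-to-schedule assignments) could call this function to reject infeasible candidates and compare feasible ones by score.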
This statistic shows the countries where American and European organizations face regulatory challenges involving cross-border data issues in 2019. During the survey, ** percent of respondents mentioned they faced a challenge involving cross-border data issues in the United States.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.
The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.
This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.
The following is the Google Colab link to the project, done on Jupyter Notebook -
https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN
The following is the GitHub Repository of the project -
https://github.com/daerkns/social-media-and-mental-health
Libraries used for the Project -
pandas
NumPy
Matplotlib
seaborn
scikit-learn
This dataset is inspired by a real story of something funny that happened to a data science team in a company. They had built a classifier that was suspiciously good, even when analyzing its performance on unseen validation data. It was too good to be true.
Your job is to find the error.
The following notebook explains the task:
https://www.kaggle.com/computingschool/shady-dataset-find-the-error
The ICOW project is currently collecting data on territorial issues in all regions of the world since 1816, compiled into several related data files. The first file identifies territorial claims, or explicit claims by official government representatives of at least two sovereign nation-states to the same piece of territory, and includes basic claim-level information such as the overall beginning and ending of the claim and the form of claim termination. The second file is organized by claim-dyad-years and includes one data point for each year of each claimant dyad, with information on details such as the characteristics of the claimed territory. The third and final data file covers attempts to settle these territorial claims through bilateral negotiations or with third party assistance (good offices, mediation, inquiry, conciliation, arbitration, or adjudication), and includes details such as the beginning, ending, and effectiveness of each settlement attempt.
Over 8 million GitHub issue titles and descriptions from 2017. Prepared from instructions at How To Create Data Products That Are Magical Using Sequence-to-Sequence Models.
The data was adapted from GitHub data accessible from GitHub Archive. The constructocat image is from https://octodex.github.com/constructocat-v2.
MIT License
Copyright (c) 2018 David Shinn
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
This dataset was created by Ramon
VA Executive's Guide to Configuration Management
The New Security Issues, State and Local Governments tables (1.45) are updated monthly. Data were previously published in the Supplement to the Federal Reserve Bulletin, which ceased publication in December 2008. Data sources have included: Mergent, beginning November 2011; Securities Data Company, from January 1990 to October 2011; and Investment Dealers Digest before then.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Jira is an issue tracking system that supports software companies (among other types of companies) with managing their projects, community, and processes. This dataset is a collection of public Jira repositories downloaded from the internet using the Jira API V2. We collected data from 16 public Jira repositories containing 1822 projects and 2.7 million issues. Included in this data are historical records of 32 million changes, 8 million comments, and 1 million issue links that connect the issues in complex ways. This artefact repository contains the data as a MongoDB dump, the scripts used to download the data, the scripts used to interpret the data, and qualitative work conducted to make the data more approachable.
Simple data created for practicing regression problems. It consists of three columns: Price, Feature 1 and Feature 2. Try to predict Price using Feature 1 and Feature 2. The data is clean, so no data cleaning is required.
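As a starting point for the task described above, here is a minimal sketch that fits Price = b0 + b1 * Feature 1 + b2 * Feature 2 by ordinary least squares using only the standard library. The function names `fit_linear` and `predict`, the tuple layout, and the sample rows are assumptions for illustration; the real file's values and column order are not shown here.

```python
def fit_linear(rows):
    """Ordinary least squares for price = b0 + b1*f1 + b2*f2.

    rows: iterable of (feature1, feature2, price) tuples.
    Returns [b0, b1, b2]. Raises ZeroDivisionError if the features are
    collinear (the normal equations are then singular).
    """
    # Accumulate the normal equations (X^T X) b = X^T y with an intercept column.
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for f1, f2, price in rows:
        x = (1.0, float(f1), float(f2))
        for i in range(3):
            xty[i] += x[i] * price
            for j in range(3):
                xtx[i][j] += x[i] * x[j]

    # Solve the 3x3 system by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def predict(coefficients, feature1, feature2):
    """Apply the fitted coefficients to one observation."""
    b0, b1, b2 = coefficients
    return b0 + b1 * feature1 + b2 * feature2


if __name__ == "__main__":
    # Made-up rows generated from price = 10 + 2*f1 + 3*f2.
    sample = [(1, 2, 18), (2, 1, 17), (3, 5, 31), (4, 3, 27), (0, 0, 10)]
    coefficients = fit_linear(sample)
    print(coefficients, predict(coefficients, 2, 2))
```

In practice the rows would be read from the dataset's CSV file, and a library such as scikit-learn would usually replace the hand-written solver.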
As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. Although online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand, probabilistic topic models are among the most effective approaches to latent topic analysis and mining on text data. In this paper, we study a new data model called topic cube to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database. Topic cube extends the traditional data cube to cope with a topic hierarchy and stores probabilistic content measures of text documents learned through a probabilistic topic model. To materialize topic cubes efficiently, we propose two heuristic aggregations to speed up the iterative Expectation-Maximization (EM) algorithm for estimating topic models by leveraging the models learned on component data cells to choose a good starting point for iteration. Experimental results show that these heuristic aggregations are much faster than the baseline method of computing each topic cube from scratch. We also discuss some potential uses of topic cube and show sample experimental results.
This table shows the 10 most frequently recorded incident problem types as recorded by communications personnel for each fiscal year presented.
November's Institute of Medicine (IOM) report on medical errors has sparked debate among US health policy makers as to the appropriate response to the problem. Proposals range from the implementation of nationwide mandatory reporting with public release of performance data to voluntary reporting and quality-assurance efforts that protect the confidentiality of error-related data. Any successful safety program will require a national effort to make significant investments in information technology infrastructure, and to provide an environment and education that enables providers to contribute to an active quality-improvement process.
Housing Problems by Type of Issue and Community, 2019
In 2021, ** percent of respondents from Canada indicated that it takes their company less than a week to resolve issues created by a data breach. Because data breaches are expensive, companies need to be able to resolve these issues quickly. For this reason, many companies have an incident response plan.
Provide information on problem tracking, assignment and notification, alerts and escalation, problem resolution, and problem closure.