75 datasets found
  1. Bellabeat Case Study Supplement

    • kaggle.com
    zip
    Updated Oct 28, 2022
    Cite
    Britta Smith (2022). Bellabeat Case Study Supplement [Dataset]. https://www.kaggle.com/datasets/brittasmith/bellabeat-casestudy-sql-tableau-excel
    Explore at:
    Available download formats: zip (65670 bytes)
    Dataset updated
    Oct 28, 2022
    Authors
    Britta Smith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data, clean data, and SQL query output tables as spreadsheets, supporting the Tableau story and the GitHub repository available at https://github.com/brittabeta/Bellabeat-Case-Study-SQL-Excel-Tableau

  2. Unlocking Data to Inform Public Health Policy and Practice: WP1 Mapping...

    • orda.shef.ac.uk
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 30, 2023
    Cite
    Mark Clowes; Anthea Sutton; Tony Stone; Matthew Franklin (2023). Unlocking Data to Inform Public Health Policy and Practice: WP1 Mapping Review Supplementary Excel S1 [Dataset]. http://doi.org/10.15131/shef.data.21222272.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    The University of Sheffield
    Authors
    Mark Clowes; Anthea Sutton; Tony Stone; Matthew Franklin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Unlocking Data to Inform Public Health Policy and Practice: WP1 Mapping Review Supplementary Excel S1
    The data extracted into Excel Tab "S1 Case studies (extracted)" represents information from 31 case studies as part of the "Unlocking Data to Inform Public Health Policy and Practice" project, Workpackage (WP) 1 Mapping Review. Details about the WP1 mapping review can be found in the "Unlocking Data to Inform Public Health Policy and Practice" project report, which can be found via this DOI link: https://doi.org/10.15131/shef.data.21221606

  3. Google Certificate BellaBeats Capstone Project

    • kaggle.com
    zip
    Updated Jan 5, 2023
    Cite
    Jason Porzelius (2023). Google Certificate BellaBeats Capstone Project [Dataset]. https://www.kaggle.com/datasets/jasonporzelius/google-certificate-bellabeats-capstone-project
    Explore at:
    Available download formats: zip (169161 bytes)
    Dataset updated
    Jan 5, 2023
    Authors
    Jason Porzelius
    Description

    Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted database program, Excel, for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as R Studio, SQL Workbench, or Google Sheets was not a feasible choice. I was further limited in which option to choose as the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist in Bellabeat's marketing strategies for future growth. My task is to provide data-driven insights to business tasks provided by the Bellabeats, Inc. executive and data analysis team. In order to accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.

    Section 1 - Ask:

    A. Guiding Questions:
    1. Who are the key stakeholders and what are their goals for the data analysis project? 2. What is the business task that this data analysis project is attempting to solve?

    B. Key Tasks: 1. Identify key stakeholders and their goals for the data analysis project *The key stakeholders for this project are as follows: -Urška Sršen and Sando Mur - co-founders of Bellabeats, Inc. -Bellabeats marketing analytics team. I am a member of this team.

    2. Identify the business task. *The business task is: -As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices in order to guide upcoming marketing strategies for the company which will help drive future growth. Specifically, the researcher was tasked with applying insights driven by the data analysis process to one BellaBeats product and presenting those insights to BellaBeats stakeholders.

    Section 2 - Prepare:

    A. Guiding Questions: 1. Where is the data stored and organized? 2. Are there any problems with the data? 3. How does the data help answer the business question?

    B. Key Tasks:

    1. Research and communicate the source of the data, and how it is stored/organized, to stakeholders. *The data source used for our case study is FitBit Fitness Tracker Data. This dataset is stored in Kaggle and was made available through user Mobius in an open-source format. Therefore, the data is public and available to be copied, modified, and distributed, all without asking the user for permission. These datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk, reportedly (see credibility section directly below) between 03/12/2016 and 05/12/2016.
      *Reportedly (see credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute, hour, and day level totals. This data is stored in 18 CSV documents. I downloaded all 18 documents onto my local laptop and decided to use 2 documents for the purposes of this project, as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were: -sleepDay_merged.csv -dailyActivity_merged.csv

    2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias. *As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported. *As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
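    The two credibility checks described above (how many distinct users reported data, and what date range the records actually cover) can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's Excel workflow: the column names `Id` and `ActivityDate`, the `%m/%d/%Y` date format, and the tiny inline sample are assumptions made for the example.

```python
import csv
import io
from datetime import datetime


def summarize_activity(csv_text, id_col="Id", date_col="ActivityDate",
                       date_fmt="%m/%d/%Y"):
    """Return (distinct user count, earliest date, latest date).

    Column names and date format are assumptions for this sketch;
    adjust them to match the actual file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    user_ids = set()
    dates = []
    for row in reader:
        user_ids.add(row[id_col])
        dates.append(datetime.strptime(row[date_col], date_fmt).date())
    return len(user_ids), min(dates), max(dates)


# Hypothetical sample rows, not taken from the real dataset.
sample = """Id,ActivityDate,TotalSteps
1001,4/12/2016,13162
1001,4/13/2016,10735
1002,4/12/2016,8000
1003,5/12/2016,4500
"""

n_users, first_day, last_day = summarize_activity(sample)
print(n_users, first_day, last_day)
```

    Comparing the returned user count and date range against what the metadata claims is one way to surface the kind of mismatch the description reports.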

  4. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the notation used: User Stories or Use Cases
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
    P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
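    As a small illustration of column F, the grade thresholds stated above can be written as a function. This is a sketch under the stated thresholds only; the function name and the use of `None` for an empty cell are assumptions for the example.

```python
def grade_category(points):
    """Map an exam grade (0-100) to L/M/H using the thresholds in column F.

    None stands in for an empty cell (subject did not take the exam);
    that representation is an assumption of this sketch.
    """
    if points is None:
        return None
    if points >= 80:
        return "H"
    if points >= 65:
        return "M"
    return "L"


for p in (92, 80, 79.5, 65, 64, None):
    print(p, grade_category(p))
```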

    Tagging scheme:
    Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
    Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
    System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
    Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
    Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between student and expert model.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
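    The two ratios defined for Sheet 4 can be sketched directly from the tag counts. The following Python is a minimal illustration of the formulas as worded in this description, not the spreadsheet's own formulas; the function names, the example counts, and returning `None` for an empty denominator are assumptions.

```python
def correctness(al, wr, so, om):
    """AL / (AL + OM + SO + WR), as stated in the description."""
    denom = al + om + so + wr
    return al / denom if denom else None


def completeness(al, wr, om):
    """(AL + WR) / (AL + WR + OM), as stated in the description."""
    denom = al + wr + om
    return (al + wr) / denom if denom else None


# Hypothetical tag counts for one student model.
counts = {"al": 6, "wr": 2, "so": 1, "om": 1}
print(correctness(**counts))
print(completeness(counts["al"], counts["wr"], counts["om"]))
```

    Note that system-oriented classes (SO) lower correctness but do not enter completeness, since they have no counterpart in the expert model.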

    For sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation - UC, US.

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case - SIM, HOS, IFA.

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.
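    For readers who want to reproduce the effect sizes reported at the bottom of sheets 4-8 without the online tool, Hedges' g for two independent groups can be sketched as below. This uses the common textbook form (Cohen's d on the pooled standard deviation, multiplied by the small-sample correction 1 - 3/(4(n1 + n2) - 9)); whether the online tool applies exactly this approximation is an assumption, and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Hedges' g for two independent samples (approximate correction factor)."""
    n1, n2 = len(group_a), len(group_b)
    # Pooled variance from the two sample variances (n - 1 denominators).
    pooled_var = ((n1 - 1) * variance(group_a)
                  + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    # Small-sample bias correction (common approximation).
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * j


# Hypothetical completeness scores for two notation groups.
uc = [0.8, 0.7, 0.9, 0.6]
us = [0.5, 0.6, 0.4, 0.7]
print(round(hedges_g(uc, us), 4))
```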

  5. BHIQDAVE CASE STUDY BIKE

    • kaggle.com
    zip
    Updated Aug 11, 2025
    Cite
    Bhiqdave (2025). BHIQDAVE CASE STUDY BIKE [Dataset]. https://www.kaggle.com/datasets/bhiqdave/bhiqdave-case-study-bike
    Explore at:
    Available download formats: zip (41841662 bytes)
    Dataset updated
    Aug 11, 2025
    Authors
    Bhiqdave
    Description

    Data collected for the Coursera data case study (topic 8). The steps followed came both from the course and from YouTube video tutorials, such as "Google Data Analytics Professional Certificate Capstone Case Study in Excel" by matt bratting.

  6. Bellabeat case study with excel and tableau

    • kaggle.com
    zip
    Updated Oct 27, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    mgbecheta paschal (2023). Bellabeat case study with excel and tableau [Dataset]. https://www.kaggle.com/datasets/mgbechetapaschal/bellabeat-case-study-with-excel-and-tableau
    Explore at:
    Available download formats: zip (438640 bytes)
    Dataset updated
    Oct 27, 2023
    Authors
    mgbecheta paschal
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by mgbecheta paschal

    Released under Apache 2.0

    Contents

  7. Data related to the case study of the AI Agent integrated in the databases...

    • ieee-dataport.org
    Updated Oct 30, 2025
    Cite
    Enrique Barra Arias (2025). Data related to the case study of the AI Agent integrated in the databases educational escape room [Dataset]. https://ieee-dataport.org/documents/data-related-case-study-ai-agent-integrated-databases-educational-escape-room
    Explore at:
    Dataset updated
    Oct 30, 2025
    Authors
    Enrique Barra Arias
    Description

    Dataset related to the case study of the AI Agent integrated in the databases educational escape room. One Excel file with the pre- and post-test grades. One Excel file with the survey results.

  8. Bellabeat Case Study with Excel and Tableau

    • kaggle.com
    zip
    Updated Jul 12, 2024
    Cite
    Okorigwe Clinton (2024). Bellabeat Case Study with Excel and Tableau [Dataset]. https://www.kaggle.com/datasets/okorigweclinton/bellabeat-case-study-with-excel-and-tableau/code
    Explore at:
    Available download formats: zip (3095805 bytes)
    Dataset updated
    Jul 12, 2024
    Authors
    Okorigwe Clinton
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Okorigwe Clinton

    Released under Apache 2.0

    Contents

  9. Axshya SAMVAD Study - Costs due to TB diagnosis

    • figshare.com
    xlsx
    Updated Nov 19, 2020
    Cite
    Hemant Deepak Shewade (2020). Axshya SAMVAD Study - Costs due to TB diagnosis [Dataset]. http://doi.org/10.6084/m9.figshare.13259687.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 19, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Hemant Deepak Shewade
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This contains the dataset (in MS Excel format) and codebook for the Axshya SAMVAD study assessing the effects of active case finding for TB on costs due to TB diagnosis and catastrophic costs due to TB diagnosis (prevalence, intensity and inequity) when compared to passive case finding. The study was published in Global Health Action in 2018.

  10. Excel file figs 2, 3, 4, 5, 6, 7 and 10.

    • plos.figshare.com
    xlsx
    Updated Jun 10, 2024
    Cite
    Hossein Bibak; Farzad Heydari; Mohammad Sadat-Hosseini (2024). Excel file figs 2, 3, 4, 5, 6, 7 and 10. [Dataset]. http://doi.org/10.1371/journal.pone.0303229.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Hossein Bibak; Farzad Heydari; Mohammad Sadat-Hosseini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The present study recorded indigenous knowledge of medicinal plants in Shahrbabak, Iran. We described a method using data mining algorithms to predict medicinal plants’ mode of application. Twenty-one individuals aged 28 to 81 were interviewed. Firstly, data were collected and analyzed based on quantitative indices such as the informant consensus factor (ICF), the cultural importance index (CI), and the relative frequency of citation (RFC). Secondly, the data were classified by support vector machines, J48 decision trees, neural networks, and logistic regression. In total, 141 medicinal plants from 43 botanical families were documented. Lamiaceae, with 18 species, was the dominant family among plants, and plant leaves were most frequently used for medicinal purposes. The decoction was the most commonly used preparation method (56%), and therophytes were the most dominant (48.93%) among plants. Regarding the RFC index, the most important species are Adiantum capillus-veneris L. and Plantago ovata Forssk., while Artemisia auseri Boiss. ranked first based on the CI index. The ICF index demonstrated that metabolic disorders are the most common problems among plants in the Shahrbabak region. Finally, the J48 decision tree algorithm consistently outperforms other methods, achieving 95% accuracy in 10-fold cross-validation and 70–30 data split scenarios. The developed model detects with maximum accuracy how to consume medicinal plants.

  11. BellaBeat

    • kaggle.com
    zip
    Updated Jun 27, 2022
    Cite
    Rachel Czmyr (2022). BellaBeat [Dataset]. https://www.kaggle.com/datasets/rachelczmyr/bellabeat
    Explore at:
    Available download formats: zip (28175 bytes)
    Dataset updated
    Jun 27, 2022
    Authors
    Rachel Czmyr
    License

    http://www.gnu.org/licenses/fdl-1.3.html

    Description

    This project provides insight into the data analytics process followed for the Google Data Analytics Case Study – BellaBeat and covers the following deliverables:

    1. Business Task
    2. Data Sources Used
    3. Change Data- cleaning or manipulation of data
    4. Summary of Analysis
    5. Supporting Visualizations and Key Findings
    6. Top High-Level Content Recommendations

    It covers the 6 phases of data analysis: Ask, Prepare, Process, Analyze, Share, Act

    To complete this case study, a variety of data analysis tools were used, including: Google Sheets, Excel, SQL, RStudio, Kaggle

  12. Protecting an island nation from extreme pandemic threats: Proof-of-concept...

    • plos.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Matt Boyd; Michael G. Baker; Osman D. Mansoor; Giorgi Kvizhinadze; Nick Wilson (2023). Protecting an island nation from extreme pandemic threats: Proof-of-concept around border closure as an intervention [Dataset]. http://doi.org/10.1371/journal.pone.0178732
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Matt Boyd; Michael G. Baker; Osman D. Mansoor; Giorgi Kvizhinadze; Nick Wilson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Countries are well advised to prepare for future pandemic risks (e.g., pandemic influenza, novel emerging agents or synthetic bioweapons). These preparations do not typically include planning for complete border closure. Even though border closure may not be instituted in time, and can fail, there might still be plausible chances of success for well organized island nations.
    Objective: To estimate costs and benefits of complete border closure in response to new pandemic threats, at an initial proof-of-concept level. New Zealand was used as a case-study for an island country.
    Methods: An Excel spreadsheet model was developed to estimate costs and benefits. Case-study specific epidemiological data was sourced from past influenza pandemics. Country-specific healthcare cost data, valuation of life, and lost tourism revenue were imputed (with lost trade also in scenario analyses).
    Results: For a new pandemic equivalent to the 1918 influenza pandemic (albeit with half the mortality rate, “Scenario A”), it was estimated that successful border closure for 26 weeks provided a net societal benefit (e.g., of NZ$11.0 billion, USD$7.3 billion). Even in the face of a complete end to trade, a net benefit was estimated for scenarios where the mortality rate was high (e.g., at 10 times the mortality impact of “Scenario A”, or 2.75% of the country’s population dying) giving a net benefit of NZ$54 billion (USD$36 billion). But for some other pandemic scenarios where trade ceased, border closure resulted in a net negative societal value (e.g., for “Scenario A” times three for 26 weeks of border closure, but not for only 12 weeks of closure when it would still be beneficial).
    Conclusions: This “proof-of-concept” work indicates that more detailed cost-benefit analysis of border closure in very severe pandemic situations for some island nations is probably warranted, as this course of action might sometimes be worthwhile from a societal perspective.

  13. Data from: Software Evolution and Quality Data from Controlled, Multiple,...

    • zenodo.org
    • data.niaid.nih.gov
    bin, xls
    Updated Jan 24, 2020
    Cite
    Yamashita Aiko; Guéhéneuc Yann-Gaël; Khomh Foutse; Abtahizadeh Amirhossein (2020). Software Evolution and Quality Data from Controlled, Multiple, Industrial Case Studies [Dataset]. http://doi.org/10.5281/zenodo.293719
    Explore at:
    Available download formats: bin, xls
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yamashita Aiko; Guéhéneuc Yann-Gaël; Khomh Foutse; Abtahizadeh Amirhossein
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This data was obtained from a controlled, multiple case study involving six professional developers and four real-life, industrial systems. The study was designed to control for the moderator factors: programmer skill, maintenance task and learning effect. The primary data set contains multiple sets of defects, in the form of reports (excel files) extracted from six issue tracking systems. The secondary data consists of a series of attributes extracted from the software systems (i.e., code smells) and their evolution (i.e., code churn), and a log specifying the dates on which developers worked on each of the systems/tasks, in the form of excel files. Details on the controlled, multiple case study can be found in the doctoral dissertation by Yamashita titled: "Assessing the Capability of Code Smells to Support Software Maintainability Assessments: Empirical Inquiry and Methodological Approach" (online) Available at: https://www.duo.uio.no/handle/10852/34525

  14. Interviews Transcripts

    • figshare.com
    xlsx
    Updated Mar 14, 2023
    Cite
    Fabio Pinto (2023). Interviews Transcripts [Dataset]. http://doi.org/10.6084/m9.figshare.22269073.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Mar 14, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Fabio Pinto
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This excel spreadsheet contains the full answers given and later corrected by respondents to the original questionnaire applied in the case study of the work: "Design Thinking for Requirements Engineering: Problems and Opportunities on Non-functional Requirements".

  15. ICSE 2025 - Artifact

    • figshare.com
    pdf
    Updated Jan 24, 2025
    Cite
    FARIDAH AKINOTCHO (2025). ICSE 2025 - Artifact [Dataset]. http://doi.org/10.6084/m9.figshare.28194605.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    FARIDAH AKINOTCHO
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Mobile Application Coverage: The 30% Curse and Ways Forward## Purpose In this artifact, we provide the information about our benchmarks used for manual and tool exploration. We include coverage results achieved by tools and human analysts as well as plots of the coverage progression over time for analysts. We further provide manual analysis results for our case study, more specifically extracted reasons for unreachability for the case study apps and extracted code-level properties, which constitute a ground truth for future work in coverage explainability. Finally, we identify a list of beyond-GUI exploration tools and categorize them for future work to take inspiration from. We are claiming available and reusable badges; the artifact is fully aligned with the results described in our paper and comprehensively documented.## ProvenanceThe paper preprint is available here: https://people.ece.ubc.ca/mjulia/publications/Mobile_Application_Coverage_ICSE2025.pdf## Data The artifact submission is organized into five parts:- 'BenchInfo' excel sheet describing our experiment dataset- 'Coverage' folder containing coverage results for tools and analysts (RQ1) - 'Reasons' excel sheet describing our manually extracted reasons for unreachability (RQ2)- 'ActivationProperties' excel sheet describing our manually extracted code properties of unreached activities (RQ3)- 'ActivationProperties-Graph' pdf which presents combinations of the extracted code properties in a graph format.- 'BeyondGUI' folder containing information about identified techniques which go beyond GUI exploration.The artifact requires about 15MB of storage.### Dataset: 'BenchInfo.xlsx'This file list the full application dataset used for experiments into three tabs: 'BenchNotGP' (apps from AndroTest dataset which are not on Google Play), 'BenchGP' (apps from AndroTest which are also on Google Play) and 'TopGP' (top ranked free apps from Google Play). 
Each tab contains the following information:
- Application Name
- Package Name
- Version Used (Latest)
- Original Version
- # Activities
- Minimum SDK
- Target SDK
- # Permissions (in Manifest)
- List of Permissions (in Manifest)
- # Features (in Manifest)
- List of Features (in Manifest)

The 'TopGP' sheet also includes Google-Play-specific information, namely:
- Category (one of 32 app categories)
- Downloads
- Popularity Rank

The 'BenchGP' and 'BenchNotGP' sheets also include the original version (included in the AndroTest benchmark) and the source (one of F-Droid, GitHub or Google Code Archives).

### RQ1: 'Coverage'
The 'Coverage' folder includes coverage results for tools and analysts, and is structured as follows:
- 'CoverageResults.xlsx': An Excel sheet containing the coverage results achieved by each human analyst and tool.
  - The first tab describes the results over all apps for analysts combined, tools combined, and analysts + tools, which map to Table II in the paper.
  - Each of the following 42 tabs, one per app in TopGP, marks the activities reached by Analyst 1, Analyst 2, Tool 1 (ape) and Tool 2 (fastbot), with an 'x' in the corresponding column to indicate that the activity was reached by the given agent.
- 'Plots': A folder containing plots of the progressive coverage over time of analysts, split into one folder for 'Analyst1' and one for 'Analyst2'.
  - Each analyst's folder includes a subfolder per benchmark ('BenchNotGP', 'BenchGP' and 'TopGP'), containing as many PNG files as applications in the benchmark (respectively 47, 14 and 42 image files), named 'ANALYST_[X]_[APP_PACKAGE_NAME].png'.

### RQ2: 'Reasons.xlsx'
This file contains the extracted reasons for unreachability for the 11 apps manually analyzed.
- The 'Summary' tab provides an overview of unreached activities per reason over all apps and per app, which corresponds to Table III in the paper.
- The following 11 tabs, each corresponding to and named after a single application, describe the reasons associated with each activity of that application. Each column corresponds to a single reason, and 'x' indicates that the activity is unreached due to the reason in that column. The top row sums up the total number of activities unreached due to a given reason in each column.
- The activities at the bottom which are greyed out correspond to activities that were reached during exploration, and are thus excluded from the reason extraction.

### RQ3: 'ActivationProperties.xlsx'
This file contains the full list of activation properties extracted for each of the 185 activities analyzed for RQ2. The first half of the columns (columns C-M) corresponds to the reasons (excluding Transitive, Inconclusive and No Caller) and the second half (columns N-AD) corresponds to properties described in Figure 5 in the paper, namely:
- Exported
- Activation Location:
  - Code: GUI/lifecycle, Other Android or App-specific
  - Manifest
- Activation Guards:
  - Enforcement: In Code or In Resources
  - Restriction: Mandatory or Discretionary
- Data:
  - Type: Parameters, Execution Dependencies
  - Format: Primitive, Strings, Objects

The rows are grouped by application, and each row corresponds to an activity of that application. 'x' in a given column indicates the presence of the property in that column within the analyzed path to the activity. The third and fourth rows sum up the numbers and percentages for each property, as reported in Figure 5.

### RQ3: 'ActivationProperties-Graph.pdf'
This file shows combinations of the individual properties listed in 'ActivationProperties.xlsx' in a graph format, extending the combinations described in Table IV with data (types and format) and reasons for unreachability.

### BeyondGUI
This folder includes:
- 'ToolInfo.xlsx': an Excel sheet listing the identified 22 beyond-GUI papers, the date of publication, availability, invasiveness (Source code, Bytecode, framework, OS) and their targeting strategy (None, Manual or Automated).
- 'ToolClassification.pdf': a PDF file describing our paper selection methodology as well as a classification of the techniques in terms of Invocation Strategy, Navigation Strategy, Value Generation Strategy, and Value Generation Types. These categories are fully introduced in the PDF file.

## Requirements & technology skills assumed by the reviewer evaluating the artifact
The artifact consists entirely of Excel sheets, which can be opened with common Excel visualization software, i.e., Microsoft Excel, coverage plots as PNG files, and PDF files. It requires about 15MB of storage in total. No other specific technology skills are required of the reviewer evaluating the artifact.
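As an illustration of how the 'x' marks in the per-app coverage tabs could roll up into the combined figures (analysts, tools, analysts + tools), here is a minimal Python sketch. The activity names and marks are hypothetical and are not taken from 'CoverageResults.xlsx'; the `coverage` helper is an assumption about one reasonable way to count, not the authors' method.

```python
# Hypothetical marks for one app: activity name -> agents that reached it.
# These values are invented for illustration only.
reached = {
    "MainActivity": {"Analyst 1", "Analyst 2", "Tool 1", "Tool 2"},
    "SettingsActivity": {"Analyst 1"},
    "LoginActivity": {"Tool 2"},
    "HiddenActivity": set(),
}

ANALYSTS = {"Analyst 1", "Analyst 2"}
TOOLS = {"Tool 1", "Tool 2"}


def coverage(marks, agents):
    """Fraction of activities reached by at least one of the given agents."""
    hit = sum(1 for who in marks.values() if who & agents)
    return hit / len(marks)


summary = {
    "analysts": coverage(reached, ANALYSTS),
    "tools": coverage(reached, TOOLS),
    "analysts + tools": coverage(reached, ANALYSTS | TOOLS),
}
print(summary)
```

Representing each activity as a set of agents makes the combined figure a simple set intersection test, so adding a third tool or analyst only changes the agent sets.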

  16. Google Data Analytics Capstone Project

    • kaggle.com
    Updated Oct 1, 2022
    Cite
    Data Rookie (2022). Google Data Analytics Capstone Project [Dataset]. https://www.kaggle.com/datasets/rookieaj1234/google-data-analytics-capstone-project
    Explore at:
    Croissant
    Dataset updated
    Oct 1, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data Rookie
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Project Name: Divvy Bikeshare Trip Data_Year2020
    Date Range: April 2020 to December 2020
    Analyst: Ajith
    Software: R Program, Microsoft Excel
    IDE: RStudio

    The following are the basic system requirements for the project:
    Processor: Intel i3 or AMD Ryzen 3 and higher
    Internal RAM: 8 GB or higher
    Operating System: Windows 7 or above, MacOS

    Data Usage License: https://ride.divvybikes.com/data-license-agreement

    Introduction:

    In this case study, we aim to use different data analysis techniques and tools to understand the rental patterns of the Divvy bike-sharing company and to identify key business improvement suggestions. This case study is a mandatory project to be submitted to achieve the Google Data Analytics Certification. The data used in this case study was licensed under the provided data usage license. Trips from April 2020 to December 2020 are analysed.

    Scenario: The marketing team needs to design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ.

    Objective: The main objective of this case study is to understand customer usage patterns and the breakdown of customers, based on their subscription status and the average duration of rental bike usage.

    Introduction to Data: The data provided for this project adheres to the data usage license laid down by the source company. The source data was provided as CSV files, broken down by month and quarter. Each CSV file contains a total of 13 columns.

    The following are the columns, which were initially observed across the datasets.

    Ride_id, Ride_type, Start_station_name, Start_station_id, End_station_name, End_station_id, Usertype, Start_time, End_time, Start_lat, Start_lng, End_lat, End_lng
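
To make the stated objective concrete (average rental duration by subscription status), here is a minimal Python sketch using only the column names listed above. The project itself used R; the sample rows, the `docked_bike` value, and the timestamp layout in `TIME_FORMAT` are assumptions for illustration, not real Divvy data.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows that follow the 13 listed columns; not real Divvy data.
SAMPLE_CSV = """Ride_id,Ride_type,Start_station_name,Start_station_id,End_station_name,End_station_id,Usertype,Start_time,End_time,Start_lat,Start_lng,End_lat,End_lng
A1,docked_bike,Station A,1,Station B,2,member,2020-04-01 08:00:00,2020-04-01 08:10:00,41.88,-87.62,41.89,-87.63
A2,docked_bike,Station B,2,Station C,3,casual,2020-04-01 09:00:00,2020-04-01 09:30:00,41.89,-87.63,41.90,-87.64
A3,docked_bike,Station C,3,Station A,1,member,2020-04-02 07:00:00,2020-04-02 07:20:00,41.90,-87.64,41.88,-87.62
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def average_duration_minutes(csv_text):
    """Return the mean ride duration in minutes for each Usertype value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.strptime(row["Start_time"], TIME_FORMAT)
        end = datetime.strptime(row["End_time"], TIME_FORMAT)
        minutes = (end - start).total_seconds() / 60
        totals[row["Usertype"]] += minutes
        counts[row["Usertype"]] += 1
    return {user: totals[user] / counts[user] for user in totals}


averages = average_duration_minutes(SAMPLE_CSV)
print(averages)
```

Reading row by row keeps memory use flat, which matters once the monthly files are concatenated; rows with missing or malformed timestamps would need to be filtered out before this step.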

    Documentation, Cleaning and Preparing Data for Analysis: The total size of the datasets for the year 2020 is approximately 450 MB, which makes uploading them to a SQL database and visualizing them with BI tools a tiring job. I wanted to improve my skills in the R environment, and this was a good opportunity to use R for the data analysis.

    For installation procedures for R and RStudio, and for additional information, please refer to the following URLs.

    R Projects Document: https://www.r-project.org/other-docs.html
    RStudio Download: https://www.rstudio.com/products/rstudio/
    Installation Guide: https://www.youtube.com/watch?v=TFGYlKvQEQ4

  17. BASE_PACE_MODU_cases_description.xlsx

    • resodate.org
    Updated Jan 1, 2021
    Cite
    Camille Vandervaeren (2021). BASE_PACE_MODU_cases_description.xlsx [Dataset]. http://doi.org/10.6084/M9.FIGSHARE.14740734.V1
    Explore at:
    Dataset updated
    Jan 1, 2021
    Dataset provided by
    figshare
    Authors
    Camille Vandervaeren
    Description

    Description of building parts in three case study buildings to study the effects interdependent parts have on life cycle material flows and associated environmental impact of the building. This data is related to fictitious buildings (pavilions). This Excel file contains three datasets for three fictitious cases used in the analysis of interdependencies:
    - BASE: the base case
    - PACE: a pace-layered variation of the BASE case
    - MODU: a modular variation of the BASE case

    The three cases can be visualized in the associated paper (DOI will be provided in References when known).

  18. Safety of Perception System for Automated Driving: A Case Study on Apollo

    • zenodo.org
    Updated Jul 17, 2024
    Cite
    Anonymous; Anonymous (2024). Safety of Perception System for Automated Driving: A Case Study on Apollo [Dataset]. http://doi.org/10.5281/zenodo.6367705
    Explore at:
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This replication package consists of the detailed results of the safety assessment process of Apollo 7.0’s perception system for its use on a 3.4-kilometer segment of the Dutch highway, A270.

    ‘Operational design domain description’ consists of a detailed description of the operational area (operational design domain).

    ‘Hazard Analysis and Risk Assessment’ is a Microsoft Excel workbook of 6 sheets comprising all intermediate results from the first two steps of safety requirement elicitation (hazard analysis, risk assessment), along with the final result (safety goals and their risk levels).

    ‘Safety Requirements’ is a Microsoft Excel workbook of 2 sheets comprising the final result of the safety requirement elicitation process, i.e., safety requirements (1) for traditional software (2) specific to ML-based systems.

    ‘Design assessment’ is a Microsoft Excel workbook of 2 sheets comprising the design assessment results. Specifically, the sheets consist of (1) the safety requirements and applicable design choices for each requirement; (2) where we assessed each requirement; (3) the final verdict for the assessment of each requirement; (4) the reason for the verdict and the design decisions found in the artifacts related to Apollo’s perception system.

  19. Data Excel sheet for Deciphering the significance of neutrophil to...

    • datasetcatalog.nlm.nih.gov
    Updated Jun 11, 2024
    Cite
    Rai, Sharada; Nayak, Rakshatha (2024). Data Excel sheet for Deciphering the significance of neutrophil to lymphocyte and monocyte to lymphocyte ratios in tuberculosis: A case-control study from southern India. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001397190
    Explore at:
    Dataset updated
    Jun 11, 2024
    Authors
    Rai, Sharada; Nayak, Rakshatha
    Description

    This is the data collection sheet for the study "Deciphering the significance of neutrophil to lymphocyte and monocyte to lymphocyte ratios in tuberculosis: A case-control study from southern India".

  20. Repeated Measures data files

    • auckland.figshare.com
    zip
    Updated Nov 9, 2020
    Cite
    Gavin T. L. Brown (2020). Repeated Measures data files [Dataset]. http://doi.org/10.17608/k6.auckland.13211120.v1
    Explore at:
    zip
    Dataset updated
    Nov 9, 2020
    Dataset provided by
    The University of Auckland
    Authors
    Gavin T. L. Brown
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This zip file contains data files for 3 activities described in the accompanying PPT slides:

    1. An Excel spreadsheet for analysing gain scores in a 2 group, 2 times data array. This activity requires access to https://campbellcollaboration.org/research-resources/effect-size-calculator.html to calculate effect size.
    2. An AMOS path model and SPSS data set for an autoregressive, bivariate path model with cross-lagging. This activity is related to the following article: Brown, G. T. L., & Marshall, J. C. (2012). The impact of training students how to write introductions for academic essays: An exploratory, longitudinal study. Assessment & Evaluation in Higher Education, 37(6), 653-670. doi:10.1080/02602938.2011.563277
    3. An AMOS latent curve model and SPSS data set for a 3-time latent factor model with an interaction mixed model that uses GPA as a predictor of the LCM start and slope or change factors. This activity makes use of data reported previously and a published data analysis case: Peterson, E. R., Brown, G. T. L., & Jun, M. C. (2015). Achievement emotions in higher education: A diary study exploring emotions across an assessment event. Contemporary Educational Psychology, 42, 82-96. doi:10.1016/j.cedpsych.2015.05.002 and Brown, G. T. L., & Peterson, E. R. (2018). Evaluating repeated diary study responses: Latent curve modeling. In SAGE Research Methods Cases Part 2. Retrieved from http://methods.sagepub.com/case/evaluating-repeated-diary-study-responses-latent-curve-modeling doi:10.4135/9781526431592
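
For readers who want to see what a gain-score comparison in a 2 group, 2 times design can look like, here is a minimal Python sketch. The scores are hypothetical and are not from the dataset, and Cohen's d with a pooled standard deviation is one common effect-size choice assumed here; the dataset's own activity points to the Campbell Collaboration calculator rather than to any particular formula.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical pre/post scores for two groups; not taken from the dataset.
treatment_pre = [10, 12, 11, 13]
treatment_post = [14, 15, 15, 18]
control_pre = [10, 11, 12, 13]
control_post = [11, 12, 12, 15]


def gains(pre, post):
    """Per-participant gain score: time-2 score minus time-1 score."""
    return [after - before for before, after in zip(pre, post)]


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


treatment_gain = gains(treatment_pre, treatment_post)
control_gain = gains(control_pre, control_post)
d = cohens_d(treatment_gain, control_gain)
print(treatment_gain, control_gain, round(d, 3))
```

The sketch does not guard against a pooled variance of zero, which would occur if every participant in both groups had the same gain.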

