100+ datasets found
  1. Google Data Analytics Capstone Project

    • kaggle.com
    zip
    Updated Nov 13, 2021
    + more versions
    Cite
    NANCY CHAUHAN (2021). Google Data Analytics Capstone Project [Dataset]. https://www.kaggle.com/datasets/nancychauhan199/google-case-study-pdf
    Explore at:
zip (284279 bytes). Available download formats
    Dataset updated
    Nov 13, 2021
    Authors
    NANCY CHAUHAN
    Description

Case Study: How Does a Bike-Share Navigate Speedy Success?

    Introduction

    Welcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables — including guiding questions and key tasks — will help you stay on the right path. By the end of this lesson, you will have a portfolio-ready case study. Download the packet and reference the details of this case study anytime. Then, when you begin your job hunt, your case study will be a tangible way to demonstrate your knowledge and skills to potential employers.

    Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

    Characters and teams

    ● Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
    ● Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
    ● Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.
    ● Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

    About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

    Three questions will guide the future marketing program:

1. How do annual members and casual riders use Cyclistic bikes differently?
    2. Why would casual riders buy Cyclistic annual memberships?
    3. How can Cyclistic use digital media to influence casual riders to become members?

    Moreno has assigned you the first question to answer: How do annual members and casual rid...

  2. cases study1 example for google data analytics

    • kaggle.com
    zip
    Updated Apr 22, 2023
    Cite
    mohammed hatem (2023). cases study1 example for google data analytics [Dataset]. https://www.kaggle.com/datasets/mohammedhatem/cases-study1-example-for-google-data-analytics
    Explore at:
zip (25278847 bytes). Available download formats
    Dataset updated
    Apr 22, 2023
    Authors
    mohammed hatem
    License

http://opendatacommons.org/licenses/dbcl/1.0/

    Description

On my journey to earn the Google Data Analytics certificate, I will practice on a real-world example by following the steps of the data analysis process: ask, prepare, process, analyze, share, and act, picking the Bellabeat example.

  3. Google Certificate Case study 2

    • kaggle.com
    zip
    Updated Jun 13, 2023
    Cite
    Philie Gomez (2023). Google Certificate Case study 2 [Dataset]. https://www.kaggle.com/datasets/philiegomez/google-certificate-case-study-2
    Explore at:
zip (615485 bytes). Available download formats
    Dataset updated
    Jun 13, 2023
    Authors
    Philie Gomez
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

This was an exciting case study for the Google Data Analytics Certification 2023. I chose Case Study 2, in which the goal was to act as a business analyst for a small health-tracker company and use data from Fitbit users, compared against one of Bellabeat's products, to inform a decision about growth. I included Apple Watch users because the Fitbit sample appeared limited at 33 participants; with the Apple Watch users, the sample size went up to 59 participants (see the pooling sketch below).
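    A minimal sketch of that pooling step in Python with pandas. The file and column names here are hypothetical stand-ins, not the actual Kaggle file names:

        import pandas as pd

        # Hypothetical export names; substitute the real CSVs from the two datasets
        fitbit = pd.read_csv("fitbit_daily_activity.csv")
        apple = pd.read_csv("apple_watch_daily_activity.csv")

        # Keep a shared column subset, tag each row's source, and pool participants
        cols = ["id", "date", "steps", "calories"]
        pooled = pd.concat(
            [fitbit[cols].assign(source="fitbit"),
             apple[cols].assign(source="apple_watch")],
            ignore_index=True,
        )
        print(pooled["id"].nunique(), "participants after pooling")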

I have included my notes from the data cleaning process and a PowerPoint presentation on my findings and recommendations.

Datasets were not my own and belong to:
    • ‘FitBit Fitness Tracker Data’ by Mobius, 2022, https://www.kaggle.com/datasets/arashnic/fitbit. License: CC0: Public Domain. Source: https://zenodo.org/record/53894#.X9oeh3Uzaao
    • ‘Apple Watch and Fitbit data’ by Alejandro Espinosa, 2022, https://www.kaggle.com/datasets/aleespinosa/apple-watch-and-fitbit-data. License: CC0: Public Domain. Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZS2Z2J

  4. Salaries case study

    • kaggle.com
    zip
    Updated Oct 2, 2024
    Cite
    Shobhit Chauhan (2024). Salaries case study [Dataset]. https://www.kaggle.com/datasets/satyam0123/salaries-case-study
    Explore at:
zip (13105509 bytes). Available download formats
    Dataset updated
    Oct 2, 2024
    Authors
    Shobhit Chauhan
    License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    To analyze the salaries of company employees using Pandas, NumPy, and other tools, you can structure the analysis process into several steps:

    Case Study: Employee Salary Analysis In this case study, we aim to analyze the salaries of employees across different departments and levels within a company. Our goal is to uncover key patterns, identify outliers, and provide insights that can support decisions related to compensation and workforce management.

Step 1: Data Collection and Preparation
    • Data Sources: The dataset typically includes employee ID, name, department, position, years of experience, salary, and additional compensation (bonuses, stock options, etc.).
    • Data Cleaning: We use Pandas to handle missing or incomplete data, remove duplicates, and standardize formats. Example: df.dropna() to handle missing salary information, and df.drop_duplicates() to eliminate duplicate entries.

    Step 2: Data Exploration and Descriptive Statistics
    • Exploratory Data Analysis (EDA): Using Pandas to calculate basic statistics such as mean, median, mode, and standard deviation for employee salaries. Example: df['salary'].describe() provides an overview of the distribution of salaries.
    • Data Visualization: Leveraging tools like Matplotlib or Seaborn for visualizing salary distributions, box plots to detect outliers, and bar charts for department-wise salary breakdowns. Example: sns.boxplot(x='department', y='salary', data=df) provides a visual representation of salary variations by department.

    Step 3: Analysis Using NumPy
    • Calculating Salary Ranges: NumPy can be used to calculate the range, variance, and percentiles of salary data to identify the spread and skewness of the salary distribution. Example: np.percentile(df['salary'], [25, 50, 75]) helps identify salary quartiles.
    • Correlation Analysis: Identify the relationship between variables such as experience and salary using NumPy to compute correlation coefficients. Example: np.corrcoef(df['years_of_experience'], df['salary']) reveals whether experience is a significant factor in salary determination.

    Step 4: Grouping and Aggregation
    • Salary by Department and Position: Using Pandas' groupby function, we can summarize salary information for different departments and job titles to identify trends or inequalities. Example: df.groupby('department')['salary'].mean() calculates the average salary per department.

    Step 5: Salary Forecasting (Optional)
    • Predictive Analysis: Using tools such as Scikit-learn, we could build a regression model to predict future salary increases based on factors like experience, education level, and performance ratings.

    Step 6: Insights and Recommendations
    • Outlier Identification: Detect any employees earning significantly more or less than the average, which could signal inequities or high performers.
    • Salary Discrepancies: Highlight any salary discrepancies between departments or genders that may require further investigation.
    • Compensation Planning: Based on the analysis, suggest potential changes to the salary structure or bonus allocations to ensure fair compensation across the organization.

    Tools Used:
    • Pandas: For data manipulation, grouping, and descriptive analysis.
    • NumPy: For numerical operations such as percentiles and correlations.
    • Matplotlib/Seaborn: For data visualization to highlight key patterns and trends.
    • Scikit-learn (Optional): For building predictive models if salary forecasting is included in the analysis.

    This approach ensures a comprehensive analysis of employee salaries, providing actionable insights for human resource planning and compensation strategy. The example calls above are collected into a runnable sketch below.
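    The example calls from the steps above, pulled into one short runnable script. The toy frame and column names follow the description and are illustrative only, not the actual dataset:

        import numpy as np
        import pandas as pd

        # Toy frame with the columns named in the steps above
        df = pd.DataFrame({
            "department": ["Sales", "Sales", "IT", "IT", "HR"],
            "years_of_experience": [2, 7, 3, 10, 5],
            "salary": [50000, 82000, 60000, 110000, 55000],
        })

        df = df.dropna(subset=["salary"]).drop_duplicates()     # Step 1: cleaning
        print(df["salary"].describe())                          # Step 2: descriptive stats
        print(np.percentile(df["salary"], [25, 50, 75]))        # Step 3: quartiles
        print(np.corrcoef(df["years_of_experience"],            # experience vs salary
                          df["salary"])[0, 1])
        print(df.groupby("department")["salary"].mean())        # Step 4: per-department means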

  5. Data from: Analyzing small data sets using Bayesian estimation: the case of...

    • tandf.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Rens van de Schoot; Joris J. Broere; Koen H. Perryck; Mariëlle Zondervan-Zwijnenburg; Nancy E. van Loey (2023). Analyzing small data sets using Bayesian estimation: the case of posttraumatic stress symptoms following mechanical ventilation in burn survivors [Dataset]. http://doi.org/10.6084/m9.figshare.21829502.v1
    Explore at:
pdf. Available download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Rens van de Schoot; Joris J. Broere; Koen H. Perryck; Mariëlle Zondervan-Zwijnenburg; Nancy E. van Loey
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The analysis of small data sets in longitudinal studies can lead to power issues and often suffers from biased parameter values. These issues can be solved by using Bayesian estimation in conjunction with informative prior distributions. By means of a simulation study and an empirical example concerning posttraumatic stress symptoms (PTSS) following mechanical ventilation in burn survivors, we demonstrate the advantages and potential pitfalls of using Bayesian estimation. First, we show how to specify prior distributions, and by means of a sensitivity analysis we demonstrate how to check the exact influence of the prior (mis-)specification. Thereafter, we show by means of a simulation the situations in which the Bayesian approach outperforms the default maximum likelihood approach. Finally, we re-analyze empirical data on burn survivors which provided preliminary evidence of an aversive influence of a period of mechanical ventilation on the course of PTSS following burns. Not surprisingly, maximum likelihood estimation showed insufficient coverage as well as power with very small samples. Only when Bayesian analysis was used in conjunction with informative priors did power increase to acceptable levels. As expected, we showed that the smaller the sample size, the more the results rely on the prior specification. We show that two issues often encountered during analysis of small samples, power and biased parameters, can be solved by including prior information in Bayesian analysis. We argue that the use of informative priors should always be reported together with a sensitivity analysis.
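    As an illustration of the core idea only (not the authors' longitudinal PTSS model), a toy normal-mean example in Python shows how the posterior mean shifts with the prior specification, a crude form of the sensitivity analysis the abstract recommends, while the maximum likelihood estimate ignores the prior entirely:

        import numpy as np

        rng = np.random.default_rng(1)
        y = rng.normal(loc=20.0, scale=5.0, size=10)  # small sample, n = 10
        sigma2 = 25.0                                 # sampling variance, assumed known

        # Maximum likelihood estimate: just the sample mean
        print("ML estimate:", y.mean())

        # Conjugate normal posterior mean under prior N(mu0, tau2);
        # varying mu0 and tau2 probes sensitivity to the prior specification
        n = len(y)
        for mu0, tau2 in [(20.0, 4.0), (10.0, 4.0), (20.0, 100.0)]:
            post_mean = (mu0 / tau2 + n * y.mean() / sigma2) / (1 / tau2 + n / sigma2)
            print(f"prior N({mu0}, {tau2}) -> posterior mean {post_mean:.2f}")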

  6. MCCN Case Study 4 - Validating gridded data products

    • researchdata.edu.au
    • adelaide.figshare.com
    Updated Nov 13, 2025
    Cite
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja (2025). MCCN Case Study 4 - Validating gridded data products [Dataset]. http://doi.org/10.25909/29176553.V1
    Explore at:
    Dataset updated
    Nov 13, 2025
    Dataset provided by
    The University of Adelaide
    Authors
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The MCCN project delivers tools to assist the agricultural sector in understanding crop-environment relationships, specifically by facilitating the generation of data cubes for spatiotemporal data. This repository contains Jupyter notebooks to demonstrate the functionality of the MCCN data cube components.

    The dataset contains input files for the case study (source_data), RO-Crate metadata (ro-crate-metadata.json), results from the case study (results), and Jupyter Notebook (MCCN-CASE 4.ipynb)

    Research Activity Identifier (RAiD)

    RAiD: https://doi.org/10.26292/8679d473

    Case Studies

This repository contains code and sample data for the following case studies. Note that the analyses here are intended to demonstrate the software; the results should not be considered scientifically or statistically meaningful. No effort has been made to address bias in samples, and sample data may not be available at sufficient density to warrant analysis. All case studies end with the generation of an RO-Crate data package including the source data, the notebook, and the generated outputs, including netCDF exports of the data cubes themselves.

    Case Study 4 - Validating gridded data products

    Description

    Compare Bureau of Meteorology gridded daily maximum and minimum temperature data with data from weather stations across Western Australia.

This is an example of comparing high-quality ground-based data from multiple sites with a data product derived from satellite imagery or data modelling, so that the product's precision and accuracy for estimating the same variables at other sites can be assessed.

    Data Sources

    The case study uses national weather data products from the Bureau of Meteorology for daily mean maximum/minimum temperature, accessible from http://www.bom.gov.au/jsp/awap/temp/index.jsp. Seven daily maximum and minimum temperature grids were downloaded for the dates 7 to 13 April 2025 inclusive. These data can be accessed in the source_data folder in the downloaded ASCII grid format (*.grid). These data will be loaded into the data cube as WGS84 Geotiff files. To avoid extra dependencies in this notebook, the data have already been converted using QGIS Desktop and are also included in the source_data folder (*.tiff).

Comparison data for maximum and minimum air temperature were downloaded for all public weather stations in Western Australia from https://weather.agric.wa.gov.au/ for the 10-day period 4 to 13 April 2025. These are included in source_data as CSV files. These downloads do not include the coordinates of the weather stations; the coordinates were downloaded via the https://api.agric.wa.gov.au/v2/weather/openapi/#/Stations/getStations API method and are included in source_data as DPIRD_weather_stations.json.

    Dependencies

    • This notebook requires Python 3.10 or higher
    • Install relevant Python libraries with: pip install mccn-engine rocrate
    • Installing mccn-engine will install other dependencies

    Overview

    1. Convert weather station data to point measurements (longitude, latitude, date, temperature)
    2. Prepare STAC metadata records for each data source (separate records for each daily minimum and maximum layer from BOM, one for all weather station minima, and one for all weather station maxima)
    3. Load data cube
    4. Visualise cube
5. Calculate differences between weather station values and BOM data for each station and date (see the sketch after this list)
    6. Identify sites with extreme differences (errors) for minimum and maximum temperature
    7. Identify sites with low differences for minimum and maximum temperature
    8. Cleanup and write results to RO-Crate
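    A minimal sketch of step 5, assuming one of the converted BOM GeoTIFFs and a toy station table. The file name, coordinates, and values here are illustrative; the real inputs live in source_data:

        import pandas as pd
        import rioxarray

        # One of the converted WGS84 BOM grids from source_data (name assumed)
        grid = rioxarray.open_rasterio("source_data/bom_tmax_20250407.tiff").squeeze()

        # Toy station records; the real ones come from the DPIRD CSV/JSON files
        stations = pd.DataFrame({
            "lon": [115.86, 117.88],
            "lat": [-31.95, -35.02],
            "tmax": [28.4, 24.1],
        })

        # Sample the nearest grid cell at each station and take the difference
        stations["grid_tmax"] = [
            grid.sel(x=row.lon, y=row.lat, method="nearest").item()
            for row in stations.itertuples()
        ]
        stations["diff"] = stations["tmax"] - stations["grid_tmax"]
        print(stations)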

    Notes

• Weather stations with high differences/errors are likely to have configuration or positioning issues and should not be treated as reliable.
    • Weather stations with low errors are suitable for use in local analysis.
• The generally low difference between the measured values and the BOM products indicates the level of confidence that can be placed in these products for analyses where local measurements are not available.
    • In reality, at least some of these sites will have contributed to the BOM products, so the comparands are not truly independent.


  7. Documentary sources of case studies on the issues a data protection officer...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
• +1 more
    Updated Apr 30, 2023
    Cite
    Ciclosi, Francesco; Massacci, Fabio (2023). Documentary sources of case studies on the issues a data protection officer faces on a daily basis [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7879103
    Explore at:
    Dataset updated
    Apr 30, 2023
    Dataset provided by
    University of Trento, Vrije Universiteit Amsterdam
    University of Trento
    Authors
    Ciclosi, Francesco; Massacci, Fabio
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the text of the documents that are sources of evidence used in [1] and [2] to distill our reference scenarios according to the methodology suggested by Yin in [3].

The dataset is composed of 95 unique document texts spanning the period 2005-2022. It makes available a corpus of documentary sources useful for outlining case studies related to scenarios in which a DPO operates in the course of their daily activities.

    The language used in the corpus is mainly Italian, but some documents are in English and French. For the reader's benefit, we provide an English translation of the title of each document.

    The documentary sources are of many types (for example, court decisions, supervisory authorities' decisions, job advertisements, and newspaper articles), provided by different bodies (such as supervisor authorities, data controllers, European Union institutions, private companies, courts, public authorities, research organizations, newspapers, and public administrations), and redacted from distinct professional roles (for example, data protection officers, general managers, university rectors, collegiate bodies, judges, and journalists).

The documentary sources were collected from 31 different bodies. Most of the documents in the corpus (83 in total) have been converted to Rich Text Format (RTF), while the remaining 12 are in PDF format. All the documents have been manually read and verified. The dataset is helpful as a starting point for a case study analysis of the daily issues a data protection officer faces. Details on the methodology can be found in the accompanying papers.

    The available files are as follows:

documents-texts.zip --> Contains a directory of .rtf files (and in some cases .pdf files) with the text of the documents used as sources for the case studies. Each file has been renamed to its SHA1 hash so that it can be easily recognized.

    documents-metadata.csv --> Contains a CSV file with the metadata for each document used as a source for the case studies.
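    For reference, the SHA1 naming scheme described above can be reproduced with the Python standard library. The directory name is assumed from the zip description:

        import hashlib
        from pathlib import Path

        def sha1_name(path: Path) -> str:
            # Hash the file bytes and keep the original extension
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            return digest + path.suffix

        for f in sorted(Path("documents-texts").glob("*.*")):
            print(f.name, "->", sha1_name(f))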

    This dataset is the original one used in the publication [1] and the preprint containing the additional material [2].

    [1] F. Ciclosi and F. Massacci, "The Data Protection Officer: A Ubiquitous Role That No One Really Knows" in IEEE Security & Privacy, vol. 21, no. 01, pp. 66-77, 2023, doi: 10.1109/MSEC.2022.3222115, url: https://doi.ieeecomputersociety.org/10.1109/MSEC.2022.3222115.

    [2] F. Ciclosi and F. Massacci, "The Data Protection Officer, an ubiquitous role nobody really knows." arXiv preprint arXiv:2212.07712, 2022.

    [3] R. K. Yin, Case study research and applications. Sage, 2018.

  8. Data from: Anomalous values and missing data in clinical and experimental...

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Cite
    Hélio Amante Miot (2023). Anomalous values and missing data in clinical and experimental studies [Dataset]. http://doi.org/10.6084/m9.figshare.8227163.v1
    Explore at:
jpeg. Available download formats
    Dataset updated
    Jun 2, 2023
    Dataset provided by
SciELO (http://www.scielo.org/)
    Authors
    Hélio Amante Miot
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Abstract: During analysis of scientific research data, it is customary to encounter anomalous values or missing data. Anomalous values can be the result of errors of recording, typing, or measurement by instruments, or may be true outliers. This review discusses concepts, examples, and methods for identifying and dealing with such contingencies. In the case of missing data, techniques for imputing values are discussed in order to avoid excluding the research subject when it is not possible to retrieve information from registration forms or to contact the participant again.
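    A minimal pandas illustration of the two contingencies, using Tukey's 1.5 * IQR fences to flag anomalous values and a single median imputation for a missing entry. The review itself covers richer methods; the data here are a toy series:

        import numpy as np
        import pandas as pd

        s = pd.Series([4.1, 4.3, 3.9, 4.0, 47.0, np.nan, 4.2])  # toy measurements

        # Flag anomalous values using Tukey's 1.5 * IQR fences
        q1, q3 = s.quantile([0.25, 0.75])
        iqr = q3 - q1
        outliers = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
        print("flagged as anomalous:", s[outliers].tolist())

        # Impute the missing entry with the median to avoid excluding the subject
        print(s.fillna(s.median()))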

  9. MCCN Case Study 6 - Environmental Correlates for Productivity

    • researchdata.edu.au
    Updated Nov 13, 2025
    + more versions
    Cite
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja (2025). MCCN Case Study 6 - Environmental Correlates for Productivity [Dataset]. http://doi.org/10.25909/29176682.V1
    Explore at:
    Dataset updated
    Nov 13, 2025
    Dataset provided by
    The University of Adelaide
    Authors
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The MCCN project delivers tools to assist the agricultural sector in understanding crop-environment relationships, specifically by facilitating the generation of data cubes for spatiotemporal data. This repository contains Jupyter notebooks to demonstrate the functionality of the MCCN data cube components.

    The dataset contains input files for the case study (source_data), RO-Crate metadata (ro-crate-metadata.json), results from the case study (results), and Jupyter Notebook (MCCN-CASE 6.ipynb)

    Research Activity Identifier (RAiD)

    RAiD: https://doi.org/10.26292/8679d473

    Case Studies

This repository contains code and sample data for the following case studies. Note that the analyses here are intended to demonstrate the software; the results should not be considered scientifically or statistically meaningful. No effort has been made to address bias in samples, and sample data may not be available at sufficient density to warrant analysis. All case studies end with the generation of an RO-Crate data package including the source data, the notebook, and the generated outputs, including netCDF exports of the data cubes themselves.

    Case Study 6 - Environmental Correlates for Productivity

    Description

Analyse the relationship between different environmental drivers and plant yield. This study demonstrates: 1) loading heterogeneous data sources into a cube, and 2) analysis and visualisation of drivers. It combines a suite of spatial variables at different scales across multiple sites to analyse the factors correlated with a variable of interest.

    Data Sources

The dataset covers the Gilbert site in Queensland, which has multiple standard-sized plots for three years; we are using data from 2022. The source files are part of the larger collection: Chapman, Scott and Smith, Daniel (2023). INVITA Core site UAV dataset. The University of Queensland. Data Collection. https://doi.org/10.48610/951f13c

1. Boundary file - This is a shapefile defining the boundaries of all field plots at the Gilbert site. Each polygon represents a single plot and is associated with a unique Plot ID (e.g., 03_03_1). These plot IDs are essential for joining and aligning data across the orthomosaics and plot-level measurements.
    2. Orthomosaics - The site was imaged by UAV flights multiple times throughout the 2022 growing season, spanning from June to October. Each flight produced an orthorectified mosaic image using RGB and Multispectral (MS) sensors.
    3. Plot-level measurements - Multispectral traits calculated from MS sensor imagery, including the indices NDVI, NDRE, and SAVI, plus biomass cuts: field-measured biomass sampled during different growth stages (used as a proxy for yield). A minimal NDVI sketch follows this list.
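    Of the indices named above, NDVI has the standard definition (NIR - Red) / (NIR + Red). A minimal numpy sketch with toy reflectance values; the real values come from the MS orthomosaic bands:

        import numpy as np

        # Toy reflectance arrays; real values come from the MS orthomosaic bands
        nir = np.array([[0.52, 0.48], [0.61, 0.55]])
        red = np.array([[0.08, 0.12], [0.07, 0.10]])

        ndvi = (nir - red) / (nir + red)  # standard NDVI definition
        print(ndvi.round(3))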


  10. MISR Derived Case Study Data for Kilauea Volcanic Eruptions Including...

    • data.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). MISR Derived Case Study Data for Kilauea Volcanic Eruptions Including Geometric Plume Height and Qualitative Radiometric Particle Property Information - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/misr-derived-case-study-data-for-kilauea-volcanic-eruptions-including-geometric-plume-heig-6a7ee
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
NASA (http://nasa.gov/)
    Area covered
    Kīlauea
    Description

The KILVOLC_FlowerKahn2021_1 dataset is the MISR Derived Case Study Data for Kilauea Volcanic Eruptions Including Geometric Plume Height and Qualitative Radiometric Particle Property Information version 1 dataset. It comprises MISR-derived output from a comprehensive analysis of Kilauea volcanic eruptions (2000-2018). Data collection for this dataset is complete. The data presented here are analyzed and discussed in the following paper: Flower, V.J.B., and R.A. Kahn, 2021. Twenty years of NASA-EOS multi-sensor satellite observations at Kīlauea volcano (2000-2019). J. Volc. Geo. Res. (in press).

    The data is subdivided by date and MISR orbit number. Within each case folder, there are up to 11 files relating to an individual MISR overpass. Files include plume height records (from both the red and blue spectral bands) derived from the MISR INteractive eXplorer (MINX) program, displayed in: map view, a downwind profile plot (along with the associated wind vectors retrieved at plume elevation), a histogram of retrieved plume heights, and a text file containing the digital plume height values. An additional JPG is included delineating the plume analysis region, the start point for assessing downwind distance, and the input wind direction used to initialize the MINX retrieval. A final two files are generated from the MISR Research Aerosol (RA) retrieval algorithm (Limbacher, J.A., and R.A. Kahn, 2014. MISR Research-Aerosol-Algorithm: Refinements For Dark Water Retrievals. Atm. Meas. Tech. 7, 1-19, doi:10.5194/amt-7-1-2014). These files include the RA model output in HDF5, and an associated JPG of key derived variables (e.g. Aerosol Optical Depth, Angstrom Exponent, Single Scattering Albedo, Fraction of Non-Spherical components, model uncertainty classifications, and example camera views).

    File numbers per folder vary depending on the retrieval conditions of specific observations. RA plume retrievals are limited when cloud cover was widespread or the solar radiance was insufficient to run the RA; in these cases the RA files are not included in the individual folders. In cases where activity was observed from multiple volcanic zones in a single overpass, individual folders containing data relating to a single region are included and defined by a qualifier (e.g. '_1').

  11. Interiorization Case Study, 2019 - Brazil

    • microdata.unhcr.org
    Updated Jan 31, 2022
    + more versions
    Cite
    REACH (2022). Interiorization Case Study, 2019 - Brazil [Dataset]. https://microdata.unhcr.org/index.php/catalog/612
    Explore at:
    Dataset updated
    Jan 31, 2022
    Dataset provided by
United Nations High Commissioner for Refugees (http://www.unhcr.org/)
    REACH
    Time period covered
    2019
    Area covered
    Brazil
    Description

    Abstract

The Federal Government Interiorization strategy implemented by Operation Welcome voluntarily relocates Venezuelan refugees and migrants from the states of Roraima and Amazonas to other cities in the country. The purpose of this study was to analyse a cohort of households before and after interiorization. 366 households were interviewed in Boa Vista before departure; 148 follow-up telephone interviews took place 6-8 weeks after departure. 145 households that had relocated more than 4 months prior to the research were interviewed as a control group.

    Geographic coverage

    National

    Analysis unit

    Household

    Universe

    Households that relocated internally from one city to another.

    Kind of data

    Sample survey data [ssd]

    Mode of data collection

    Other

  12. Data from: MISR Derived Case Study Data for Kilauea Volcanic Eruptions...

    • catalog.data.gov
    • cmr.earthdata.nasa.gov
    Updated Sep 19, 2025
    + more versions
    Cite
    NASA/LARC/SD/ASDC (2025). MISR Derived Case Study Data for Kilauea Volcanic Eruptions Including Geometric Plume Height and Qualitative Radiometric Particle Property Information [Dataset]. https://catalog.data.gov/dataset/misr-derived-case-study-data-for-kilauea-volcanic-eruptions-including-geometric-plume-heig
    Explore at:
    Dataset updated
    Sep 19, 2025
    Dataset provided by
NASA (http://nasa.gov/)
    Area covered
    Kīlauea
    Description

The KILVOLC_FlowerKahn2021_1 dataset is the MISR Derived Case Study Data for Kilauea Volcanic Eruptions Including Geometric Plume Height and Qualitative Radiometric Particle Property Information version 1 dataset. It comprises MISR-derived output from a comprehensive analysis of Kilauea volcanic eruptions (2000-2018). Data collection for this dataset is complete. The data presented here are analyzed and discussed in the following paper: Flower, V.J.B., and R.A. Kahn, 2021. Twenty years of NASA-EOS multi-sensor satellite observations at Kīlauea volcano (2000-2019). J. Volc. Geo. Res. (in press).

    The data is subdivided by date and MISR orbit number. Within each case folder, there are up to 11 files relating to an individual MISR overpass. Files include plume height records (from both the red and blue spectral bands) derived from the MISR INteractive eXplorer (MINX) program, displayed in: map view, a downwind profile plot (along with the associated wind vectors retrieved at plume elevation), a histogram of retrieved plume heights, and a text file containing the digital plume height values. An additional JPG is included delineating the plume analysis region, the start point for assessing downwind distance, and the input wind direction used to initialize the MINX retrieval. A final two files are generated from the MISR Research Aerosol (RA) retrieval algorithm (Limbacher, J.A., and R.A. Kahn, 2014. MISR Research-Aerosol-Algorithm: Refinements For Dark Water Retrievals. Atm. Meas. Tech. 7, 1-19, doi:10.5194/amt-7-1-2014). These files include the RA model output in HDF5, and an associated JPG of key derived variables (e.g. Aerosol Optical Depth, Angstrom Exponent, Single Scattering Albedo, Fraction of Non-Spherical components, model uncertainty classifications, and example camera views).

    File numbers per folder vary depending on the retrieval conditions of specific observations. RA plume retrievals are limited when cloud cover was widespread or the solar radiance was insufficient to run the RA; in these cases the RA files are not included in the individual folders. In cases where activity was observed from multiple volcanic zones in a single overpass, individual folders containing data relating to a single region are included and defined by a qualifier (e.g. '_1').

  13. Data from: Institutionalization of Business Intelligence for Enhancing...

    • acquire.cqu.edu.au
    • researchdata.edu.au
    Updated Jun 23, 2023
    Cite
    Md Shaheb Ali (2023). Institutionalization of Business Intelligence for Enhancing Organizational Agility in Developing Countries: An Example of Bangladesh. Dataset [Dataset]. http://doi.org/10.25946/21587304.v2
    Explore at:
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    CQUniversity
    Authors
    Md Shaheb Ali
    License

https://rightsstatements.org/page/InC/1.0/?language=en

    Area covered
    Bangladesh
    Description

This proposed research aims to explore how the institutionalization of Business Intelligence (BI) can enhance organizational agility in developing countries. Business performance is increasingly affected by unanticipated emerging opportunities or threats from the constant, diverse changes occurring within the environment. Decision-making to cope with the changing environment becomes a challenge when taking opportunities or managing threats. Organizational agility is the ability to sense and take opportunities by responding to those changes with speed. As BI provides data-driven decision-making support, BI institutionalization is vital for enhancing the organizational agility needed to make decisions in response to a dynamic environment. However, there has been little prior research in this area focussed on developing countries. Therefore, this research addresses the research gap in how BI institutionalization in developing countries can enhance organizational agility. Bangladesh is used as an example of a developing country. A multiple case study approach was employed for collecting qualitative data using open-ended interviews. The collected data were analysed to generate new understanding of how BI institutionalization impacts organizational agility for decision-making in the context of developing countries.

  14. Data from: BikeShare Dataset

    • kaggle.com
    zip
    Updated Apr 18, 2022
    Cite
    Kenniss Dillon (2022). BikeShare Dataset [Dataset]. https://www.kaggle.com/datasets/kennissdillon/bikeshare-dataset/data
    Explore at:
zip (207350631 bytes). Available download formats
    Dataset updated
    Apr 18, 2022
    Authors
    Kenniss Dillon
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

This dataset is made available through the Google Data Analytics Coursera course. It is part of a case study example meant to showcase skills learned throughout the course.

  15. Data for: Inventory routing and dynamic redistribution of relief goods in...

    • data.mendeley.com
    Updated Mar 31, 2020
    + more versions
    Cite
    Abbas Seifi (2020). Data for: Inventory routing and dynamic redistribution of relief goods in post-disaster operations [Dataset]. http://doi.org/10.17632/83pmt3dbwt.1
    Explore at:
    Dataset updated
    Mar 31, 2020
    Authors
    Abbas Seifi
    License

Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

• Parameters of problems: this file includes all parameters related to the sample problems and case study instances.
    • Numerical Results: this file includes the results of parameter tuning for the proposed algorithm (SSA), solutions of the sample problems and case study instances using CPLEX and SSA, and a sensitivity analysis of the instances, together with the related graphs.

  16. Sample of codes and themes from interview data.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Nathan N. O'Hara; Rodney Mugarura; Gerard P. Slobogean; Maryse Bouchard (2023). Sample of codes and themes from interview data. [Dataset]. http://doi.org/10.1371/journal.pone.0110940.t002
    Explore at:
xls. Available download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
PLOS (http://plos.org/)
    Authors
    Nathan N. O'Hara; Rodney Mugarura; Gerard P. Slobogean; Maryse Bouchard
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample of codes and themes from interview data.

  17. SQL Analytics Case Study (Employees Database)

    • kaggle.com
    zip
    Updated Nov 4, 2025
    Cite
    Priyank Barbhaya (2025). SQL Analytics Case Study (Employees Database) [Dataset]. https://www.kaggle.com/datasets/priyankbarbhaya/sql-analytics-case-study-employees-database
    Explore at:
zip (7449546 bytes). Available download formats
    Dataset updated
    Nov 4, 2025
    Authors
    Priyank Barbhaya
    Description

    This dataset contains the complete MySQL Employees Database, a widely used sample dataset for learning SQL, data analysis, business intelligence, and database design. It includes employee information, salaries, job titles, departments, managers, and department history, making it ideal for real-world analytical practice.

    The dataset is structured into multiple tables that represent a real corporate environment with employee records spanning several decades. Users can practice SQL joins, window functions, aggregation, CTEs, subqueries, business KPIs, HR analytics, trend analysis, and more.
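    As one example of the kind of practice the dataset supports, the sketch below runs a window-function query that ranks current salaries within each department, shown via Python's sqlite3 and assuming the standard employees schema (employees, salaries, dept_emp, departments) has been imported into a local SQLite file named employees.db:

        import sqlite3

        con = sqlite3.connect("employees.db")  # assumes the tables were imported
        query = """
        SELECT d.dept_name,
               e.first_name || ' ' || e.last_name AS employee,
               s.salary,
               RANK() OVER (PARTITION BY d.dept_name
                            ORDER BY s.salary DESC) AS dept_rank
        FROM salaries s
        JOIN employees e   ON e.emp_no = s.emp_no
        JOIN dept_emp de   ON de.emp_no = e.emp_no
        JOIN departments d ON d.dept_no = de.dept_no
        WHERE s.to_date = '9999-01-01'   -- current salary rows
          AND de.to_date = '9999-01-01'  -- current department assignment
        """
        for row in con.execute(query).fetchmany(5):
            print(row)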

  18. MCCN Case Study 3 - Select optimal survey locality

    • researchdata.edu.au
    Updated Nov 13, 2025
    Cite
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja (2025). MCCN Case Study 3 - Select optimal survey locality [Dataset]. http://doi.org/10.25909/29176451.V1
    Explore at:
    Dataset updated
    Nov 13, 2025
    Dataset provided by
    The University of Adelaide
    Authors
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The MCCN project delivers tools to assist the agricultural sector in understanding crop-environment relationships, specifically by facilitating the generation of data cubes for spatiotemporal data. This repository contains Jupyter notebooks to demonstrate the functionality of the MCCN data cube components.

    The dataset contains input files for the case study (source_data), RO-Crate metadata (ro-crate-metadata.json), results from the case study (results), and Jupyter Notebook (MCCN-CASE 3.ipynb)

    Research Activity Identifier (RAiD)

    RAiD: https://doi.org/10.26292/8679d473

    Case Studies

This repository contains code and sample data for the following case studies. Note that the analyses here are intended to demonstrate the software; the results should not be considered scientifically or statistically meaningful. No effort has been made to address bias in samples, and sample data may not be available at sufficient density to warrant analysis. All case studies end with the generation of an RO-Crate data package including the source data, the notebook, and the generated outputs, including netCDF exports of the data cubes themselves.

    Case Study 3 - Select optimal survey locality

    Given a set of existing survey locations across a variable landscape, determine the optimal site to add to increase the range of surveyed environments. This study demonstrates: 1) Loading heterogeneous data sources into a cube, and 2) Analysis and visualisation using numpy and matplotlib.

    Data Sources

    The primary goal for this case study is to demonstrate being able to import a set of environmental values for different sites and then use these to identify a subset that maximises spread across the various environmental dimensions.

    This is a simple implementation that uses four environmental attributes imported for all Australia (or a subset like NSW) at a moderate grid scale:

    1. Digital soil maps for key soil properties over New South Wales, version 2.0 - SEED - see https://esoil.io/TERNLandscapes/Public/Pages/SLGA/ProductDetails-SoilAttributes.html
    2. ANUCLIM Annual Mean Rainfall raster layer - SEED - see https://datasets.seed.nsw.gov.au/dataset/anuclim-annual-mean-rainfall-raster-layer
    3. ANUCLIM Annual Mean Temperature raster layer - SEED - see https://datasets.seed.nsw.gov.au/dataset/anuclim-annual-mean-temperature-raster-layer

    Dependencies

    • This notebook requires Python 3.10 or higher
    • Install relevant Python libraries with: pip install mccn-engine rocrate
    • Installing mccn-engine will install other dependencies

    Overview

1. Generate STAC metadata for layers from a predefined configuration
    2. Load data cube and exclude nodata values
    3. Scale all variables to a 0.0-1.0 range
    4. Select four layers for comparison (soil organic carbon 0-30 cm, soil pH 0-30 cm, mean annual rainfall, mean annual temperature)
    5. Select 10 random points within NSW
    6. Generate 10 new layers representing standardised environmental distance between one of the selected points and all other points in NSW
    7. For every point in NSW, find the lowest environmental distance to any of the selected points
8. Select the point in NSW that has the highest value for the lowest environmental distance to any selected point - this is the most different point (steps 6-8 are sketched after this list)
    9. Clean up and save results to RO-Crate
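    A compact numpy sketch of steps 6-8 on a toy cube: four 0-1 scaled layers on a 50 x 50 grid stand in for the real data cube layers, and grid indices stand in for the survey sites:

        import numpy as np

        rng = np.random.default_rng(0)
        layers = rng.random((4, 50, 50))       # four env variables, already scaled 0-1
        sites = [(10, 12), (40, 8), (25, 30)]  # row/col indices of existing sites

        # Steps 6-7: distance from every cell to each site in environment space,
        # then the lowest distance to any surveyed site
        dists = np.stack([
            np.sqrt(((layers - layers[:, r:r+1, c:c+1]) ** 2).sum(axis=0))
            for r, c in sites
        ])
        nearest = dists.min(axis=0)

        # Step 8: the cell least like any surveyed site (highest minimum distance)
        print(np.unravel_index(nearest.argmax(), nearest.shape))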


  19. MCCN Case Study 2 - Spatial projection via modelled data

    • researchdata.edu.au
    • adelaide.figshare.com
    Updated Nov 13, 2025
    Cite
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja (2025). MCCN Case Study 2 - Spatial projection via modelled data [Dataset]. http://doi.org/10.25909/29176364.V1
    Explore at:
    Dataset updated
    Nov 13, 2025
    Dataset provided by
    The University of Adelaide
    Authors
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The MCCN project delivers tools to assist the agricultural sector in understanding crop-environment relationships, specifically by facilitating the generation of data cubes for spatiotemporal data. This repository contains Jupyter notebooks to demonstrate the functionality of the MCCN data cube components.

    The dataset contains input files for the case study (source_data), RO-Crate metadata (ro-crate-metadata.json), results from the case study (results), and Jupyter Notebook (MCCN-CASE 2.ipynb)

    Research Activity Identifier (RAiD)

    RAiD: https://doi.org/10.26292/8679d473

    Case Studies

This repository contains code and sample data for the following case studies. Note that the analyses here are intended to demonstrate the software; the results should not be considered scientifically or statistically meaningful. No effort has been made to address bias in samples, and sample data may not be available at sufficient density to warrant analysis. All case studies end with the generation of an RO-Crate data package including the source data, the notebook, and the generated outputs, including netCDF exports of the data cubes themselves.

    Case Study 2 - Spatial projection via modelled data

    Description

    Estimate soil pH and electrical conductivity at 45 cm depth across a farm based on values collected from soil samples. This study demonstrates: 1) Description of spatial assets using STAC, 2) Loading heterogeneous data sources into a cube, 3) Spatial projection in xarray using different algorithms offered by the pykrige and rioxarray packages.

    Data sources

    • BradGinns_SOIL2004_SoilData.csv - Soil measurements from the University of Sydney Llara Campey farm site from 2004, corresponding to sites L1, L3 and L4 describing mid-depth, soil apparent electrical conductivity (ECa), GammaK, Clay, Silt, Sand, pH and soil electrical conductivity (EC)
    • Llara_Campey_field_boundaries_poly.shp - Field boundary shapes for the University of Sydney Llara Campey farm site

    Dependencies

    • This notebook requires Python 3.10 or higher
    • Install relevant Python libraries with: pip install mccn-engine rocrate rioxarray pykrige
    • Installing mccn-engine will install other dependencies

    Overview

    1. Select soil sample measurements for pH or EC at 45 cm depth
    2. Split sample measurements into 80% subset to model interpolated layers and 20% to test interpolated layers
    3. Generate STAC metadata for layers
    4. Load data cube
5. Interpolate pH and EC across the site using the 80% subset and three different 2D interpolation methods from rioxarray (nearest, linear and cubic) and one from pykrige (linear); a simplified sketch follows this list
    6. Calculate the error between each layer of interpolated values and measured values for the 20% setaside for testing
    7. Compare the mean and standard deviation of the errors for each interpolation method
    8. Clean up and package results as RO-Crate
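    A simplified sketch of steps 2 and 5-7, using scipy's griddata in place of the rioxarray/pykrige interpolators the notebook uses; the coordinates and pH values are toy data:

        import numpy as np
        from scipy.interpolate import griddata

        rng = np.random.default_rng(0)
        pts = rng.random((100, 2))                      # toy sample coordinates
        ph = 5.5 + pts[:, 0] + rng.normal(0, 0.2, 100)  # toy pH at 45 cm depth

        # Step 2: 80/20 split into modelling and test subsets
        train, test = pts[:80], pts[80:]
        ph_train, ph_test = ph[:80], ph[80:]

        # Steps 5-7: interpolate with each method and compare test errors
        for method in ("nearest", "linear", "cubic"):
            pred = griddata(train, ph_train, test, method=method)
            err = np.abs(pred - ph_test)  # cubic can yield NaN outside the hull
            print(f"{method}: mean abs err {np.nanmean(err):.3f}, sd {np.nanstd(err):.3f}")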

    Notes

    • The granularity of variability in soil data significantly compromises all methods
    • Depending on the 80/20 split, different methods may appear more reliable, but the pykrige linear method is most often best


  20. Data from: Explaining the competitive advantage generated from Analytics...

    • resodate.org
    Updated Nov 11, 2020
    Cite
    Tino T. Herden (2020). Explaining the competitive advantage generated from Analytics with the knowledge-based view: the example of Logistics and Supply Chain Management [Dataset]. http://doi.org/10.14279/depositonce-10793
    Explore at:
    Dataset updated
    Nov 11, 2020
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Tino T. Herden
    Description

The purpose of this paper is to provide a theory-based explanation for the generation of competitive advantage from Analytics and to examine this explanation with evidence from confirmatory case studies. A theoretical argumentation for achieving sustainable competitive advantage from knowledge, unfolding in the knowledge-based view, forms the foundation for this explanation. Literature about the process of Analytics initiatives, surrounding factors and conditions, and benefits from Analytics is mapped onto the knowledge-based view to derive propositions. Eight confirmatory case studies of organizations mature in Analytics were collected, focused on Logistics and Supply Chain Management. A theoretical framework explaining the creation of competitive advantage from Analytics is derived and presented with an extensive description and rationale. This highlights various aspects outside of the analytical methods that contribute to impactful and successful Analytics initiatives. The relevance of a problem focus and of iterative problem solving, especially with the incorporation of user feedback, is justified and compared to other approaches. Regarding expertise, the advantage of cross-functional teams over data-scientist-centric initiatives is discussed, as well as modes of and reasons for incorporating external expertise. Regarding the deployment of Analytics solutions, the importance of consumability, of users assuming responsibility for incorporating solutions into their processes, and of an innovation-promoting culture (as opposed to a data-driven culture) is described and rationalized. Further, this study presents a practical manifestation of the knowledge-based view.
