5 datasets found
  1. Bellabeat - Case Study (Google Career Certificate)

    • kaggle.com
    Updated Feb 21, 2024
    Alexandra Loop (2024). Bellabeat - Case Study (Google Career Certificate) [Dataset]. https://www.kaggle.com/datasets/alexandraloop/bellabeat-case-study-google-career-certificate/code
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alexandra Loop
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Analyst: Alexandra Loop Date: 12/02/2024

    Business Task:

    Questions to be answered:

    - What are trends in non-Bellabeat smart device usage?
    - What do these trends suggest for Bellabeat customers?
    - How could these trends help influence Bellabeat marketing strategy?

    Description of Data Sources:

    Data Set to be studied: FitBit Fitness Tracker Data: Pattern Recognition with tracker data: Improve Your Overall Health

    Data privacy: Data was sourced from a public dataset available on Kaggle. Information has been anonymized prior to being posted online.
    

    Bias: Due to the degree of anonymity in this study, the only demographic data available in this study is weight, and other cultural differences or lifestyle requirements cannot be accounted for. The sample size is quite small. The time period of the study is only a month so the observer effect could conceivably still be influencing the sample groups. We also have no information on the weather in the region studied. April and May are very variable months in terms of accessible outdoor activities.

    Process:

    Cleaning Process: After going through the data to find duplicates, whitespace, and nulls, I have determined that this set of data has been well-cleaned and already aggregated into several reasonably sized spreadsheets.

    Trim: No issues found

    Consistent length ID: No issues found

    Irrelevant columns: In WLI_M, the fat column is not consistently filled in, so it is not productive to use it in analysis. Sedentary_active_distance was mostly filled with nulls and could confuse the data. I have removed both columns.

    Irrelevant rows: 77 rows in daily_Activity_merged had 0s across the board. As there is little chance that someone would take zero steps, I decided to interpret these days as ones where people did not put on the Fitbit. As such they are irrelevant rows. Removed 77 rows. 85 rows in daily_intensities_merged registered 0 minutes of sedentary activity, which I do not believe to be possible. Row 241 logged 2 minutes of sedentary activity; I have determined it to be unusable. Row 322 likewise does not add up to a day's minutes and has been deleted. Removed 85 rows. 7 rows had 1440 sedentary minutes, which I have determined to be time on but not used. The implications of their presence have been noted.

    Scientifically debunked information: BMI as a measurement has been determined to be problematic on many fronts: it misrepresents non-white people who have different healthy body types, does not account for muscle mass or scoliosis, has been known to change definitions in accordance with business interests rather than health data, and was never meant to be used as a measure of individual health. I have removed the BMI column from the Weight Log Info chart.

    Cleaning Process 1: I have elected to see what can be found in the data as it was organized by the providers first.
    Cleaning Process 2: I identified and removed the rows where the participants did not put on the Fitbit, and noted the implications of their presence. Found the average, minimum, and maximum values of steps, distance, types of active minutes, and calories. Found the sum of all kinds of minutes documented to check for inconsistencies. Found the difference between total minutes and a full 1440 minutes. I tried to make a pie chart to convey the average minutes of activity, and so created a duplicate dataset to trim down and remove misleading data caused by different inputs.
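
    The row-level checks described above can be sketched in a few lines of Python. This is only an illustration, not the analyst's actual workflow: the column names (TotalSteps, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes) and the sample rows are assumptions and should be checked against the real CSV headers.

```python
# Column names are assumptions for illustration; verify them against the CSV.
MINUTE_COLUMNS = (
    "VeryActiveMinutes",
    "FairlyActiveMinutes",
    "LightlyActiveMinutes",
    "SedentaryMinutes",
)
MINUTES_PER_DAY = 1440


def clean_daily_rows(rows):
    """Drop zero-step days and annotate the remaining rows with minute totals."""
    cleaned = []
    for row in rows:
        # Zero steps is interpreted as "device not worn", so the row is dropped.
        if row["TotalSteps"] == 0:
            continue
        total = sum(row[col] for col in MINUTE_COLUMNS)
        annotated = dict(row)
        annotated["TotalMinutes"] = total
        # Minutes of the day that the tracker did not account for.
        annotated["MissingMinutes"] = MINUTES_PER_DAY - total
        # A fully sedentary day is kept but flagged as "on but not used".
        annotated["IdleAllDay"] = row["SedentaryMinutes"] == MINUTES_PER_DAY
        cleaned.append(annotated)
    return cleaned


# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"TotalSteps": 0, "VeryActiveMinutes": 0, "FairlyActiveMinutes": 0,
     "LightlyActiveMinutes": 0, "SedentaryMinutes": 0},
    {"TotalSteps": 8000, "VeryActiveMinutes": 30, "FairlyActiveMinutes": 20,
     "LightlyActiveMinutes": 200, "SedentaryMinutes": 1190},
    {"TotalSteps": 4000, "VeryActiveMinutes": 0, "FairlyActiveMinutes": 10,
     "LightlyActiveMinutes": 150, "SedentaryMinutes": 700},
]
result = clean_daily_rows(sample)
```

    Rows whose MissingMinutes is greater than zero are days on which the device was worn for only part of the day.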

    Analysis:

    Observations:

    - On average, the participants do not seem interested in moderate physical activity, as it was the category with the fewest active minutes. Perhaps advertise the effectiveness of low-impact workouts.
    - Very few participants volunteered their weights, and none of those who did lost weight. The person with the highest weight volunteered it only once, near the beginning. Given evidence from the Health At Every Size movement, we cannot deny the possibility that having to be weight conscious could have had negative effects on this individual. I would suggest that weight would be a counterproductive focus for our marketing campaign, as it would make heavier people less likely to want to participate, and any claims of weight loss would be statistically unfounded and open us up to false advertising lawsuits.
    - Fully half of the participants had days when they did not put on their Fitbit at all, for a total of 77-84 lost days of data. On average, participants who did not wear their Fitbit daily lost 5 days of data, though of course some lost significantly more. I would suggest focusing on creating a biometric tracker that is comfortable and rarely needs to be charged so that people will gain more reliable resources from it.
    - 400 full days of data are recorded, meaning that the participants did not take the device off to sleep, shower, or swim. 280 more have 16...

  2. Travel time to cities and ports in the year 2015

    • figshare.com
    tiff
    Updated May 30, 2023
    Andy Nelson (2023). Travel time to cities and ports in the year 2015 [Dataset]. http://doi.org/10.6084/m9.figshare.7638134.v4
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Andy Nelson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset and the validation are fully described in a Nature Scientific Data Descriptor https://www.nature.com/articles/s41597-019-0265-5

    If you want to use this dataset in an interactive environment, then use this link https://mybinder.org/v2/gh/GeographerAtLarge/TravelTime/HEAD

    The following text is a summary of the information in the above Data Descriptor.

    The dataset is a suite of global travel-time accessibility indicators for the year 2015, at approximately one-kilometre spatial resolution for the entire globe. The indicators show an estimated (and validated), land-based travel time to the nearest city and nearest port for a range of city and port sizes.

    The datasets are in GeoTIFF format and are suitable for use in Geographic Information Systems and statistical packages for mapping access to cities and ports and for spatial and statistical analysis of the inequalities in access by different segments of the population.

    These maps represent a unique global representation of physical access to essential services offered by cities and ports.

    The datasets

    travel_time_to_cities_x.tif (where x has values from 1 to 12)

    The value of each pixel is the estimated travel time in minutes to the nearest urban area in 2015. There are 12 data layers based on different sets of urban areas, defined by their population in year 2015 (see PDF report).

    travel_time_to_ports_x (x ranges from 1 to 5)

    The value of each pixel is the estimated travel time to the nearest port in 2015. There are 5 data layers based on different port sizes.

    Format: Raster Dataset, GeoTIFF, LZW compressed
    Unit: Minutes
    Data type: Byte (16 bit Unsigned Integer)
    No data value: 65535
    Flags: None
    Spatial resolution: 30 arc seconds
    Spatial extent:
      Upper left: -180, 85
      Lower left: -180, -60
      Upper right: 180, 85
      Lower right: 180, -60
    Spatial Reference System (SRS): EPSG:4326 - WGS84 - Geographic Coordinate System (lat/long)
    Temporal resolution: 2015
    Temporal extent: Updates may follow for future years, but these are dependent on the availability of updated inputs on travel times and city locations and populations.
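
    As a rough illustration of what the resolution and extent above imply, the following Python sketch derives the grid dimensions and maps a longitude/latitude pair to a pixel index. It assumes the raster is aligned exactly to the stated corners; the function name is hypothetical, and in practice the geotransform stored in each GeoTIFF should be treated as authoritative.

```python
import math

CELLS_PER_DEGREE = 120          # 30 arc seconds = 1/120 of a degree
WEST, NORTH = -180.0, 85.0      # upper-left corner from the specification
EAST, SOUTH = 180.0, -60.0      # lower-right corner from the specification
NODATA = 65535                  # stated no-data value

# Grid size implied by the extent and resolution.
N_COLS = round((EAST - WEST) * CELLS_PER_DEGREE)
N_ROWS = round((NORTH - SOUTH) * CELLS_PER_DEGREE)


def lonlat_to_rowcol(lon, lat):
    """Return the (row, col) of the pixel containing a lon/lat point.

    Hypothetical helper: assumes rows run north to south and columns
    west to east, starting at the upper-left corner.
    """
    if not (WEST <= lon < EAST and SOUTH < lat <= NORTH):
        raise ValueError("point lies outside the raster extent")
    col = math.floor((lon - WEST) * CELLS_PER_DEGREE)
    row = math.floor((NORTH - lat) * CELLS_PER_DEGREE)
    return row, col
```

    Under these assumptions the grid is 43,200 columns by 17,400 rows, and a pixel whose value equals NODATA should be skipped rather than read as a travel time.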

    Methodology

    Travel time to the nearest city or port was estimated using an accumulated cost function (accCost) in the gdistance R package (van Etten, 2018). This function requires two input datasets: (i) a set of locations to estimate travel time to and (ii) a transition matrix that represents the cost or time to travel across a surface.

    The set of locations was based on populated urban areas in the 2016 version of the Joint Research Centre's Global Human Settlement Layers (GHSL) datasets (Pesaresi and Freire, 2016) that represent low density (LDC) urban clusters and high density (HDC) urban areas (https://ghsl.jrc.ec.europa.eu/datasets.php). These urban areas were represented by points, spaced at 1 km distance around the perimeter of each urban area.

    Marine ports were extracted from the 26th edition of the World Port Index (NGA, 2017), which contains the location and physical characteristics of approximately 3,700 major ports and terminals. Ports are represented as single points.

    The transition matrix was based on the friction surface (https://map.ox.ac.uk/research-project/accessibility_to_cities) from the 2015 global accessibility map (Weiss et al, 2018).
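
    To make the idea of an accumulated cost surface concrete, here is a minimal Python sketch. It is not the gdistance implementation: it is a simplified multi-source Dijkstra search on a small grid, assuming 4-neighbour moves and a step cost equal to the mean friction of the two cells involved. The function name and the toy friction values are hypothetical.

```python
import heapq


def accumulated_cost(friction, targets):
    """Least accumulated cost from every cell to its nearest target cell.

    friction: 2D list of per-cell crossing costs in minutes (None = impassable).
    targets:  list of (row, col) cells to travel to (assumed passable).
    """
    n_rows, n_cols = len(friction), len(friction[0])
    cost = [[float("inf")] * n_cols for _ in range(n_rows)]
    heap = []
    for r, c in targets:
        cost[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))

    while heap:
        dist, r, c = heapq.heappop(heap)
        if dist > cost[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < n_rows and 0 <= nc < n_cols):
                continue
            if friction[nr][nc] is None:
                continue
            # Simplifying assumption: a step costs the mean of both cells.
            new_dist = dist + (friction[r][c] + friction[nr][nc]) / 2
            if new_dist < cost[nr][nc]:
                cost[nr][nc] = new_dist
                heapq.heappush(heap, (new_dist, nr, nc))
    return cost


# Toy friction surface: a slow cell in the middle of a fast grid.
toy_friction = [
    [1, 1, 1],
    [1, 10, 1],
    [1, 1, 1],
]
toy_cost = accumulated_cost(toy_friction, targets=[(0, 0)])
```

    Because the step cost is symmetric, searching outward from the targets gives each cell its cost to the nearest target, which is the quantity stored in each pixel of the travel-time layers.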

    Code

    The R code used to generate the 12 travel time maps is included in the zip file that can be downloaded with these data layers. The processing zones are also available.

    Validation

    The underlying friction surface was validated by comparing travel times between 47,893 pairs of locations against journey times from a Google API. Our estimated journey times were generally shorter than those from the Google API. Across the tiles, the median journey time from our estimates was 88 minutes within an interquartile range of 48 to 143 minutes while the median journey time estimated by the Google API was 106 minutes within an interquartile range of 61 to 167 minutes. Across all tiles, the differences were skewed to the left and our travel time estimates were shorter than those reported by the Google API in 72% of the tiles. The median difference was −13.7 minutes within an interquartile range of −35.5 to 2.0 minutes while the absolute difference was 30 minutes or less for 60% of the tiles and 60 minutes or less for 80% of the tiles. The median percentage difference was −16.9% within an interquartile range of −30.6% to 2.7% while the absolute percentage difference was 20% or less in 43% of the tiles and 40% or less in 80% of the tiles.

    This process and results are included in the validation zip file.
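
    The summary statistics quoted in the validation (median difference, interquartile range, share of differences within a threshold) can be computed along the following lines. This is a sketch with made-up numbers, not the validation code from the zip file; the function name, the "inclusive" quantile method, and the sample values are assumptions.

```python
from statistics import median, quantiles


def summarise_differences(estimated, reference, threshold=30):
    """Summarise estimated-minus-reference journey times in minutes."""
    diffs = [e - r for e, r in zip(estimated, reference)]
    # Quartile method is an assumption; the source does not specify one.
    q1, _, q3 = quantiles(diffs, n=4, method="inclusive")
    share_within = sum(abs(d) <= threshold for d in diffs) / len(diffs)
    return {
        "median": median(diffs),
        "iqr": (q1, q3),
        "share_within_threshold": share_within,
    }


# Hypothetical journey times in minutes, for illustration only.
estimated_minutes = [50, 90, 120, 200, 30]
reference_minutes = [60, 100, 110, 260, 45]
summary = summarise_differences(estimated_minutes, reference_minutes)
```

    A negative median means the estimates tend to be shorter than the reference journey times, which is the direction of the difference reported above.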

    Usage Notes

    The accessibility layers can be visualised and analysed in many Geographic Information Systems or remote sensing software such as QGIS, GRASS, ENVI, ERDAS or ArcMap, and also by statistical and modelling packages such as R or MATLAB. They can also be used in cloud-based tools for geospatial analysis such as Google Earth Engine.

    The city layers represent travel times to human settlements of different population ranges. Two or more layers can be combined into one layer by recording the minimum pixel value across the layers. For example, a map of travel time to the nearest settlement of 5,000 to 50,000 people could be generated by taking the minimum of the three layers that represent the travel time to settlements with populations between 5,000 and 10,000, 10,000 and 20,000, and 20,000 and 50,000 people.
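
    A minimal sketch of the minimum-across-layers combination, using plain Python lists in place of real rasters. The small grids below are invented for illustration; with actual GeoTIFFs the same pixel-wise minimum would normally be taken with a raster or array library.

```python
NODATA = 65535  # stated no-data value, also the largest 16-bit unsigned value


def combine_layers(*layers):
    """Pixel-wise minimum across equally shaped 2D grids of travel times."""
    return [
        [min(pixels) for pixels in zip(*rows)]
        for rows in zip(*layers)
    ]


# Hypothetical 2x2 excerpts of three travel-time layers (minutes).
layer_5k_10k = [[10, NODATA], [300, NODATA]]
layer_10k_20k = [[25, 40], [120, NODATA]]
layer_20k_50k = [[5, 90], [500, NODATA]]

combined = combine_layers(layer_5k_10k, layer_10k_20k, layer_20k_50k)
```

    Because the no-data value is the largest value a pixel can hold, the minimum only stays at 65535 where every input layer has no data.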

    The accessibility layers also permit user-defined hierarchies that go beyond computing the minimum pixel value across layers. A user-defined complete hierarchy can be generated when the union of all categories adds up to the global population, and the intersection of any two categories is empty. Everything else is up to the user in terms of logical consistency with the problem at hand.
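
    One way to read the completeness condition is that the categories must partition the range of settlement populations, with no gaps and no overlaps. The sketch below checks that for half-open population ranges. The function name and the example ranges are hypothetical, and treating categories as population ranges is an interpretation rather than something the source spells out.

```python
def is_complete_hierarchy(categories, lower=0, upper=float("inf")):
    """Check that half-open (min_pop, max_pop) ranges tile [lower, upper)."""
    ordered = sorted(categories)
    if not ordered:
        return False
    # Every range must be non-empty.
    if any(lo >= hi for lo, hi in ordered):
        return False
    # The union must start and end at the overall bounds.
    if ordered[0][0] != lower or ordered[-1][1] != upper:
        return False
    # Adjacent ranges must touch exactly: no gap and no overlap.
    return all(prev[1] == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


complete = is_complete_hierarchy([(0, 5000), (5000, 50000), (50000, float("inf"))])
with_gap = is_complete_hierarchy([(0, 5000), (10000, float("inf"))])
with_overlap = is_complete_hierarchy([(0, 20000), (10000, float("inf"))])
```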

    The accessibility layers are relative measures of the ease of access from a given location to the nearest target. While the validation demonstrates that they do correspond to typical journey times, they cannot be taken to represent actual travel times. Errors in the friction surface will be accumulated as part of the accumulative cost function, and it is likely that locations that are further away from targets will have a greater divergence from a plausible travel time than those that are closer to the targets. Care should be taken when referring to travel time to the larger cities when the locations of interest are extremely remote, although they will still be plausible representations of relative accessibility. Furthermore, a key assumption of the model is that all journeys will use the fastest mode of transport and take the shortest path.

  3. Mobile internet users worldwide 2020-2029

    • statista.com
    Updated Feb 5, 2025
    Statista Research Department (2025). Mobile internet users worldwide 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    The global number of smartphone users was forecast to increase continuously between 2024 and 2029 by a total of 1.8 billion users (+42.62 percent). After the ninth consecutive year of growth, the smartphone user base is estimated to reach 6.1 billion users in 2029, a new peak. Notably, the number of smartphone users has increased continuously over the past years. Smartphone users here are limited to internet users of any age using a smartphone. The figures shown have been derived from survey data that has been processed to estimate missing demographics. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of smartphone users in regions like Australia & Oceania and Asia.

  4. Mobile internet usage reach in North America 2020-2029

    • statista.com
    Updated Feb 5, 2025
    Statista Research Department (2025). Mobile internet usage reach in North America 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    The population share with mobile internet access in North America was forecast to increase between 2024 and 2029 by a total of 2.9 percentage points. This overall increase does not happen continuously, notably not in 2028 and 2029. Mobile internet penetration is estimated to amount to 84.21 percent in 2029. Notably, the population share with mobile internet access has increased continuously over the past years. The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the population share with mobile internet access in regions like the Caribbean and Europe.

  5. Mobile internet penetration in Europe 2024, by country

    • statista.com
    Updated Feb 5, 2025
    Statista Research Department (2025). Mobile internet penetration in Europe 2024, by country [Dataset]. https://www.statista.com/topics/779/mobile-internet/
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    Switzerland leads the ranking by population share with mobile internet access, recording 95.06 percent. Following closely behind is Ukraine with 95.06 percent, while Moldova trails the ranking with 46.83 percent, a difference of 48.23 percentage points to the ranking leader, Switzerland. The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).


Bellabeat - Case Study (Google Career Certificate)

Usage Observations and Suggestions for Wearable Fitness Tech
