4 datasets found
  1. Netflix Movies DB cleaning and Visualzation-Excel
    Cleaning and use of available information before deleting data of interest

    • kaggle.com
    zip
    Updated Jan 8, 2024
    Diane-Her (2024). Netflix Movies DB cleaning and Visualzation-Excel [Dataset]. https://www.kaggle.com/datasets/dianeher/netflix-movies-db-cleaning-and-visualzation-excel
    Download format: zip (1,799,836 bytes)
    Authors
    Diane-Her
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Cleaning and visualization of the Netflix Movies DB using Power Query and Power Pivot in Microsoft Excel. The dashboard is not meant to analyze the data in depth; its purpose is to showcase information extracted and transformed with Power Query, DAX, and pivot tables. The charts use useful fields such as the year, main category, and content type, and no rows containing valuable information are removed. A more extensive analysis, unaffected by cells filled with placeholder labels such as 'No Data', requires removing those rows, so that analysis will be carried out by cleaning the data with SQL.
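    The SQL cleaning step is not included in this description. As a minimal sketch, assuming a hypothetical table `netflix` with columns `title`, `release_year`, and `listed_in`, in which missing values were replaced by the label 'No Data', the labeled rows could be dropped as follows (run against an in-memory SQLite database from Python's standard library; table, columns, and rows are illustrative, not taken from the dataset):

```python
import sqlite3

# Hypothetical rows: 'No Data' marks values that were missing in the raw file.
# release_year is stored as TEXT because the placeholder label is text.
ROWS = [
    ("Movie A", "2019", "Dramas"),
    ("Movie B", "No Data", "Comedies"),
    ("Movie C", "2021", "No Data"),
    ("Movie D", "2020", "Documentaries"),
]


def clean_rows(rows):
    """Load rows into SQLite and return only those without a 'No Data' label."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE netflix (title TEXT, release_year TEXT, listed_in TEXT)"
        )
        conn.executemany("INSERT INTO netflix VALUES (?, ?, ?)", rows)
        # Delete every row in which any column carries the placeholder label.
        conn.execute(
            "DELETE FROM netflix "
            "WHERE 'No Data' IN (title, release_year, listed_in)"
        )
        return conn.execute(
            "SELECT title, release_year, listed_in FROM netflix ORDER BY title"
        ).fetchall()
    finally:
        conn.close()


print(clean_rows(ROWS))
```

    Only Movie A and Movie D survive. Running the DELETE on a separate copy of the data leaves the Power Query version, with its 'No Data' rows intact, available for the dashboard.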

  2. Three Excel data show the experimental results detail from Common cuckoo...

    • rs.figshare.com
    zip
    Updated Feb 20, 2024
    Longwu Wang; Yuhan Zhang; Wei Liang; Anders Pape Møller (2024). Three Excel data show the experimental results detail from Common cuckoo females remove more conspicuous eggs during parasitism [Dataset]. http://doi.org/10.6084/m9.figshare.13489485.v1
    Download format: zip
    Dataset provided by
    The Royal Society
    Authors
    Longwu Wang; Yuhan Zhang; Wei Liang; Anders Pape Møller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Avian obligate brood parasites gain an advantage by removing eggs laid by cuckoos that have already visited the nest, which can increase the survival chances of their own offspring. Conversely, to prevent their own eggs from being removed by the next parasitic cuckoo, they need to take precautions. Egg mimicry and egg crypsis are two alternative strategies for preventing a parasitic egg from being removed by another parasitic cuckoo. Here, we tested whether egg crypsis has a preventive effect when common cuckoos (Cuculus canorus) parasitize their Oriental reed warbler (Acrocephalus orientalis) hosts. We designed two experimental groups with different levels of crypsis to induce common cuckoos to lay eggs, and observed whether the cuckoos selectively removed the experimental eggs with low crypsis during parasitism. Our results supported the egg crypsis hypothesis: the observed cuckoos significantly preferred to remove the more conspicuous white model eggs. This shows that even in an open nest, adequately hidden eggs can be protected from removal by cuckoo females during parasitism, increasing the survival chances of their own parasitic eggs.

  3. 2022 Bikeshare Data -Reduced File Size -All Months

    • kaggle.com
    zip
    Updated Mar 8, 2023
    Kendall Marie (2023). 2022 Bikeshare Data -Reduced File Size -All Months [Dataset]. https://www.kaggle.com/datasets/kendallmarie/2022-bikeshare-data-all-months-combined
    Download format: zip (98,884 bytes)
    Authors
    Kendall Marie
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This is a condensed version of the raw data obtained through the Google Data Analytics Course, made available by Lyft and the City of Chicago under this license (https://ride.divvybikes.com/data-license-agreement).

    I originally carried out my study on another platform, and the original files were too large to upload to Posit Cloud in full: each of the 12 monthly files contained between 100k and 800k rows. I therefore reduced the number of rows drastically by grouping, summarizing, and making deliberate omissions in Excel for each CSV file. The file uploaded here is the result of that process.

    Data is grouped by month, day, rider_type, bike_type, and time_of_day. total_rides is the number of rides in each group, which is also the number of original rows combined to make the new summarized row; avg_ride_length is the average ride length of all rides in each group.

    If you want to calculate the mean of avg_ride_length for different subgroups, use a weighted average, because the values in this file are already averages of the summarized groups. Weight each row's avg_ride_length by its total_rides value.
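    Concretely, the mean ride length of a subgroup is sum(total_rides × avg_ride_length) / sum(total_rides) over that subgroup's rows. A minimal Python sketch, using made-up illustrative rows rather than values from the file (the unit of avg_ride_length is assumed to be minutes):

```python
from collections import defaultdict

# Illustrative summarized rows (not real values from the file).
rows = [
    {"rider_type": "casual", "total_rides": 100, "avg_ride_length": 20.0},
    {"rider_type": "casual", "total_rides": 300, "avg_ride_length": 10.0},
    {"rider_type": "member", "total_rides": 50, "avg_ride_length": 12.0},
]


def weighted_avg_by(rows, key):
    """Return {group: mean of avg_ride_length weighted by total_rides}."""
    weighted_sum = defaultdict(float)
    ride_count = defaultdict(int)
    for row in rows:
        group = row[key]
        # Re-expand each average into a total ride time for the group.
        weighted_sum[group] += row["total_rides"] * row["avg_ride_length"]
        ride_count[group] += row["total_rides"]
    return {g: weighted_sum[g] / ride_count[g] for g in weighted_sum}


print(weighted_avg_by(rows, "rider_type"))
```

    For the two casual rows, the weighted mean is (100 × 20 + 300 × 10) / 400 = 12.5, whereas a plain average of the two avg_ride_length values would give 15.0 and overstate the typical casual ride. min_ride_length and max_ride_length need no weighting: a subgroup's minimum is simply the smallest min_ride_length, and its maximum the largest max_ride_length.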

    9 columns:

    • date - year, month, and day in date format; includes all days in 2022.

    • day_of_week - actual day of the week as character. Set up a new sort order if needed.

    • rider_type - either 'casual' (riders who pay per ride) or 'member' (riders with annual memberships).

    • bike_type - either 'classic' (non-electric, traditional bikes) or 'electric' (e-bikes).

    • time_of_day - divides the day into 6 equal 4-hour time frames, starting at 12AM. Each ride was placed in a time frame according to the time it STARTED, even if the ride was long enough to end in a later time frame.

    • total_rides - count of all individual rides in each grouping (row).

    • avg_ride_length - average length of all rides in each grouping (row); total_rides gives the number of original ride lengths included in this average.

    • min_ride_length - minimum ride length of all rides in each grouping (row).

    • max_ride_length - maximum ride length of all rides in each grouping (row).

    The time_of_day, total_rides, avg_ride_length, min_ride_length, and max_ride_length columns were added to help summarize the original dataset.

    Please note: the time_of_day column has inconsistent spacing. In R (dplyr), use mutate(time_of_day = gsub(" ", "", time_of_day)) to remove all spaces.
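    For readers working in Python rather than R, a minimal equivalent of the gsub() call is sketched below, together with a Sunday-first sort order for day_of_week (matching the Sunday = 1 numbering described in the revisions). The label '12AM - 4AM' is a hypothetical example; the file's actual time_of_day labels are not listed in this description.

```python
# Sunday-first order, matching the Sunday = 1 ... Saturday = 7 numbering.
DAY_ORDER = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def normalize_time_of_day(label):
    """Remove all spaces, like gsub(" ", "", x) in R."""
    return label.replace(" ", "")


def sort_by_weekday(day_names):
    """Sort day names in calendar order instead of alphabetically."""
    return sorted(day_names, key=DAY_ORDER.index)


print(normalize_time_of_day("12AM - 4AM"))
print(sort_by_weekday(["Monday", "Sunday", "Saturday", "Friday"]))
```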

    Revisions

    Below is the list of revisions I made in Excel before uploading the final CSV files to the R environment:

    • Deleted station location columns and lat/long as much of this data was already missing.

    • Deleted ride id column since each observation was unique and I would not be joining with another table on this variable.

    • Deleted rows pertaining to "docked bikes" since there were no member entries for this type and I could not compare member vs casual rider data. I also received no information in the project details about what constitutes a "docked" bike.

    • Calculated a new column, ride_length, by subtracting each ride's start time from its end time, and deleted all rows with results of 0 or 1 minute, which the project outline explained as relating to staff tasks rather than users (for example, taking a bike out of rotation for maintenance).

    • Assigned each ride's start time to a time range (time_of_day) in order to group more observations while keeping general time information. time_of_day therefore represents the time frame in which the ride BEGAN. I created six 4-hour time frames, beginning at 12AM.

    • Added a Day of Week column, with Sunday = 1 and Saturday = 7, then changed from numbers to the actual day names.

    • Used pivot tables to group total_rides, avg_ride_length, min_ride_length, and max_ride_length by date, rider_type, bike_type, and time_of_day.

    • Combined all months into one CSV file containing fewer than 9,000 rows (instead of several million).
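    The author performed these steps in Excel with formulas and pivot tables. As a rough illustration only, the same pipeline can be sketched with Python's standard library. The raw column names (started_at, ended_at, rideable_type, member_casual), the rideable_type values, the time-frame labels, and the reading of "0 and 1 minute results" as rides shorter than 2 minutes are all assumptions, not taken from this description:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical labels for the six 4-hour frames that start at 12AM.
TIME_FRAMES = ["12AM-4AM", "4AM-8AM", "8AM-12PM", "12PM-4PM", "4PM-8PM", "8PM-12AM"]
# Sunday-first order, matching Sunday = 1 ... Saturday = 7.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
# Assumed raw rideable_type values; docked_bike is deliberately absent.
BIKE_TYPES = {"classic_bike": "classic", "electric_bike": "electric"}


def summarize(trips):
    """Collapse raw trips into one row per (date, day, rider, bike, time frame)."""
    groups = defaultdict(list)
    for trip in trips:
        bike = BIKE_TYPES.get(trip["rideable_type"])
        if bike is None:  # drops docked_bike rows
            continue
        start = datetime.fromisoformat(trip["started_at"])
        end = datetime.fromisoformat(trip["ended_at"])
        ride_length = (end - start).total_seconds() / 60  # minutes
        if ride_length < 2:  # assumption: "0 and 1 minute" means under 2 minutes
            continue
        key = (
            start.date().isoformat(),
            DAY_NAMES[start.isoweekday() % 7],  # isoweekday(): Monday=1 ... Sunday=7
            trip["member_casual"],
            bike,
            TIME_FRAMES[start.hour // 4],  # frame in which the ride STARTED
        )
        groups[key].append(ride_length)

    summary = []
    for (date, day, rider, bike, frame), lengths in sorted(groups.items()):
        summary.append({
            "date": date,
            "day_of_week": day,
            "rider_type": rider,
            "bike_type": bike,
            "time_of_day": frame,
            "total_rides": len(lengths),
            "avg_ride_length": sum(lengths) / len(lengths),
            "min_ride_length": min(lengths),
            "max_ride_length": max(lengths),
        })
    return summary


trips = [
    {"rideable_type": "classic_bike", "member_casual": "member",
     "started_at": "2022-01-02 08:15:00", "ended_at": "2022-01-02 08:25:00"},
    {"rideable_type": "classic_bike", "member_casual": "member",
     "started_at": "2022-01-02 09:00:00", "ended_at": "2022-01-02 09:20:00"},
    # Dropped: only 1 minute long.
    {"rideable_type": "electric_bike", "member_casual": "member",
     "started_at": "2022-01-02 08:30:00", "ended_at": "2022-01-02 08:31:00"},
    # Dropped: docked bike.
    {"rideable_type": "docked_bike", "member_casual": "casual",
     "started_at": "2022-01-02 10:00:00", "ended_at": "2022-01-02 10:30:00"},
]

for row in summarize(trips):
    print(row)
```

    With these four sample trips, the 1-minute e-bike ride and the docked-bike ride are dropped, and the two remaining classic-bike rides collapse into a single Sunday 8AM-12PM row with total_rides = 2 and avg_ride_length = 15.0.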

  4. ⚽️ EA FC25 Player Database

    • kaggle.com
    zip
    Updated Sep 27, 2024
    mexwell (2024). ⚽️ EA FC25 Player Database [Dataset]. https://www.kaggle.com/datasets/mexwell/ea-fc25-player-database
    Download format: zip (1,056,188 bytes)
    Authors
    mexwell

    Description

    This data set contains all players of the video game EA Sports FC 25. The data was scraped from the website https://wefut.com/player-database. The scraper's output format is somewhat messy, so this data set is also meant to serve as a guide to cleaning up messy data with Python and pandas so that detailed analyses can then be made.

    The data set contains two CSV files: one for the players and one for the goalkeepers. The scraper also produced incorrect column names; I have corrected these in the column description.

    Another problem is that the number of columns has virtually doubled: there are odd and even columns. Every second odd column is empty while the corresponding even column is filled with a player's data. It is therefore not possible to simply delete all rows where, for example, the odd src column has no value, because such a row still contains information about a player. Resolving these doubled columns should be the first transformation performed on the data set.
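    The description does not spell out the exact column layout. As a minimal sketch, assuming hypothetically that each field appears as an adjacent pair of columns (odd, even) and that a row's value sits in whichever of the two is non-empty, the pairs can be merged before any rows are dropped. This uses only Python's standard library (the description itself suggests pandas), and the sample CSV text is invented for illustration:

```python
import csv
import io

# Invented sample: each field appears twice (odd column, even column).
RAW = "src,src,name,name\nimg1.png,,Player A,\n,img2.png,,Player B\n,,,\n"


def coalesce_pairs(rows, names):
    """Merge each (odd, even) column pair into one named column.

    Hypothetical layout: columns 2k and 2k+1 (0-based) hold the same field,
    and a row's value sits in whichever of the two is non-empty.
    """
    merged = []
    for row in rows:
        values = [row[i] or row[i + 1] for i in range(0, len(names) * 2, 2)]
        # Drop a row only if BOTH halves of every pair are empty.
        if any(values):
            merged.append(dict(zip(names, values)))
    return merged


header, *data = list(csv.reader(io.StringIO(RAW)))
players = coalesce_pairs(data, ["src", "name"])
print(players)
```

    Note that the second sample row has an empty odd src column but is kept, because its even columns still describe a player; only the fully empty row is removed.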

    This dataset contains only base statistics.

    Acknowledgement

    Photo by Fauzan Saari on Unsplash

