https://creativecommons.org/publicdomain/zero/1.0/
Data Cleaning
Convert data types of the required variables
Load the libraries dplyr, ggplot2, tidyverse, and tidyr
Find out the count of male vs female students
[Figure: Count of students]
We keep only two columns, namely 'Sex' and 'G3', and remove the other columns
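t = -2.0651 is the test statistic: the difference between the two group means expressed in units of its standard error, so it measures how far the observed difference is from 0
df = 390.57 is the degrees of freedom, which is related to the sample size and reflects how many free data points are available for making comparisons
p-value = 0.03958 is the probability of observing a difference at least this extreme if the null hypothesis of equal means were true. Because it is less than alpha (0.05), we can reject the null hypothesis. Hence the difference is statistically significant.
The 95% confidence interval for the true difference in means runs from -1.85 to -0.04. It does not include 0, which agrees with the significant p-value.
The observed difference in means between the two groups is 0.95 in magnitude (10.91 - 9.96); the interval is negative because the test subtracts the means in the opposite order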
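A fractional df such as 390.57 is what Welch's unequal-variance t-test produces (it is the default in R's `t.test`). As a minimal sketch of where t and df come from, the following uses only the Python standard library. The two grade lists are hypothetical and are not taken from the student dataset.

```python
import math
from statistics import mean, variance

def welch_t_test(a, b):
    """Return (t, df) for Welch's two-sample t-test of mean(a) - mean(b)."""
    n_a, n_b = len(a), len(b)
    # Squared standard error contributed by each group.
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    # t is the difference in means divided by its standard error.
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical grades for two groups (assumption, for illustration only).
group_a = [10, 12, 9, 11, 8, 13]
group_b = [12, 14, 11, 13, 15, 10]
t_stat, dof = welch_t_test(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {dof:.2f}")
```

The sign of t depends only on which group is passed first, which is why a negative t and a negative confidence interval can go together with a positive difference when the means are subtracted the other way round.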
[Figure: Histogram]
[Figure: Density plot]
- 38 students out of 395 got a score of 0. That is 9.62% of the students.
- Let us check the mean for both groups after removing the students who got zeros.
- We have created a new data frame called student 2, which includes a total of 357 students with no zero marks.
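The arithmetic behind the zero-score step can be sketched as follows. The score list is synthetic: it only reproduces the counts quoted above (38 zeros among 395 students), and the name `student2` is an assumed spelling of the new data frame.

```python
def drop_zero_scores(scores):
    """Return the non-zero scores and the share of zeros in percent."""
    kept = [s for s in scores if s != 0]
    zero_share = 100 * (len(scores) - len(kept)) / len(scores)
    return kept, zero_share

# Synthetic grades with the same counts as in the text (assumption):
# 38 zeros and 357 non-zero marks.
scores = [0] * 38 + [10] * 357
student2, zero_pct = drop_zero_scores(scores)
print(len(student2), f"{zero_pct:.2f}%")
```

Removing the zeros changes both group means, so the t-test should be rerun on the filtered data rather than reusing the earlier result.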
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The official Meta-Kaggle dataset includes a Users.csv file with Username, DisplayName, RegisterDate, and PerformanceTier fields, but it doesn't contain location data for Kaggle users. This dataset augments that data with additional country and region information.
I have deliberately left out the username and displayname values and kept only the userid, which can be joined back to the official Meta-Kaggle Users.csv file.
It is possible that some users haven't inputted their details when the scraper went through their accounts and thus have missing data. Another possibility is that users may have updated their info after the scraper went through their accounts, thus resulting in inconsistencies.
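Joining the location data back onto Users.csv could look roughly like this. It is a sketch using plain dictionaries; the column names `Id`, `userid`, `country`, and `region` are assumptions and should be checked against the actual files. Users without a scraped location keep `None` values, which mirrors the missing data described above.

```python
def attach_locations(users, locations):
    """Left-join location rows onto user rows by user id."""
    # Index the location rows by user id for constant-time lookup.
    by_id = {row["userid"]: row for row in locations}
    joined = []
    for user in users:
        loc = by_id.get(user["Id"], {})
        # Users that were never scraped end up with None for both fields.
        joined.append({**user, "country": loc.get("country"), "region": loc.get("region")})
    return joined

# Hypothetical rows, for illustration only.
users = [{"Id": "1", "PerformanceTier": "2"}, {"Id": "2", "PerformanceTier": "0"}]
locations = [{"userid": "1", "country": "Norway", "region": "Oslo"}]
joined = attach_locations(users, locations)
```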
https://creativecommons.org/publicdomain/zero/1.0/
This data set was collected from an online e-commerce site. It is very raw, junk-laden data of the kind you get in industry for a data science project.
Applying your image preprocessing skills to such data will help you understand the real-world problems and challenges of industry projects.
It has some junk and partial t-shirt views, as well as images with multiple t-shirt views in a single image.
You can perform different tasks on this data set, for example:
Beginner
- Resize all images to 48 * 48 size
- Convert all images to grayscale images
Intermediate
- Perform image masking on all images
- Develop a classifier to detect whether a given image is a t-shirt or not
Advanced
- Try to cluster t-shirt images
- Try to cluster based on color
- Try to cluster based on full, partial, multiple, and junk t-shirt images
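The two beginner tasks can be sketched without any imaging library, treating an image as rows of (r, g, b) tuples. This is only an illustration of the idea: the luminance weights are the common ITU-R BT.601 ones, the resize is plain nearest-neighbour sampling, and the test image is synthetic. In practice a library such as Pillow or OpenCV would normally do both steps.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in image]

def resize_nearest(image, width, height):
    """Resize a 2-D grid of pixels by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]

# Synthetic 96x96 image (assumption): left half red, right half white.
rgb = [[(255, 0, 0)] * 48 + [(255, 255, 255)] * 48 for _ in range(96)]
small = resize_nearest(to_grayscale(rgb), 48, 48)
print(len(small), len(small[0]))
```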
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains images of clothing items scraped from Carousell, an online marketplace, specifically curated for image classification tasks. It includes a diverse set of classes representing different types of clothing, making it an excellent resource for machine learning and computer vision projects. The dataset is organized into the following 15 classes:
- Blazer
- Celana_Panjang (Long Pants)
- Celana_Pendek (Shorts)
- Gaun (Dresses)
- Hoodie
- Jaket (Jacket)
- Jaket_Denim (Denim Jacket)
- Jaket_Olahraga (Sports Jacket)
- Jeans
- Kaos (T-shirt)
- Kemeja (Shirt)
- Mantel (Coat)
- Polo
- Rok (Skirt)
- Sweter (Sweater)
The images in this dataset represent various styles, textures, and colors, offering a comprehensive resource for training models to recognize and classify clothing categories. It is ideal for tasks such as building fashion recommendation systems, creating virtual try-on applications, or studying visual trends in fashion e-commerce. Whether you are an enthusiast or a professional, this dataset can help explore and experiment with deep learning techniques in the realm of fashion.
This dataset contains website performance metrics, including response time and throughput, collected from Pingdom and Site24x7. The data has been meticulously labeled by students from FAST NUCES and UMT.
https://creativecommons.org/publicdomain/zero/1.0/
Twitter is an online social media platform where people share their thoughts as tweets. It is observed that some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we shall help by creating a strong NLP-based classifier model to distinguish the negative tweets and block them. Can you build a strong classifier model to predict the same?
Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.
Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
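A sketch of that parsing step with the standard `csv` module is shown below. The reader already removes CSV-level quoting, so the extra `strip` calls only handle literal quote characters and spaces left inside the field. The column names `text` and `sentiment` are assumptions about the file header, and the sample rows are made up.

```python
import csv
import io

def read_tweets(handle):
    """Yield (text, sentiment) pairs with wrapping quotes and spaces removed."""
    for row in csv.DictReader(handle):
        # Drop surrounding whitespace, then literal quotes, then whitespace again.
        text = row["text"].strip().strip('"').strip()
        yield text, row["sentiment"]

# Made-up sample in place of train.csv (assumption).
sample = io.StringIO('text,sentiment\n"""I love this""",positive\n" so sad ",negative\n')
rows = list(read_tweets(sample))
print(rows)
```

Only the ends of the field are trimmed, so quotes that appear inside a tweet are left untouched.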
You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv
This dataset was created by SelcukCan
This dataset was created by Simran Singh
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset is a mix of data gathered from across the Kaggle platform. Here are the links to the source datasets:
1. https://www.kaggle.com/datasets/fernando2rad/x-ray-lung-diseases-images-9-classes?select=04+Doen%C3%A7as+Pulmonares+Obstrutivas+%28Enfisema%2C+Broncopneumonia%2C+Bronquiectasia%2C+Embolia%29
2. https://www.kaggle.com/datasets/yasserhessein/tuberculosis-chest-x-rays-images/data
3. https://www.kaggle.com/datasets/basitkhan12/covid-and-pneumonia-chest-x-rays
4. https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia
The dataset has 5 classes:
1. Tuberculosis - 5144 images
2. Pneumonia - 5121
3. Normal - 5083
4. Emphysema - 4928
5. COVID-19 - 5523
In total, this dataset consists of 25799 images. All of them are radiography X-ray images.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was collected retrospectively from a single medical center between 2021 and 2023. CT imaging was performed on a 128-slice GE Revolution scanner in the axial plane with a 5 mm slice thickness. MRI was acquired on a GE Signa 1.5 T system using axial diffusion-weighted imaging (DWI) sequences at b-values of 0 and 1000 s/mm². All acquisitions followed the hospital's standard stroke screening protocol. There are two distinct classes: (1) stroke and (2) control. These classes were clinically verified by neurologists and neuroradiologists. The dataset comprises data from 230 participants, with a gender distribution of 113 females and 117 males. Among these participants, 115 were diagnosed with stroke, while the remaining 115 were categorized under the control group. An average of 7-8 cross-sectional images were used for each imaging type. The dataset includes a total of 5,336 CT and MRI (2226 CT + 3110 MR) images, with 2,695 images representing stroke cases and 2,641 images corresponding to control cases. All patient imaging data were fully anonymized before analysis. Identifiers such as name, date of birth, patient ID, and acquisition timestamps were removed from all image headers. We reviewed the dataset for missing images or labels and excluded any cases with incomplete CT or MR series; no imputation was performed. Reference labels were assigned by one neuroradiologist and two emergency medicine specialists, based on clinical reports and follow-up data.
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Each row represents a customer; each column contains one of the customer's attributes, as described in the column metadata.
The data set includes information about:
To explore this type of model and learn more about the subject.
New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by kusanovbaiastan
Released under Apache 2.0
Basic dataset for sentiment analysis, and for prediction of stock prices based on news headlines. I tried to look for this dataset on kaggle but couldn't find this anywhere.
(By no means am I the original creator of this dataset. I just found it and uploaded it here since I couldn't find it on kaggle. Please let me know if it is already present on kaggle- if it is, then I'll remove this one at once.)
Anyway, there are label values in the file (Label column):
- 0: Stock price DECREASED
- 1: Stock price INCREASED due to headlines (or at least didn't decrease)
Encoding info is given in the file description.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains biographical information derived from articles on English Wikipedia as it stood in early June 2024. It was created as part of the Structured Contents initiative at Wikimedia Enterprise and is intended for evaluation and research use.
The beta sample dataset is a subset of the Structured Contents Snapshot focusing on people with infoboxes in EN wikipedia; outputted as json files (compressed in tar.gz).
We warmly welcome any feedback you have. Please share your thoughts, suggestions, and any issues you encounter on the discussion page for this dataset here on Kaggle.
Noteworthy Included Fields:
- name - title of the article.
- identifier - ID of the article.
- image - main image representing the article's subject.
- description - one-sentence description of the article for quick reference.
- abstract - lead section, summarizing what the article is about.
- infoboxes - parsed information from the side panel (infobox) on the Wikipedia article.
- sections - parsed sections of the article, including links. Note: excludes other media/images, lists, tables and references or similar non-prose sections.
The Wikimedia Enterprise Data Dictionary explains all of the fields in this dataset.
Infoboxes
- Compressed: 2GB
- Uncompressed: 11GB
Infoboxes + sections + short description
- Size of compressed file: 4.12 GB
- Size of uncompressed file: 21.28 GB
Article analysis and filtering breakdown:
- total # of articles analyzed: 6,940,949
- # people found with QID: 1,778,226
- # people found with Category: 158,996
- people found with Biography Project: 76,150
- Total # of people articles found: 2,013,372
- Total # people articles with infoboxes: 1,559,985
End stats
- Total number of people articles in this dataset: 1,559,985
- that have a short description: 1,416,701
- that have an infobox: 1,559,985
- that have article sections: 1,559,921
This dataset includes 235,146 people articles that exist on Wikipedia but aren't yet tagged on Wikidata as instance of:human.
This dataset was originally extracted from the Wikimedia Enterprise APIs on June 5, 2024. The information in this dataset may therefore be out of date. This dataset isn't being actively updated or maintained, and has been shared for community use and feedback. If you'd like to retrieve up-to-date Wikipedia articles or data from other Wikiprojects, get started with Wikimedia Enterprise's APIs.
The dataset is built from the Wikimedia Enterprise HTML "snapshots": https://enterprise.wikimedia.com/docs/snapshot/ and focuses on the Wikipedia article namespace (namespace 0 (main)).
Wikipedia is a human generated corpus of free knowledge, written, edited, and curated by a global community of editors since 2001. It is the largest and most accessed educational resource in history, accessed over 20 billion times by half a billion people each month. Wikipedia represents almost 25 years of work by its community; the creation, curation, and maintenance of millions of articles on distinct topics. This dataset includes the biographical contents of English Wikipedia language editions: English https://en.wikipedia.org/, written by the community.
Wikimedia Enterprise provides this dataset under the assumption that downstream users will adhere to the relevant free culture licenses when the data is reused. In situations where attribution is required, reusers should identify the Wikimedia project from which the content was retrieved as the source of the content. Any attribution should adhere to Wikimedia's trademark policy (available at https://foundation.wikimedia.org/wiki/Trademark_policy) and visual identity guidelines (ava...
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains images of 131 various fruits and vegetables. The original version of this dataset is available here. The original version of the dataset was used for the Fruits-360. Although the original dataset hasn't been updated for over 2 years, the dataset on Kaggle has been updated various times, providing better images. The dataset should be used for image classification. Do check the GitHub repository of the source here.
Dataset Properties: (taken from the description in the repository itself)
- Total number of images : 90483
- Training set size : 67692 images (one fruit or vegetable per image)
- Test set size : 22688 images (about 25% of total data)
- Number of classes : 131 total fruits and vegetables
- Filename format : image_index_100.jpg (e.g. 32_100.jpg) or r_image_index_100.jpg (e.g. r_32_100.jpg) or r2_image_index_100.jpg or r3_image_index_100.jpg. "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis. "100" comes from image size (100x100 pixels).
Folder structure
- Train : This folder has multiple subfolders labelled as the fruit's/vegetable's name and contains the respective images. These images were used to train the models in the research paper.
- Test : This folder has multiple subfolders labelled as the fruit's/vegetable's name and contains the respective images. These images were used to test the models in the research paper.
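The filename format described above can be parsed with a small regular expression. This is a sketch based only on the four patterns listed; `parse_filename` is a hypothetical helper, not part of the dataset or its repository.

```python
import re

# Optional rotation prefix (r, r2, r3, ...), then the image index, then the fixed size 100.
FILENAME = re.compile(r"^(?:(r\d*)_)?(\d+)_100\.jpg$")

def parse_filename(name):
    """Return (rotation_prefix, image_index), or None if the name does not match."""
    match = FILENAME.match(name)
    if match is None:
        return None
    prefix, index = match.groups()
    return prefix, int(index)

for example in ["32_100.jpg", "r_32_100.jpg", "r2_32_100.jpg", "notes.txt"]:
    print(example, parse_filename(example))
```

The prefix is `None` for unrotated images, so grouping by `(prefix, index)` separates the rotation series of each fruit.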
All credits to the researchers themselves. I made this dataset for my own ease-of-use.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Overview:
This dataset contains simulated (hypothetical) but almost realistic (based on AI) data related to sleep, heart rate, and exercise habits of 500 individuals. It includes both pre-exercise and post-exercise resting heart rates, allowing for analyses such as a dependent t-test (Paired Sample t-test) to observe changes in heart rate after an exercise program. The dataset also includes additional health-related variables, such as age, hours of sleep per night, and exercise frequency.
The data is designed for tasks involving hypothesis testing, health analytics, or even machine learning applications that predict changes in heart rate based on personal attributes and exercise behavior. It can be used to understand the relationships between exercise frequency, sleep, and changes in heart rate.
File:
- Filename: heart_rate_data.csv
- File Format: CSV
Features (Columns):
- Age
  - Description: The age of the individual.
  - Type: Integer
  - Range: 18-60 years
  - Relevance: Age is an important factor in determining heart rate and the effects of exercise.
- Sleep Hours
  - Description: The average number of hours the individual sleeps per night.
  - Type: Float
  - Range: 3.0 - 10.0 hours
  - Relevance: Sleep is a crucial health metric that can impact heart rate and exercise recovery.
- Exercise Frequency (Days/Week)
  - Description: The number of days per week the individual engages in physical exercise.
  - Type: Integer
  - Range: 1-7 days/week
  - Relevance: More frequent exercise may lead to greater heart rate improvements and better cardiovascular health.
- Resting Heart Rate Before
  - Description: The individual's resting heart rate measured before beginning a 6-week exercise program.
  - Type: Integer
  - Range: 50 - 100 bpm (beats per minute)
  - Relevance: This is a key health indicator, providing a baseline measurement for the individual's heart rate.
- Resting Heart Rate After
  - Description: The individual's resting heart rate measured after completing the 6-week exercise program.
  - Type: Integer
  - Range: 45 - 95 bpm (lower than the "Resting Heart Rate Before" due to the effects of exercise).
  - Relevance: This variable is essential for understanding how exercise affects heart rate over time, and it can be used to perform a dependent t-test analysis.
- Max Heart Rate During Exercise
  - Description: The maximum heart rate the individual reached during exercise sessions.
  - Type: Integer
  - Range: 120 - 190 bpm
  - Relevance: This metric helps in understanding cardiovascular strain during exercise and can be linked to exercise frequency or fitness levels.
Potential Uses:
- Dependent T-Test Analysis: The dataset is particularly suited for a dependent (paired) t-test where you compare the resting heart rate before and after the exercise program for each individual.
- Exploratory Data Analysis (EDA): Investigate relationships between sleep, exercise frequency, and changes in heart rate. Potential analyses include correlations between sleep hours and resting heart rate improvement, or regression analyses to predict heart rate after exercise.
- Machine Learning: Use the dataset for predictive modeling, and build a beginner regression model to predict post-exercise heart rate using age, sleep, and exercise frequency as features.
- Health and Fitness Insights: This dataset can be useful for studying how different factors like sleep and age influence heart rate changes and overall cardiovascular health.
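The dependent (paired) t-test mentioned among the uses above reduces to a one-sample test on the per-person differences. The sketch below computes the t statistic and degrees of freedom with the standard library only. The five before/after pairs are invented for illustration and are not rows from heart_rate_data.csv.

```python
import math
from statistics import mean, stdev

def paired_t_test(before, after):
    """Return (t, df) for a paired t-test on after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Invented resting heart rates in bpm (assumption, for illustration only).
before = [72, 80, 65, 90, 78]
after = [68, 75, 66, 84, 74]
t_stat, dof = paired_t_test(before, after)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A negative t here means the resting heart rate dropped on average. Turning t into a p-value needs the t distribution, for example from a statistics package.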
License: Choose an appropriate open license, such as:
CC BY 4.0 (Attribution 4.0 International).
Inspiration for Kaggle Users:
- How does exercise frequency influence the reduction in resting heart rate?
- Is there a relationship between sleep and heart rate improvements post-exercise?
- Can we predict the post-exercise heart rate using other health variables?
- How do age and exercise frequency interact to affect heart rate?
Acknowledgments: This is a simulated dataset for educational purposes, generated to demonstrate statistical and machine learning applications in the field of health analytics.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
I'm currently writing a research paper on AI Detection and its accuracy/effectiveness. While doing so, over the past few months I've generated a large amount of text using various LLMs. This is a dataset/corpus containing all of the data I generated/gathered as well as the text that was generated by various other users.
If you have any questions, please post them on the Discussion page or contact me through Kaggle. Generating all of this took many hours of work and a few hundred dollars. All I ask in return is that you credit me if you find this dataset useful in your research. Also, an upvote would mean the world.
Ps. The picture is of my dog, Tessa, who passed away recently. I wasn't sure what to put as the picture so I thought that was better than nothing.
Here are the datasets I used in addition to the text I generated PLEASE UPVOTE THEM!:
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Tariq Javed
Released under Apache 2.0
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Cora is a widely used node classification dataset. What you are seeing now is the processed version from PyG. Its source file comes from the following paper:
Revisiting Semi-Supervised Learning with Graph Embeddings. Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov. ICML 2016.
Please cite the above paper if these are useful to you.
| Name | #nodes | #edges | #features | #classes |
|---|---|---|---|---|
| Cora | 2708 | 10556 | 1433 | 7 |
For further description of the data please refer to the 'File Description' section below.
This dataset can be downloaded directly from PyG. For the needs of Kaggle evaluation, I simply processed it.
You can run the following code to get the same .csv file:
import os
import pandas as pd
import torch
from torch_geometric.datasets import Planetoid
# Load Cora from PyG's Planetoid collection (downloaded on first use).
dataset = Planetoid('./', 'Cora')
data = dataset[0]
x = data.x  # node feature matrix
y = data.y  # node labels
edge_index = data.edge_index  # edges as a 2 x #edges tensor
train_mask = data.train_mask
val_mask = data.val_mask
test_mask = data.test_mask
# Labels of the nodes in each split.
y_train = y[train_mask]
y_val = y[val_mask]
y_test = y[test_mask]
# Node indices of each split, hard-coded on the assumption that the masks
# select these contiguous ranges (140 train, 500 validation, 1000 test nodes).
train_index = torch.arange(0, 140)
val_index = torch.arange(140, 640)
test_index = torch.arange(1708, 2708)
# Pair every node index with its label: one (index, label) row per node.
y_train = torch.cat((train_index.reshape(-1, 1), y_train.reshape(-1, 1)), dim=1)
y_val = torch.cat((val_index.reshape(-1, 1), y_val.reshape(-1, 1)), dim=1)
y_test = torch.cat((test_index.reshape(-1, 1), y_test.reshape(-1, 1)), dim=1)
# to_csv does not create missing directories, so make sure ./data exists.
os.makedirs('./data', exist_ok=True)
# Node features: one row per node, columns x0, x1, ...
x_df = pd.DataFrame(x.numpy())
x_header = ['x' + str(i) for i in range(x_df.shape[1])]
x_df.to_csv('./data/x.csv', index=False, header=x_header)
# Edge list: one (source, target) row per edge.
edge_index_df = pd.DataFrame(edge_index.t().numpy())
edge_index_header = ['source', 'target']
edge_index_df.to_csv('./data/edge_index.csv', index=False, header=edge_index_header)
# Labels of each split, written as (index, label) rows.
y_header = ['index', 'label']
y_train_df = pd.DataFrame(y_train.numpy())
y_train_df.to_csv('./data/y_train.csv', index=False, header=y_header)
y_val_df = pd.DataFrame(y_val.numpy())
y_val_df.to_csv('./data/y_val.csv', index=False, header=y_header)
y_test_df = pd.DataFrame(y_test.numpy())
y_test_df.to_csv('./data/y_test.csv', index=False, header=y_header)
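Once the files exist, the edge list can be read back into adjacency lists with the standard library alone. The sketch below relies only on the `source` and `target` header names written by the script above. The four sample rows are made up so that the example runs without the real edge_index.csv.

```python
import csv
import io
from collections import defaultdict

def load_adjacency(handle):
    """Read (source, target) rows into a dict mapping node -> list of neighbours."""
    adjacency = defaultdict(list)
    for row in csv.DictReader(handle):
        adjacency[int(row["source"])].append(int(row["target"]))
    return adjacency

# Made-up rows standing in for ./data/edge_index.csv (assumption).
sample = io.StringIO("source,target\n0,1\n1,0\n0,2\n2,0\n")
graph = load_adjacency(sample)
print(dict(graph))
```

For the real file, pass `open('./data/edge_index.csv', newline='')` instead of the in-memory sample.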
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Tariq Javed
Released under Apache 2.0