23 datasets found
  1. house prices data exploration

    • kaggle.com
    Updated Sep 13, 2024
    Cite
    yvonne gatwiri (2024). house prices data exploration [Dataset]. https://www.kaggle.com/yvonnegatwiri/house-prices-data-exploration/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 13, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    yvonne gatwiri
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by yvonne gatwiri

    Released under Apache 2.0


  2. house prices data exploration

    • kaggle.com
    Updated Sep 13, 2024
    Cite
    yvonne gatwiri (2024). house prices data exploration [Dataset]. https://www.kaggle.com/datasets/yvonnegatwiri/house-prices-data-exploration/code
    Explore at:
    Croissant
    Dataset updated
    Sep 13, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    yvonne gatwiri
    Description

    Dataset

    This dataset was created by yvonne gatwiri

    Released under Apache 2.0


  3. Credit EDA Case Study

    • kaggle.com
    Updated Aug 8, 2021
    Cite
    Nandish Jani (2021). Credit EDA Case Study [Dataset]. https://www.kaggle.com/datasets/nandishjani/credit-eda-case-study
    Explore at:
    Croissant
    Dataset updated
    Aug 8, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nandish Jani
    Description

    Dataset

    This dataset was created by Nandish Jani


  4. Employee Turnover Analytics Dataset

    • kaggle.com
    Updated Jun 8, 2023
    Cite
    Akshay Hedau (2023). Employee Turnover Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/akshayhedau/employee-turnover-analytics-dataset
    Explore at:
    Croissant
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Akshay Hedau
    Description

    Portobello Tech is an app innovator that has devised an intelligent way of predicting employee turnover within the company. It periodically evaluates employees' work details, including the number of projects they worked on, average monthly working hours, time spent in the company, promotions in the last 5 years, and salary level. Data from prior evaluations show the employee’s satisfaction at the workplace. The data could be used to identify patterns in work style and employees' interest in continuing to work at the company. The HR Department owns the data and uses it to predict employee turnover. Employee turnover refers to the total number of workers who leave a company over a certain time period.

  5. Spark Fund Investment Analysis

    • kaggle.com
    zip
    Updated Sep 5, 2019
    Cite
    Pranay Prabhat (2019). Spark Fund Investment Analysis [Dataset]. https://www.kaggle.com/pranay969/spark-fund-investment-analysis
    Explore at:
    zip (6260727 bytes)
    Dataset updated
    Sep 5, 2019
    Authors
    Pranay Prabhat
    Description

    Project Brief

    You work for Spark Funds, an asset management company. Spark Funds wants to invest in a few companies. The CEO of Spark Funds wants to understand the global trends in investments so that she can make investment decisions effectively.

    Business and Data Understanding

    Spark Funds has two minor constraints for investments:

    It wants to invest between 5 and 15 million USD per round of investment

    It wants to invest only in English-speaking countries because of the ease of communication with the companies it would invest in

    For your analysis, consider a country to be English speaking only if English is one of the official languages in that country

    You may use this list of countries where English is an official language: https://en.wikipedia.org/wiki/List_of_territorial_entities_where_English_is_an_official_language

    These conditions will give you sufficient information for your initial analysis. Before getting to specific questions, let’s understand the problem and the data first.
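
    The two constraints in the brief can be expressed as a simple filter. The sketch below is only an illustration: the record fields (`company`, `country_code`, `raised_amount_usd`), the sample rounds, and the three-country set are hypothetical stand-ins, and the 5 to 15 million range is assumed to be inclusive. A real analysis would build the country set from the Wikipedia list referenced in the brief.

```python
# Hypothetical funding rounds; field names and values are illustrative only.
rounds = [
    {"company": "A", "country_code": "IND", "raised_amount_usd": 10_000_000},
    {"company": "B", "country_code": "CHN", "raised_amount_usd": 8_000_000},
    {"company": "C", "country_code": "CAN", "raised_amount_usd": 20_000_000},
    {"company": "D", "country_code": "IRL", "raised_amount_usd": 5_000_000},
]

# Assumed, hand-picked subset of countries where English is an official
# language; replace with the full list from the brief's reference.
ENGLISH_OFFICIAL = {"IND", "CAN", "IRL"}

MIN_USD = 5_000_000   # lower bound per round (assumed inclusive)
MAX_USD = 15_000_000  # upper bound per round (assumed inclusive)


def is_eligible(funding_round):
    """Apply both constraints: amount within range and English-speaking country."""
    in_range = MIN_USD <= funding_round["raised_amount_usd"] <= MAX_USD
    english = funding_round["country_code"] in ENGLISH_OFFICIAL
    return in_range and english


selected = [r["company"] for r in rounds if is_eligible(r)]
print(selected)
```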

  6. ‘Census County Economically Distressed Areas 2018’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Mar 11, 2011
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2011). ‘Census County Economically Distressed Areas 2018’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-census-county-economically-distressed-areas-2018-760f/5de0d3ed/?iid=010-679&v=presentation
    Explore at:
    Dataset updated
    Mar 11, 2011
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Census County Economically Distressed Areas 2018’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/0b289b5e-0507-424d-9f07-f8d2b11b9580 on 27 January 2022.

    --- Dataset description provided by original source is as follows ---

    This is a copy of the statewide Census County GIS Tiger file. It is used to determine if a county is EDA or not by adding ACS (American Community Survey) Median Household Income (MHI) and Population Density data at the county level. The IRWM web based DAC mapping tool uses this GIS layer. Every year this table gets updated after ACS publishes their updated estimates. Created by joining 2016 EDA table to 2010 block groups feature class. The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Block Groups (BGs) are defined before tabulation block delineation and numbering, but are clusters of blocks within the same census tract that have the same first digit of their 4-digit census block number from the same decennial census. For example, Census 2000 tabulation blocks 3001, 3002, 3003,.., 3999 within Census 2000 tract 1210.02 are also within BG 3 within that census tract. Census 2000 BGs generally contained between 600 and 3,000 people, with an optimum size of 1,500 people. Most BGs were delineated by local participants in the Census Bureau's Participant Statistical Areas Program (PSAP). The Census Bureau delineated BGs only where the PSAP participant declined to delineate BGs or where the Census Bureau could not identify any local PSAP participant. A BG usually covers a contiguous area. Each census tract contains at least one BG, and BGs are uniquely numbered within census tract. 
Within the standard census geographic hierarchy, BGs never cross county or census tract boundaries, but may cross the boundaries of other geographic entities like county subdivisions, places, urban areas, voting districts, congressional districts, and American Indian / Alaska Native / Native Hawaiian areas. BGs have a valid code range of 0 through 9. BGs coded 0 were intended to only include water area, no land area, and they are generally in territorial seas, coastal water, and Great Lakes water areas. For Census 2000, rather than extending a census tract boundary into the Great Lakes or out to the U.S. nautical three-mile limit, the Census Bureau delineated some census tract boundaries along the shoreline or just offshore. The Census Bureau assigned a default census tract number of 0 and BG of 0 to these offshore, water-only areas not included in regularly numbered census tract areas.

    --- Original source retains full ownership of the source dataset ---
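
    The block-group rule quoted in the description above (a BG collects the blocks of a census tract that share the first digit of their 4-digit block number) can be sketched in a few lines. The function name `block_group` is hypothetical, and the sketch assumes block numbers are handled as integers of at most four digits.

```python
def block_group(block_number: int) -> int:
    """Return the block group: the first digit of a 4-digit census block number."""
    if not 0 <= block_number <= 9999:
        raise ValueError("expected a census block number with at most 4 digits")
    # Integer division by 1000 keeps the thousands digit, so 0001 maps to BG 0.
    return block_number // 1000


# Example from the description: blocks 3001 through 3999 all fall in BG 3.
print(block_group(3001), block_group(3999))
```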

  7. HR-attrition-EDA

    • kaggle.com
    Updated Aug 10, 2020
    Cite
    Sagar Shee (2020). HR-attrition-EDA [Dataset]. https://www.kaggle.com/winterbreeze/hrattritioneda/tasks
    Explore at:
    Croissant
    Dataset updated
    Aug 10, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sagar Shee
    Description

    Context

    This dataset is cleaned and ready to deploy for model building.

    Content

    This dataset is for learning purposes and is therefore simplified, with no null values or major skewness.

    Inspiration

    I learned much from Kaggle and the data community, and this is my contribution so that the flow of knowledge never stops.

  8. Univariate and multivariate cox regression models testing associations...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Barbara Scotti; Giulio Disanto; Rosaria Sacco; Marilu’ Guigli; Chiara Zecca; Claudio Gobbi (2023). Univariate and multivariate cox regression models testing associations between baseline characteristics and risk of EDA during RTX treatment. [Dataset]. http://doi.org/10.1371/journal.pone.0197415.t002
    Explore at:
    xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Barbara Scotti; Giulio Disanto; Rosaria Sacco; Marilu’ Guigli; Chiara Zecca; Claudio Gobbi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Univariate and multivariate cox regression models testing associations between baseline characteristics and risk of EDA during RTX treatment.

  9. ‘Census Block Group Economically Distressed Areas 2018’ analyzed by...

    • analyst-2.ai
    Updated Mar 11, 2011
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2011). ‘Census Block Group Economically Distressed Areas 2018’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-census-block-group-economically-distressed-areas-2018-62f0/latest
    Explore at:
    Dataset updated
    Mar 11, 2011
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Census Block Group Economically Distressed Areas 2018’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/ac57065c-1179-421b-968f-e8010700189c on 12 February 2022.

    --- Dataset description provided by original source is as follows ---

    This is a copy of the statewide Census Block Group GIS Tiger file. It is used to determine if a block group (BG) is EDA or not by adding ACS (American Community Survey) Median Household Income (MHI) and Population Density data at the BG level. The IRWM web based DAC mapping tool uses this GIS layer. Every year this table gets updated after ACS publishes their updated estimates. Created by joining 2016 EDA table to 2010 block groups feature class. The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Block Groups (BGs) are defined before tabulation block delineation and numbering, but are clusters of blocks within the same census tract that have the same first digit of their 4-digit census block number from the same decennial census. For example, Census 2000 tabulation blocks 3001, 3002, 3003,.., 3999 within Census 2000 tract 1210.02 are also within BG 3 within that census tract. Census 2000 BGs generally contained between 600 and 3,000 people, with an optimum size of 1,500 people. Most BGs were delineated by local participants in the Census Bureau's Participant Statistical Areas Program (PSAP). The Census Bureau delineated BGs only where the PSAP participant declined to delineate BGs or where the Census Bureau could not identify any local PSAP participant. A BG usually covers a contiguous area. Each census tract contains at least one BG, and BGs are uniquely numbered within census tract. 
Within the standard census geographic hierarchy, BGs never cross county or census tract boundaries, but may cross the boundaries of other geographic entities like county subdivisions, places, urban areas, voting districts, congressional districts, and American Indian / Alaska Native / Native Hawaiian areas. BGs have a valid code range of 0 through 9. BGs coded 0 were intended to only include water area, no land area, and they are generally in territorial seas, coastal water, and Great Lakes water areas. For Census 2000, rather than extending a census tract boundary into the Great Lakes or out to the U.S. nautical three-mile limit, the Census Bureau delineated some census tract boundaries along the shoreline or just offshore. The Census Bureau assigned a default census tract number of 0 and BG of 0 to these offshore, water-only areas not included in regularly numbered census tract areas.

    --- Original source retains full ownership of the source dataset ---

  10. ‘COVID-19 dataset in Japan’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘COVID-19 dataset in Japan’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-19-dataset-in-japan-2665/latest
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Japan
    Description

    Analysis of ‘COVID-19 dataset in Japan’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/lisphilar/covid19-dataset-in-japan on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    1. Context

    This is a COVID-19 dataset in Japan. This does not include the cases in Diamond Princess cruise ship (Yokohama city, Kanagawa prefecture) and Costa Atlantica cruise ship (Nagasaki city, Nagasaki prefecture).
    - Total number of cases in Japan
    - The number of vaccinated people (New/experimental)
    - The number of cases at prefecture level
    - Metadata of each prefecture

    Note: Lisphilar (author) uploads the same files to https://github.com/lisphilar/covid19-sir/tree/master/data

    This dataset can be retrieved with CovsirPhy (Python library).

    pip install covsirphy --upgrade
    
    import covsirphy as cs
    data_loader = cs.DataLoader()
    japan_data = data_loader.japan()
    # The number of cases (Total/each province)
    clean_df = japan_data.cleaned()
    # Metadata
    meta_df = japan_data.meta()
    

    Please refer to CovsirPhy Documentation: Japan-specific dataset.

    Note: Before analysing the data, please refer to Kaggle notebook: EDA of Japan dataset and COVID-19: Government/JHU data in Japan. The detailed explanation of the build process is discussed in Steps to build the dataset in Japan. If you find errors or have any questions, feel free to create a discussion topic.

    1.1 Total number of cases in Japan

    covid_jpn_total.csv
    Cumulative number of cases:
    - PCR-tested / PCR-tested and positive
    - with symptoms (to 08May2020) / without symptoms (to 08May2020) / unknown (to 08May2020)
    - discharged
    - fatal

    The number of cases:
    - requiring hospitalization (from 09May2020)
    - hospitalized with mild symptoms (to 08May2020) / severe symptoms / unknown (to 08May2020)
    - requiring hospitalization, but waiting in hotels or at home (to 08May2020)

    In primary source, some variables were removed on 09May2020. Values are NA in this dataset from 09May2020.

    Manually collected the data from Ministry of Health, Labour and Welfare HP:
    厚生労働省 HP (in Japanese)
    Ministry of Health, Labour and Welfare HP (in English)

    The number of vaccinated people:
    - Vaccinated_1st: the number of vaccinated persons for the first time on the date
    - Vaccinated_2nd: the number of vaccinated persons with the second dose on the date
    - Vaccinated_3rd: the number of vaccinated persons with the third dose on the date

    Data sources for vaccination:
    - To 09Apr2021: 厚生労働省 HP 新型コロナワクチンの接種実績 [Ministry of Health, Labour and Welfare HP: COVID-19 vaccination records] (in Japanese)
    - 首相官邸 新型コロナワクチンについて [Prime Minister's Office: About the COVID-19 vaccine]
    - From 10Apr2021: Twitter: 首相官邸(新型コロナワクチン情報) [Prime Minister's Office (COVID-19 vaccine information)]

    1.2 The number of cases at prefecture level

    covid_jpn_prefecture.csv
    Cumulative number of cases:
    - PCR-tested / PCR-tested and positive
    - discharged
    - fatal

    The number of cases:
    - requiring hospitalization (from 09May2020)
    - hospitalized with severe symptoms (from 09May2020)

    Using pdf-excel converter, manually collected the data from Ministry of Health, Labour and Welfare HP:
    厚生労働省 HP (in Japanese)
    Ministry of Health, Labour and Welfare HP (in English)

    Note: covid_jpn_prefecture.groupby("Date").sum() does not match covid_jpn_total. When you analyse total data in Japan, please use covid_jpn_total data.
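
    The mismatch described in the note can be measured before choosing which table to rely on. The following is a rough sketch using pandas with made-up numbers; the column names `Date`, `Prefecture` and `Positive` are assumptions about the CSV layout, and the two small frames merely stand in for covid_jpn_prefecture.csv and covid_jpn_total.csv.

```python
import pandas as pd

# Made-up stand-ins for the two files; real data has more rows and columns.
prefecture = pd.DataFrame({
    "Date": ["2020-05-09", "2020-05-09", "2020-05-10", "2020-05-10"],
    "Prefecture": ["Tokyo", "Osaka", "Tokyo", "Osaka"],
    "Positive": [100, 50, 110, 55],
})
total = pd.DataFrame({
    "Date": ["2020-05-09", "2020-05-10"],
    "Positive": [160, 170],
})

# Sum prefecture-level counts per date, then compare with the national figure.
summed = prefecture.groupby("Date")["Positive"].sum()
reported = total.set_index("Date")["Positive"]
gap = (reported - summed).rename("gap")

# Dates on which the prefecture sum differs from the national total.
mismatched_dates = gap[gap != 0].index.tolist()
print(gap)
print(mismatched_dates)
```

    A non-zero gap on any date is the reason the note recommends covid_jpn_total for nationwide analysis.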

    1.3 Metadata of each prefecture

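    covid_jpn_metadata.csv
    - Population (Total, Male, Female): 厚生労働省 厚生統計要覧(2017年度)第1-5表 [Ministry of Health, Labour and Welfare, Handbook of Health and Welfare Statistics (FY2017), Table 1-5]
    - Area (Total, Habitable): Wikipedia 都道府県の面積一覧 (2015) [List of prefectures by area]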

    2. Acknowledgements

    To create this dataset, edited and transformed data of the following sites was used.

    厚生労働省 Ministry of Health, Labour and Welfare, Japan:
    厚生労働省 HP (in Japanese)
    Ministry of Health, Labour and Welfare HP (in English)
    厚生労働省 HP 利用規約・リンク・著作権等 [Terms of use, links, copyright, etc.] CC BY 4.0 (in Japanese)

    国土交通省 Ministry of Land, Infrastructure, Transport and Tourism, Japan:
    国土交通省 HP (in Japanese)
    国土交通省 HP (in English)
    国土交通省 HP 利用規約・リンク・著作権等 [Terms of use, links, copyright, etc.] CC BY 4.0 (in Japanese)

    Code for Japan / COVID-19 Japan:
    Code for Japan COVID-19 Japan Dashboard (CC BY 4.0)
    COVID-19 Japan 都道府県別 感染症病床数 [Number of infectious-disease hospital beds by prefecture] (CC BY)

    Wikipedia: Wikipedia

    LinkData: LinkData (Public Domain)

    Inspiration

    1. Changes in number of cases over time
    2. Percentage of patients without symptoms / mild or severe symptoms
    3. What to do next to prevent outbreak

    License and how to cite

    Kindly cite this dataset under CC BY-4.0 license as follows.
    - Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan, or
    - Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, Kaggle Dataset, https://www.kaggle.com/lisphilar/covid19-dataset-in-japan

    --- Original source retains full ownership of the source dataset ---

  11. Baseline characteristics at first RTX infusion in all MS, RRMS and PMS...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Barbara Scotti; Giulio Disanto; Rosaria Sacco; Marilu’ Guigli; Chiara Zecca; Claudio Gobbi (2023). Baseline characteristics at first RTX infusion in all MS, RRMS and PMS patients. [Dataset]. http://doi.org/10.1371/journal.pone.0197415.t001
    Explore at:
    xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Barbara Scotti; Giulio Disanto; Rosaria Sacco; Marilu’ Guigli; Chiara Zecca; Claudio Gobbi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Baseline characteristics at first RTX infusion in all MS, RRMS and PMS patients.

  12. Detailed Analysis on campus recruitment

    • kaggle.com
    Updated Oct 25, 2020
    Cite
    BANDI SAMUEL 2039426 (2020). Detailed Analysis on campus recruitment [Dataset]. https://www.kaggle.com/bandisamuel2039426/detailed-analysis-on-campus-recruitment/activity
    Explore at:
    Croissant
    Dataset updated
    Oct 25, 2020
    Dataset provided by
    Kaggle
    Authors
    BANDI SAMUEL 2039426
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset consists of placement data for students at an XYZ campus. It includes secondary and higher secondary school percentages and specialisations, as well as degree specialisation, degree type, work experience, and the salary offered to placed students. We will analyse which factors play a major role in selecting a candidate for job recruitment.

  13. Superstore Orders Analysis-Files

    • kaggle.com
    Updated Feb 19, 2021
    Cite
    Pradyumna Reddy (2021). Superstore Orders Analysis-Files [Dataset]. https://www.kaggle.com/dommatap/analysisfiles/code
    Explore at:
    Croissant
    Dataset updated
    Feb 19, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Pradyumna Reddy
    Description

    Dataset

    This dataset was created by Pradyumna Reddy


  14. Toy Dataset

    • kaggle.com
    Updated Dec 10, 2018
    Cite
    Carlo Lepelaars (2018). Toy Dataset [Dataset]. https://www.kaggle.com/datasets/carlolepelaars/toy-dataset/discussion
    Explore at:
    Croissant
    Dataset updated
    Dec 10, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Carlo Lepelaars
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    A fictional dataset for exploratory data analysis (EDA) and to test simple prediction models.

    This toy dataset features 150000 rows and 6 columns.

    Columns

    Note: All data is fictional. The data has been generated so that their distributions are convenient for statistical analysis.

    Number: A simple index number for each row

    City: The location of a person (Dallas, New York City, Los Angeles, Mountain View, Boston, Washington D.C., San Diego and Austin)

    Gender: Gender of a person (Male or Female)

    Age: The age of a person (Ranging from 25 to 65 years)

    Income: Annual income of a person (Ranging from -674 to 177175)

    Illness: Is the person Ill? (Yes or No)
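
    As a rough idea of a first EDA pass over the six columns listed above, the sketch below builds a small stand-in table and summarizes it with the standard library only. The generated rows are not the Kaggle data: the sample size, the income distribution parameters, and the uniform random choices are all assumptions made for illustration.

```python
import random
import statistics
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

CITIES = ["Dallas", "New York City", "Los Angeles", "Mountain View",
          "Boston", "Washington D.C.", "San Diego", "Austin"]

# Stand-in rows with the six documented columns; distributions are invented.
rows = [
    {
        "Number": i,
        "City": random.choice(CITIES),
        "Gender": random.choice(["Male", "Female"]),
        "Age": random.randint(25, 65),
        "Income": round(random.gauss(90_000, 25_000)),
        "Illness": random.choice(["Yes", "No"]),
    }
    for i in range(1, 1001)
]

# Basic summaries: category counts, a central tendency, and a sanity check.
city_counts = Counter(r["City"] for r in rows)
mean_income = statistics.mean(r["Income"] for r in rows)
negative_income_rows = [r for r in rows if r["Income"] < 0]
illness_share = sum(r["Illness"] == "Yes" for r in rows) / len(rows)

print(city_counts.most_common(3))
print(f"mean income: {mean_income:.0f}, negative incomes: {len(negative_income_rows)}")
print(f"share ill: {illness_share:.2%}")
```

    The negative-income check is included because the documented income range starts below zero, which is worth flagging before modelling.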

    Acknowledgements

    Stock photo by Mika Baumeister on Unsplash.

  15. COVID 19 Dataset

    • kaggle.com
    Updated Sep 23, 2020
    Cite
    Rahul Gupta (2020). COVID 19 Dataset [Dataset]. https://www.kaggle.com/rahulgupta21/datahub-covid19/kernels
    Explore at:
    Croissant
    Dataset updated
    Sep 23, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rahul Gupta
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Coronavirus disease 2019 (COVID-19) time series listing confirmed cases, reported deaths and reported recoveries. Data is disaggregated by country (and sometimes subregion). Coronavirus disease (COVID-19) is caused by the Severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2) and has had a worldwide effect. On March 11 2020, the World Health Organization (WHO) declared it a pandemic, pointing to the over 118,000 cases of the Coronavirus illness in over 110 countries and territories around the world at the time.

    This dataset includes time series data tracking the number of people affected by COVID-19 worldwide, including:

    • confirmed tested cases of Coronavirus infection
    • the number of people who have reportedly died while sick with Coronavirus
    • the number of people who have reportedly recovered from it

    Content

    Data is in CSV format and updated daily. It is sourced from this upstream repository maintained by the amazing team at Johns Hopkins University Center for Systems Science and Engineering (CSSE) who have been doing a great public service from an early point by collating data from around the world.

    We have cleaned and normalized that data, for example by tidying dates and consolidating several files into normalized time series. We have also added some metadata, such as column descriptions, and packaged the data.
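
    To illustrate what tidying dates and normalizing a time series can involve, here is a minimal sketch that reshapes a wide table (one column per date) into long rows with ISO dates. It is not the maintainers' pipeline: the sample values are invented, the output field names are hypothetical, and the wide layout with `m/d/yy` date headers is assumed to resemble the upstream JHU CSSE files.

```python
import csv
import io
from datetime import datetime

# Invented sample in a wide layout: identifier columns, then one column per date.
wide_csv = """\
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Japan,36.2,138.25,2,3
Hubei,China,30.97,112.27,10,12
"""

ID_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def to_long(text):
    """Turn each (row, date column) pair into one record with an ISO date."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            iso_date = datetime.strptime(column, "%m/%d/%y").date().isoformat()
            records.append({
                "Date": iso_date,
                "Country": row["Country/Region"],
                "Province": row["Province/State"] or None,  # empty string -> None
                "Confirmed": int(value),
            })
    return records


long_rows = to_long(wide_csv)
for record in long_rows:
    print(record)
```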

  16. IMDB Dataset & Dictionary

    • kaggle.com
    Updated Feb 14, 2021
    Cite
    Shivam Kapoor (2021). IMDB Dataset & Dictionary [Dataset]. https://www.kaggle.com/kapoorshivam/imdb-dataset-dictionary/tasks
    Explore at:
    Croissant
    Dataset updated
    Feb 14, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shivam Kapoor
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    We all love movies! I remember watching my first movie with my family when I was 5 and 3 years later, I still love movies. But have you ever wondered how some people rate movies as good or bad, awesome or mehh! That's correct. Different people have different perspectives on how they like or dislike movies. To help us select from a plethora of movie option out there, IMDB platform provides us honest reviews by the people for the people.

    Long story short, this assignment will take you through different aspects of how a movie is reviewed by different people from across the globe based on their star cast, genre, story length and many more aspects.

    So here is what you need to do! Few points:
    1. Download the dataset & the dictionary that will help you learn the different columns in the dataset
    2. Start exploring the data by performing EDA (wiki what’s EDA, if you are a dummy like I was initially)
    3. Get back to this notebook to check what all I did for exploring through the data and then follow the subtasks & checkpoints!

    Simple? Isn’t it! Do complete the exercise & let me know in the comments if you found this exercise helpful? There’s always a scope for improvement. Tell me what more could have been added to this notebook! Hope you’ll have a good time exploring data.

  17. Covid_19 Full Explo. Data Analysis by DaudKhan

    • kaggle.com
    Updated Aug 16, 2022
    Cite
    Muhammad sardar daud khan (2022). Covid_19 Full Explo. Data Analysis by DaudKhan [Dataset]. https://www.kaggle.com/datasets/daudkhan2023/covid-19-full-eda/suggestions
    Explore at:
    Croissant
    Dataset updated
    Aug 16, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muhammad sardar daud khan
    Description

    Dataset

    This dataset was created by Muhammad sardar daud khan


  18. Titanic Dataset - EDA & Logistic Regression

    • kaggle.com
    Updated Feb 19, 2025
    Cite
    RabbiTheAnalyst (2025). Titanic Dataset - EDA & Logistic Regression [Dataset]. https://www.kaggle.com/datasets/mdrabbiali/titanic-data-set/code
    Explore at:
    Croissant
    Dataset updated
    Feb 19, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    RabbiTheAnalyst
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew. While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others. In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

    Objective:

    1. Survival Prediction: To build a logistic regression model that accurately predicts the survival of passengers based on features such as age, gender, passenger class, and number of siblings/spouses aboard.

    2. Data Cleaning and Preprocessing: To perform data cleaning by handling missing values, removing unnecessary columns, and encoding categorical variables to prepare the dataset for analysis.

    3. Exploratory Data Analysis (EDA): To conduct a thorough exploratory data analysis to visualize survival rates and identify patterns based on various factors like gender, passenger class, and embarked location.

    4. Feature Importance Analysis: To analyze the correlation between different features and their impact on survival rates, identifying which factors are the most significant predictors of survival.

    5. Model Evaluation: To evaluate the performance of the logistic regression model using accuracy scores and classification reports, ensuring that the model generalizes well to unseen data.

    6. ROC Curve Analysis: To create a ROC curve to assess the trade-off between the true positive rate and false positive rate, providing insights into the model's ability to distinguish between survivors and non-survivors.

    7. Insights and Recommendations: To derive insights from the analysis that could inform future safety measures or policies related to passenger safety in maritime travel.
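
    As a rough illustration of objective 6, the sketch below computes ROC points and the area under the curve from predicted probabilities, using only the Python standard library. The function names `roc_points` and `auc` and the toy labels and scores are hypothetical; they are not taken from the dataset or its notebook. In practice a library routine such as scikit-learn's `roc_curve` would typically be used instead.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) pairs obtained by sweeping the decision threshold.

    labels: 1 for survivor, 0 for non-survivor.
    scores: predicted survival probabilities from any classifier.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sort by score, highest first, so each step lowers the threshold
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs (ties share a threshold)
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy example (made-up values, not real Titanic predictions)
y_true = [1, 1, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
curve = roc_points(y_true, y_prob)
print(curve)
print(auc(curve))
```

    An area close to 1 means the scores rank most survivors above most non-survivors; an area near 0.5 means the ranking is no better than chance.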

  19. College Student Placement Factors Dataset

    • kaggle.com
    Updated Jul 2, 2025
    Sahil Islam007 (2025). College Student Placement Factors Dataset [Dataset]. https://www.kaggle.com/datasets/sahilislam007/college-student-placement-factors-dataset
    Dataset updated: Jul 2, 2025
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Sahil Islam007
    License: MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📘 College Student Placement Dataset

    A realistic, large-scale synthetic dataset of 10,000 students designed to analyze factors affecting college placements.

    📄 Dataset Description

    This dataset simulates the academic and professional profiles of 10,000 college students, focusing on factors that influence placement outcomes. It includes features like IQ, academic performance, CGPA, internships, communication skills, and more.

    The dataset is ideal for:

    • Predictive modeling of placement outcomes
    • Educational exercises in classification
    • Feature importance analysis
    • End-to-end machine learning projects

    📊 Columns Description

    Column Name            | Description
    College_ID             | Unique ID of the college (e.g., CLG0001 to CLG0100)
    IQ                     | Student’s IQ score (normally distributed around 100)
    Prev_Sem_Result        | GPA from the previous semester (range: 5.0 to 10.0)
    CGPA                   | Cumulative Grade Point Average (range: ~5.0 to 10.0)
    Academic_Performance   | Annual academic rating (scale: 1 to 10)
    Internship_Experience  | Whether the student has completed any internship (Yes/No)
    Extra_Curricular_Score | Involvement in extracurriculars (score from 0 to 10)
    Communication_Skills   | Soft skill rating (scale: 1 to 10)
    Projects_Completed     | Number of academic/technical projects completed (0 to 5)
    Placement              | Final placement result (Yes = Placed, No = Not Placed)

    🎯 Target Variable

    • Placement: This is the binary classification target (Yes/No) that you can try to predict based on the other features.
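
    A minimal sketch of separating the target from the features, assuming the CSV header uses exactly the column names listed above and that Yes/No values are spelled as shown. The two sample rows are made up for illustration, and `load_features_and_target` is a hypothetical helper, not part of the dataset.

```python
import csv
import io

# Two made-up rows in the documented column layout (not real dataset records)
SAMPLE_CSV = """College_ID,IQ,Prev_Sem_Result,CGPA,Academic_Performance,Internship_Experience,Extra_Curricular_Score,Communication_Skills,Projects_Completed,Placement
CLG0001,105,7.2,7.5,8,Yes,6,7,3,Yes
CLG0002,95,6.1,6.0,5,No,2,4,1,No
"""

YES_NO = {"Yes": 1, "No": 0}


def load_features_and_target(handle):
    """Split rows into numeric feature dicts and a 0/1 Placement target."""
    features, target = [], []
    for row in csv.DictReader(handle):
        target.append(YES_NO[row.pop("Placement")])
        # Dropping the identifier is an assumption: it is treated here as
        # non-predictive, which may not hold for every analysis.
        row.pop("College_ID")
        row["Internship_Experience"] = YES_NO[row["Internship_Experience"]]
        features.append({name: float(value) for name, value in row.items()})
    return features, target


X, y = load_features_and_target(io.StringIO(SAMPLE_CSV))
print(y)
print(X[0])
```

    To read the real file, the `io.StringIO` handle would be replaced with an opened CSV file; the file name is not given in the listing.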

    🧠 Use Cases

    • 📈 Classification Modeling (Logistic Regression, Decision Trees, Random Forest, etc.)
    • 🔍 Exploratory Data Analysis (EDA)
    • 🎯 Feature Engineering and Selection
    • 🧪 Model Evaluation Practice
    • 👩‍🏫 Academic Projects & Capstone Use

    📦 Dataset Size

    • Rows: 10,000
    • Columns: 10
    • File Format: .csv

    📚 Context

    This dataset was generated to resemble real-world data in academic institutions for research and machine learning use. While it is synthetic, the variables and relationships are crafted to mimic authentic trends observed in student placements.

    📜 License

    MIT

    🔗 Source

    Created using Python (NumPy, Pandas) with data logic designed for educational and ML experimentation purposes.

  20. Zomato Dataset

    • kaggle.com
    Updated Mar 8, 2025
    Abu Awaish (2025). Zomato Dataset [Dataset]. https://www.kaggle.com/datasets/abuawaish/zomato-dataset/data
    Dataset updated: Mar 8, 2025
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Abu Awaish
    License: MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Zomato Dataset

    Overview

    This dataset contains information about various restaurants, including their ratings, cuisine types, pricing, and availability of services like online ordering and table booking.

    • Total Entries (Restaurants): 7,105
    • Total Columns (Features): 10

    Column Descriptions

    Column Name           | Description
    restaurant name       | Name of the restaurant.
    restaurant type       | Type of restaurant (e.g., Quick Bites, Cafe, Casual Dining).
    rate (out of 5)       | Average rating of the restaurant (out of 5).
    num of ratings        | Number of people who have rated the restaurant.
    avg cost (two people) | Average cost for two people in local currency.
    online_order          | Whether online ordering is available (Yes/No).
    table booking         | Whether table booking is available (Yes/No).
    cuisines type         | Types of cuisines served at the restaurant (e.g., Fast Food, Chinese, BBQ).
    area                  | Location area of the restaurant.
    local address         | Specific address of the restaurant.

    Potential Use Cases

    • Food Recommendation System – Suggest restaurants based on cuisine, ratings, or cost.
    • Customer Behavior Analysis – Identify trends in online orders, table bookings, and preferred cuisines.
    • Geographical Insights – Analyze restaurant distribution across different areas.
    • Price vs. Rating Analysis – Determine if higher prices correlate with better ratings.
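
    The price-versus-rating question could be approached with a Pearson correlation coefficient. The sketch below is one possible way to compute it with the standard library only; the `pearson` helper and the cost and rating values are made up for illustration and do not come from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values standing in for "avg cost (two people)" and "rate (out of 5)"
avg_cost = [200, 400, 600, 800, 1000]
rating = [3.5, 3.8, 3.9, 4.1, 4.3]
r = pearson(avg_cost, rating)
print(round(r, 3))
```

    A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; it says nothing about whether price causes better ratings.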

    How to Use This Dataset

    1. Data Cleaning – Handle missing values, remove duplicate entries.
    2. Data Analysis – Perform exploratory data analysis (EDA) to identify trends.
    3. Visualization – Create plots and graphs to understand restaurant trends.
    4. Machine Learning – Use the data for predictive modeling, such as rating prediction.

    Note: This dataset may contain missing values or inconsistencies that require preprocessing before analysis.
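
    As one possible take on the data cleaning step, the sketch below drops exact duplicate rows and rows with a missing rating. It assumes missing values appear as empty fields, which may not match how the real file encodes them; the sample rows and the `clean_rows` helper are hypothetical.

```python
import csv
import io

# Made-up rows using three of the documented column names (not real records)
RAW = """restaurant name,rate (out of 5),online_order
Cafe A,4.1,Yes
Cafe A,4.1,Yes
Diner B,,No
Grill C,3.7,No
"""


def clean_rows(handle, required=("rate (out of 5)",)):
    """Drop exact duplicate rows and rows missing any required field."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(handle):
        key = tuple(row.items())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if any(not (row[col] or "").strip() for col in required):
            continue  # empty value in a required column
        cleaned.append(row)
    return cleaned


rows = clean_rows(io.StringIO(RAW))
print([row["restaurant name"] for row in rows])
```

    Dropping incomplete rows is only one policy; filling missing ratings with a summary value such as the median is a common alternative when too many rows would otherwise be lost.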
