100+ datasets found

Gemma-Data Science Agent- Instruct- Dataset
kaggle.com
Updated Apr 2, 2024
Cite
ian cecil akoto (2024). Gemma-Data Science Agent- Instruct- Dataset [Dataset]. https://www.kaggle.com/datasets/ianakoto/gemma-data-science-agent-instruct-dataset
Dataset updated
Apr 2, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
ian cecil akoto
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Overview

This dataset contains question-answer pairs with context extracted from Kaggle solution write-ups and discussion forums. The dataset was created to facilitate fine-tuning Gemma, an AI model, for data scientist assistant tasks such as question answering and providing data science assistance.

Dataset Details

Columns:

Question: The question generated based on the context extracted from Kaggle solution write-ups and discussion forums.

Answer: The corresponding answer to the generated question.

Context: The context extracted from Kaggle solution write-ups and discussion forums, which serves as the basis for generating questions and answers.

Subtitle: Subtitle or additional information related to the Kaggle competition or topic.

Title: Title of the Kaggle competition or topic.

Sources and Inspiration

Sources:

Meta Kaggle: The dataset was sourced from Meta Kaggle, Kaggle's official public dataset of platform activity covering competitions, kernels, datasets, discussions, and more.

Kaggle Solution Write-ups: Solution write-ups submitted by Kaggle users were used as a primary source of context for generating questions and answers.

Discussion Forums: Discussion threads on Kaggle forums were used to gather additional insights and context for the dataset.

Inspiration:

The dataset was inspired by the need for a specialized dataset tailored for fine-tuning Gemma, an AI model designed for data scientist assistant tasks. The goal was to create a dataset that captures the essence of real-world data science problems discussed on Kaggle, enabling Gemma to provide accurate and relevant assistance to data scientists and Kaggle users.

Dataset Specifics

Total Records: [Specify the total number of question-answer pairs in the dataset]

Format: CSV (Comma Separated Values)

Size: [Specify the size of the dataset in MB or GB]

License: [Specify the license under which the dataset is distributed, e.g., CC BY-SA 4.0]

Download Link: [Provide a link to download the dataset]

Acknowledgments

We acknowledge Kaggle and its community for providing valuable data science resources and discussions that contributed to the creation of this dataset. We appreciate the efforts of Gemma and Langchain in fine-tuning AI models for data scientist assistant tasks, enabling enhanced productivity and efficiency in the field of data science.
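As a rough illustration of how the five columns listed in this description (Question, Answer, Context, Subtitle, Title) might be turned into fine-tuning pairs, here is a minimal Python sketch. The sample row, the prompt layout, and the function names `to_training_example` and `load_examples` are assumptions made for illustration; the description does not state the file name, the actual contents, or the prompt format the author used.

```python
import csv
import io

# Invented sample data: the real file name and contents are not given in the description.
SAMPLE_CSV = """Question,Answer,Context,Subtitle,Title
What model was used?,A gradient-boosted tree ensemble.,The write-up describes a boosted tree ensemble.,Example subtitle,Example competition
"""


def to_training_example(row):
    """Turn one CSV row into a prompt/completion pair (hypothetical layout)."""
    prompt = (
        f"Title: {row['Title']}\n"
        f"Context: {row['Context']}\n\n"
        f"Question: {row['Question']}"
    )
    return {"prompt": prompt, "completion": row["Answer"]}


def load_examples(csv_text):
    """Parse CSV text with the five documented columns into training pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [to_training_example(row) for row in reader]


examples = load_examples(SAMPLE_CSV)
print(examples[0]["prompt"])
```

Keeping the context ahead of the question mirrors the description's statement that the context is the basis for each generated question, but any other ordering would work equally well.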
The AI, ML, Data Science Salary (2020- 2025)
kaggle.com
Updated Feb 25, 2025
Cite
Samith Chimminiyan (2025). The AI, ML, Data Science Salary (2020- 2025) [Dataset]. https://www.kaggle.com/datasets/samithsachidanandan/the-global-ai-ml-data-science-salary-for-2025
Dataset updated
Feb 25, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Samith Chimminiyan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains details of AI, ML, and data science salaries (2020-2025). Salary data is in USD; salaries entered in other currencies are recalculated at the average FX rate for the year.

The data is processed and updated on a weekly basis so the rankings may change over time during the year.

Attribute Information

work_year: The year the salary was paid.

experience_level: The experience level in the job during the year, with the following possible values:
EN Entry-level / Junior
MI Mid-level / Intermediate
SE Senior-level / Expert
EX Executive-level / Director

employment_type: The type of employment for the role:
PT Part-time
FT Full-time
CT Contract
FL Freelance

job_title: The role worked in during the year.

salary: The total gross salary amount paid.

salary_currency: The currency of the salary paid as an ISO 4217 currency code.

salary_in_usd: The salary in USD (FX rate divided by avg. USD rate of respective year) via statistical data from the BIS and central banks.

employee_residence: Employee's primary country of residence during the work year as an ISO 3166 country code.

remote_ratio: The overall amount of work done remotely. Possible values are as follows:
0 No remote work (less than 20%)
50 Partially remote/hybrid
100 Fully remote (more than 80%)

company_location: The country of the employer's main office or contracting branch as an ISO 3166 country code.

company_size: The average number of people that worked for the company during the year:
S less than 50 employees (small)
M 50 to 250 employees (medium)
L more than 250 employees (large)
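The coded attributes above can be expanded into readable labels with plain lookup tables. The following Python sketch is only an illustration: the code-to-label pairs are taken from the attribute list, while the function name `decode_row` and the sample record are assumptions and not part of the dataset.

```python
# Lookup tables built from the codes listed in the attribute information.
EXPERIENCE_LEVEL = {
    "EN": "Entry-level / Junior",
    "MI": "Mid-level / Intermediate",
    "SE": "Senior-level / Expert",
    "EX": "Executive-level / Director",
}
EMPLOYMENT_TYPE = {
    "PT": "Part-time",
    "FT": "Full-time",
    "CT": "Contract",
    "FL": "Freelance",
}
REMOTE_RATIO = {
    0: "No remote work (less than 20%)",
    50: "Partially remote/hybrid",
    100: "Fully remote (more than 80%)",
}
COMPANY_SIZE = {
    "S": "less than 50 employees (small)",
    "M": "50 to 250 employees (medium)",
    "L": "more than 250 employees (large)",
}


def decode_row(row):
    """Return a copy of a record with coded fields replaced by readable labels."""
    decoded = dict(row)
    decoded["experience_level"] = EXPERIENCE_LEVEL[row["experience_level"]]
    decoded["employment_type"] = EMPLOYMENT_TYPE[row["employment_type"]]
    # CSV readers usually yield strings, so convert before the lookup.
    decoded["remote_ratio"] = REMOTE_RATIO[int(row["remote_ratio"])]
    decoded["company_size"] = COMPANY_SIZE[row["company_size"]]
    return decoded


# Hypothetical record, invented for illustration only.
sample = {
    "work_year": "2024",
    "experience_level": "SE",
    "employment_type": "FT",
    "remote_ratio": "100",
    "company_size": "M",
}
decoded_sample = decode_row(sample)
print(decoded_sample["experience_level"])
```

An unknown code raises `KeyError`, which makes unexpected values visible early instead of passing them through silently.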

Acknowledgements

https://aijobs.net/

Photo by Anastassia Anufrieva on Unsplash
DSEval-Kaggle Dataset
paperswithcode.com
Updated Feb 26, 2024
Cite
Yuge Zhang; Qiyang Jiang; Xingyu Han; Nan Chen; Yuqing Yang; Kan Ren (2024). DSEval-Kaggle Dataset [Dataset]. https://paperswithcode.com/dataset/dseval
Explore at:
Dataset updated
Feb 26, 2024
Authors
Yuge Zhang; Qiyang Jiang; Xingyu Han; Nan Chen; Yuqing Yang; Kan Ren
Description
In this paper, we introduce a novel benchmarking framework designed specifically for evaluations of data science agents. Our contributions are three-fold. First, we propose DSEval, an evaluation paradigm that enlarges the evaluation scope to the full lifecycle of LLM-based data science agents. We also cover aspects including but not limited to the quality of the derived analytical solutions or machine learning models, as well as potential side effects such as unintentional changes to the original data. Second, we incorporate a novel bootstrapped annotation process letting LLMs themselves generate and annotate the benchmarks with a "human in the loop". A novel language (i.e., DSEAL) has been proposed and the derived four benchmarks have significantly improved the benchmark scalability and coverage, with largely reduced human labor. Third, based on DSEval and the four benchmarks, we conduct a comprehensive evaluation of various data science agents from different aspects. Our findings reveal the common challenges and limitations of the current works, providing useful insights and shedding light on future research on LLM-based data science agents.

This is one of the DSEval benchmarks.
‘Top 1000 Kaggle Datasets’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Top 1000 Kaggle Datasets’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-top-1000-kaggle-datasets-658b/b992f64b/?iid=004-457&v=presentation
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Top 1000 Kaggle Datasets’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/notkrishna/top-1000-kaggle-datasets on 28 January 2022.

--- Dataset description provided by original source is as follows ---

From wiki

Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was founding chair succeeded by Max Levchin. Equity was raised in 2011 valuing the company at $25 million. On 8 March 2017, Google announced that they were acquiring Kaggle.

Source: Kaggle

--- Original source retains full ownership of the source dataset ---
Meta Kaggle Code
kaggle.com
zip
Updated Jul 3, 2025
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Explore at:
zip (147568851439 bytes)
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Kaggle (http://kaggle.com/)
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

File organization

The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files because private and interactive sessions are not included.
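The folder arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration: the function name `version_buckets` is hypothetical, and it returns the two folder numbers as integers because the description does not say whether folder names are zero-padded or which file extension each version uses.

```python
def version_buckets(version_id):
    """Return (top_level_folder, sub_folder) numbers for a KernelVersions id.

    The top-level folder groups ids by millions and the subfolder groups
    them by thousands within that million, as in the 123/456 example.
    """
    if version_id < 0:
        raise ValueError("version_id must be non-negative")
    top = version_id // 1_000_000
    sub = (version_id // 1_000) % 1_000
    return top, sub


# The id 123,456,789 falls in the range 123,456,000 to 123,456,999.
print(version_buckets(123_456_789))
```

Since the file names match the ids in the KernelVersions csv file, the same two numbers can be derived for any row of that file before looking up the notebook on disk.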

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Analyzing Data Science Salaries
kaggle.com
Updated Aug 2, 2024
Cite
Raghav Khandelwal (2024). Analyzing Data Science Salaries [Dataset]. https://www.kaggle.com/datasets/raghavkhandelwal65/analyzing-data-science-salaries
Dataset updated
Aug 2, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Raghav Khandelwal
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Raghav Khandelwal

Released under Apache 2.0

Contents
‘HR Analytics: Job Change of Data Scientists’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘HR Analytics: Job Change of Data Scientists’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-hr-analytics-job-change-of-data-scientists-db67/latest
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘HR Analytics: Job Change of Data Scientists’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context and Content

A company active in Big Data and Data Science wants to hire data scientists from among people who successfully pass courses conducted by the company. Many people sign up for the training. The company wants to know which of these candidates really want to work for the company after training and which are looking for new employment, because this helps reduce cost and time and improves the quality of training, course planning, and the categorization of candidates. Information on demographics, education, and experience is available from candidates' signup and enrollment.

This dataset is also designed for HR research into the factors that lead a person to leave their current job. Using model(s) built on the current credentials, demographics, and experience data, you will predict the probability that a candidate will look for a new job or will work for the company, and interpret the factors that affect the employee's decision.

The data is divided into train and test sets. The target isn't included in the test set, but a file of test target values is available for related tasks. A sample submission corresponding to the enrollee_id values of the test set is also provided, with columns: enrollee_id, target

Note:
- The dataset is imbalanced.
- Most features are categorical (Nominal, Ordinal, Binary), some with high cardinality.
- Missing-value imputation can be a part of your pipeline as well.

Features

enrollee_id: Unique ID for candidate

city: City code

city_development_index: Development index of the city (scaled)

gender: Gender of candidate

relevent_experience: Relevant experience of candidate

enrolled_university: Type of University course enrolled if any

education_level: Education level of candidate

major_discipline: Education major discipline of candidate

experience: Candidate total experience in years

company_size: Number of employees in current employer's company

company_type: Type of current employer

last_new_job: Difference in years between previous job and current job

training_hours: training hours completed

target: 0 – Not looking for job change, 1 – Looking for a job change

Inspiration

Predict the probability that a candidate will work for the company

Interpret the model(s) in a way that illustrates which features affect a candidate's decision

Please refer to the following task for more details: https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists/tasks?taskId=3015

--- Original source retains full ownership of the source dataset ---
Kaggle-LLM-Science-Exam
huggingface.co
Updated Aug 8, 2023
Cite
Sangeetha Venkatesan (2023). Kaggle-LLM-Science-Exam [Dataset]. https://huggingface.co/datasets/Sangeetha/Kaggle-LLM-Science-Exam
Dataset updated
Aug 8, 2023
Authors
Sangeetha Venkatesan
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Card for [LLM Science Exam Kaggle Competition]

Dataset Summary

https://www.kaggle.com/competitions/kaggle-llm-science-exam/data

Languages

[en, de, tl, it, es, fr, pt, id, pl, ro, so, ca, da, sw, hu, no, nl, et, af, hr, lv, sl]

Dataset Structure

Columns

prompt - the text of the question being asked

A - option A; if this option is correct, then answer will be A

B - option B; if this option is correct, then answer will be B

C - option C; if this…

See the full description on the dataset page: https://huggingface.co/datasets/Sangeetha/Kaggle-LLM-Science-Exam.
Data Science Jobs Salaries Dataset
kaggle.com
Updated Apr 20, 2023
Cite
Wahaj Raza (2023). Data Science Jobs Salaries Dataset [Dataset]. https://www.kaggle.com/datasets/swahajraza/data-science-jobs-salaries-dataset
Dataset updated
Apr 20, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Wahaj Raza
Description
This dataset contains information on salaries for data science jobs in Karachi, Pakistan. This dataset can be used to gain insights into the salaries offered for data science jobs in Karachi and can be helpful for professionals who are looking to explore career opportunities in this field.
‘Time Series starter dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Time Series starter dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-time-series-starter-dataset-19e9/latest
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Time Series starter dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/podsyp/time-series-starter-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

Machine learning can be applied to time series datasets.

Content

A problem when getting started in time series forecasting with machine learning is finding good quality standard datasets on which to practice.

Acknowledgements

Almost every data scientist will encounter time series in their work, and being able to deal effectively with such data is an important skill in the data science toolbox.

Inspiration

Let’s begin from basics.

--- Original source retains full ownership of the source dataset ---
Data Science Jobs Analysis
kaggle.com
Updated Feb 8, 2023
Cite
Niyal Thakkar (2023). Data Science Jobs Analysis [Dataset]. https://www.kaggle.com/datasets/niyalthakkar/data-science-jobs-analysis
Dataset updated
Feb 8, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Niyal Thakkar
Description
Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions. Data science uses complex machine learning algorithms to build predictive models.

The data used for analysis can come from many different sources and be presented in various formats. Data science is an essential part of many industries today, given the massive amounts of data that are produced, and is one of the most debated topics in IT circles.
kaggle-notebooks-edu-v0
huggingface.co
Updated May 31, 2025
Cite
Data Agents (2025). kaggle-notebooks-edu-v0 [Dataset]. https://huggingface.co/datasets/data-agents/kaggle-notebooks-edu-v0
Dataset updated
May 31, 2025
Dataset authored and provided by
Data Agents
Description
Kaggle Notebooks LLM Filtered

Model: meta-llama/Meta-Llama-3.1-70B-Instruct

Sample: 12,400

Source dataset: data-agents/kaggle-notebooks

Prompt:

Below is an extract from a Jupyter notebook. Evaluate whether it has a high analysis value and could help a data scientist.

The notebooks are formatted with the following tokens:

START

Here comes markdown content

Here comes python code

Here comes code output

More… See the full description on the dataset page: https://huggingface.co/datasets/data-agents/kaggle-notebooks-edu-v0.
Amazon Data Science Book Reviews
opendatabay.com
kaggle.com
Updated Jun 16, 2025
Cite
Datasimple (2025). Amazon Data Science Book Reviews [Dataset]. https://www.opendatabay.com/data/ai-ml/fa468f38-c13a-4388-9e15-6e7acdc99d98
Dataset updated
Jun 16, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Reviews & Ratings
Description
Content

This dataset contains 20647 Amazon reviews for 836 data-science-related books. Every review consists of review text and a score (number of stars from 1 to 5).

Acknowledgements

Thanks to all the people who write reviews.

License

CC0

Original Data Source: Amazon Data Science Book Reviews
‘HR data, Predict changing jobs (competition form)’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘HR data, Predict changing jobs (competition form)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-hr-data-predict-changing-jobs-competition-form-1d9b/a230c863/?iid=013-955&v=presentation
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘HR data, Predict changing jobs (competition form)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kukuroo3/hr-data-predict-change-jobscompetition-form on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

This dataset was taken from link and separated into competition format. The label for the test data is provided in the form of a function.

--- Original source retains full ownership of the source dataset ---
World Countries and Continents Details
kaggle.com
zip
Updated Oct 5, 2017
Cite
folaraz (2017). World Countries and Continents Details [Dataset]. https://www.kaggle.com/folaraz/world-countries-and-continents-details
Explore at:
zip (24400 bytes)
Dataset updated
Oct 5, 2017
Authors
folaraz
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Area covered
World
Description
Context

Can you tell geographical stories about the world using data science?

Content

World countries with their corresponding continents, official English names, official French names, Dial, ITU, Languages, and so on.

Acknowledgements

This data was obtained from https://old.datahub.io/

Inspiration

Exploration of the world countries:
- Can we graphically visualize countries that speak a particular language?
- We can also integrate this dataset into others to enhance our exploration.
- The dataset has now been updated to include longitudes and latitudes of countries in the world.
GlassDoor(Data Scientist)
kaggle.com
zip
Updated Aug 1, 2020
Cite
milan (2020). GlassDoor(Data Scientist) [Dataset]. https://www.kaggle.com/milan400/glassdoordata-scientist
Explore at:
zip (1040514 bytes)
Dataset updated
Aug 1, 2020
Authors
milan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Data collected from Glassdoor

Content

It contains information about Data Science/ML/DL job postings placed by companies on Glassdoor

Acknowledgements

Data collected from Glassdoor

Inspiration

Analyze data
‘Heart Attack Analysis & Prediction Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Heart Attack Analysis & Prediction Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-heart-attack-analysis-prediction-dataset-51b9/de5fe27e/?iid=015-932&v=presentation
Explore at:
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Heart Attack Analysis & Prediction Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Hone your analytical and ML skills by participating in the tasks of my other datasets, given below.

Data Science Job Posting on Glassdoor

Groceries dataset for Market Basket Analysis(MBA)

Dataset for Facial recognition using ML approach

Covid_w/wo_Pneumonia Chest Xray

Disney Movies 1937-2016 Gross Income

Bollywood Movie data from 2000 to 2019

17.7K English song data from 2008-2017

About this dataset

Age : Age of the patient

Sex : Sex of the patient

exang: exercise induced angina (1 = yes; 0 = no)

ca: number of major vessels (0-3)

cp : Chest pain type

Value 1: typical angina

Value 2: atypical angina

Value 3: non-anginal pain

Value 4: asymptomatic

trtbps : resting blood pressure (in mm Hg)

chol : cholesterol in mg/dl fetched via BMI sensor

fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)

rest_ecg : resting electrocardiographic results

Value 0: normal

Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)

Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria

thalach : maximum heart rate achieved

target : 0= less chance of heart attack 1= more chance of heart attack

--- Original source retains full ownership of the source dataset ---
Data Science Cheat Sheets
kaggle.com
zip
Updated Feb 4, 2020
Cite
Timo Bozsolik (2020). Data Science Cheat Sheets [Dataset]. https://www.kaggle.com/timoboz/data-science-cheat-sheets
Explore at:
zip (625256639 bytes)
Dataset updated
Feb 4, 2020
Authors
Timo Bozsolik
Description
A collection of cheat sheets for various data-science related languages and topics.

Taken from https://github.com/abhat222/Data-Science--Cheat-Sheet
Data Science Salaries 2025 💸
kaggle.com
Updated Apr 11, 2025
Cite
randomarnab (2025). Data Science Salaries 2025 💸 [Dataset]. https://www.kaggle.com/datasets/arnabchaki/data-science-salaries-2025
Dataset updated
Apr 11, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
randomarnab
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
The Data Science Job Salaries Dataset contains 11 columns:

work_year: The year the salary was paid.

experience_level: The experience level in the job during the year.

employment_type: The type of employment for the role.

job_title: The role worked in during the year.

salary: The total gross salary amount paid.

salary_currency: The currency of the salary paid as an ISO 4217 currency code.

salary_in_usd: The salary in USD.

employee_residence: Employee's primary country of residence during the work year as an ISO 3166 country code.

remote_ratio: The overall amount of work done remotely.

company_location: The country of the employer's main office or contracting branch.

company_size: The median number of people that worked for the company during the year.
30 Short Tips for Your Data Scientist Interview
kaggle.com
Updated Oct 12, 2023
Cite
Skillslash17 (2023). 30 Short Tips for Your Data Scientist Interview [Dataset]. https://www.kaggle.com/datasets/skillslash17/30-short-tips-for-your-data-scientist-interview
Dataset updated
Oct 12, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Skillslash17
Description
If you’re a data scientist looking to get ahead in the ever-changing world of data science, you know that job interviews are a crucial part of your career. But getting a job as a data scientist is not just about being tech-savvy, it’s also about having the right skillset, being able to solve problems, and having good communication skills. With competition heating up, it’s important to stand out and make a good impression on potential employers.

Data science has become an essential part of the contemporary business environment, enabling decision-making in a variety of industries. Consequently, organizations are increasingly looking for individuals who can use the power of data to generate new ideas and expand their operations. However, these roles come with a high level of expectation, requiring applicants to possess a comprehensive knowledge of data analytics and machine learning, as well as the capacity to turn their discoveries into practical solutions.

With so many job seekers out there, it’s super important to be prepared and confident for your interview as a data scientist.

Here are 30 tips to help you get the most out of your interview and land the job you want. No matter if you’re just starting out or have been in the field for a while, these tips will help you make the most of your interview and set you up for success.

Technical Preparation

Qualifying for a job as a data scientist requires thorough technical preparation. Job seekers are often asked to demonstrate their technical skills to show that they can fulfill the duties of the role. Here is a selection of key tips for technical proficiency:

1 Master the Basics

Make sure you have a good understanding of statistics, math, and programming languages such as Python and R.

2 Understand Machine Learning

Gain an in-depth understanding of commonly used machine learning techniques, including linear regression and decision trees, as well as neural networks.

3 Data Manipulation

Make sure you're comfortable with data manipulation tools like Pandas, as well as data visualization tools like Matplotlib and Seaborn.

4 SQL Skills

Gain proficiency in using SQL to extract and process data from databases.
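As one possible way to practice this tip, the sketch below runs a typical interview-style aggregation query using Python's built-in sqlite3 module. The table name `salaries`, its columns, and all rows are invented for illustration and are not taken from any dataset listed on this page.

```python
import sqlite3

# In-memory database with a small, made-up table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salaries ("
    "job_title TEXT, experience_level TEXT, salary_in_usd INTEGER)"
)
rows = [
    ("Data Scientist", "SE", 150000),
    ("Data Scientist", "MI", 110000),
    ("Data Engineer", "SE", 140000),
    ("Data Engineer", "EN", 80000),
    ("ML Engineer", "SE", 170000),
]
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", rows)

# Average salary per job title, keeping only titles with at least two rows.
query = """
SELECT job_title, COUNT(*) AS n, AVG(salary_in_usd) AS avg_salary
FROM salaries
GROUP BY job_title
HAVING COUNT(*) >= 2
ORDER BY avg_salary DESC
"""
result = conn.execute(query).fetchall()
conn.close()

for job_title, n, avg_salary in result:
    print(f"{job_title}: n={n}, avg={avg_salary:.0f}")
```

The query combines GROUP BY, HAVING, and ORDER BY, three clauses that frequently come up together in SQL screening questions.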

5 Feature Engineering

Understand the importance of feature engineering and how to create meaningful features from raw data.
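To make "meaningful features from raw data" concrete, here is a minimal sketch that derives a few common feature types (a duration, a calendar component, a log transform, and a ratio) from one raw record. The function name `engineer_features`, the field names, and the sample values are all hypothetical.

```python
import math
from datetime import date


def engineer_features(record):
    """Derive model-ready features from one raw customer record (hypothetical schema)."""
    signup = date.fromisoformat(record["signup_date"])
    last_purchase = date.fromisoformat(record["last_purchase_date"])
    n_orders = record["n_orders"]
    return {
        # Duration feature: days between two raw dates.
        "tenure_days": (last_purchase - signup).days,
        # Calendar feature: can capture seasonality.
        "signup_month": signup.month,
        # Log transform: compresses a heavy-tailed monetary value.
        "log_total_spend": math.log1p(record["total_spend"]),
        # Ratio feature, guarded against division by zero.
        "avg_order_value": record["total_spend"] / n_orders if n_orders else 0.0,
    }


raw = {
    "signup_date": "2023-01-15",
    "last_purchase_date": "2023-03-16",
    "total_spend": 240.0,
    "n_orders": 4,
}
features = engineer_features(raw)
print(features)
```

In an interview, being able to explain why each derived column might help a model is usually worth more than the code itself.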

6 Model Evaluation

Learn to assess and compare machine learning models using metrics like accuracy, precision, recall, and F1-score.
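The four metrics named above can be computed by hand from the counts of true/false positives and negatives. The sketch below does this for a binary classifier in plain Python; the function name `classification_metrics` and the label lists are made up for illustration, and in practice a library such as scikit-learn would normally be used instead.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these sample labels all four metrics come out to 0.75; on imbalanced data they typically diverge, which is why accuracy alone is rarely enough to compare models.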

7 Big Data Technologies

If the job requires it, become familiar with big data technologies like Hadoop and Spark.

8 Coding Challenges

Practice coding challenges related to data manipulation and machine learning on platforms like LeetCode and Kaggle.

Portfolio and Projects

9 Build a Portfolio

Develop a portfolio of your data science projects that outlines your methodology, the resources you have employed, and the results achieved.

10 Kaggle Competitions

Participate in Kaggle competitions to gain real-world experience and showcase your problem-solving skills.

11 Open Source Contributions

Contribute to open-source data science projects to demonstrate your collaboration and coding abilities.

12 GitHub Profile

Maintain a well-organized GitHub profile with clean code and clear project documentation.

Domain Knowledge

13 Understand the Industry

Research the industry you’re applying to and understand its specific data challenges and opportunities.

14 Company Research

Study the company you’re interviewing with to tailor your responses and show your genuine interest.

Soft Skills

15 Communication

Practice explaining complex concepts in simple terms. Data Scientists often need to communicate findings to non-technical stakeholders.

16 Problem-Solving

Focus on your problem-solving abilities and how you approach complex challenges.

17 Adaptability

Highlight your ability to adapt to new technologies and techniques as the field of data science evolves.

Interview Etiquette

18 Professional Appearance

Dress and present yourself in a professional manner, whether the interview is in person or remote.

19 Punctuality

Be on time for the interview, whether it’s virtual or in person.

20 Body Language

Maintain good posture and eye contact during the interview. Smile and exhibit confidence.

21 Active Listening

Pay close attention to the interviewer's questions and answer them directly.

Behavioral Questions

22 STAR Method

Use the STAR (Situation, Task, Action, Result) method to structure your responses to behavioral questions.

23 Conflict Resolution

Be prepared to discuss how you have handled conflicts or challenging situations in previous roles.

24 Teamwork

Highlight instances where you’ve worked effectively in cross-functional teams...




Gemma-Data Science Agent- Instruct- Dataset

Data Science Assistance with Gemma Fine-tuned on Kaggle Solutions Writeup