5 datasets found

Hard Drive Failure Prediction ST4000DM000
Disk failure detection task
kaggle.com
Updated May 20, 2019
Cite
Awant (2019). Hard Drive Failure Prediction ST4000DM000 [Dataset]. https://www.kaggle.com/awant08/hard-drive-failure-prediction-st4000dm000/code
Dataset updated
May 20, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Awant
Description
Context

The dataset contains S.M.A.R.T. attributes recorded from 2015 to 2018 for ST4000DM000 hard drives in the BackBlaze data center. The data has been preprocessed and is ready to use.

Content

The dataset includes hard drive S.M.A.R.T. attributes along with model, serial number, date and capacity. The data was heavily preprocessed.

This specific model was chosen because it has the greatest number of failures. Because healthy drives vastly outnumber failed ones, all failed drives but only 10k healthy drives were taken from each year.

Data was processed according to the following rules:

For failed drives, the 120 days before failure were taken.

For healthy drives, a random 120-day slice within a year was taken.
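
The two slicing rules above can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: the function names are hypothetical, and it assumes that the 120-day window for a failed drive ends on the day before the failure date, which the description does not specify.

```python
from __future__ import annotations

import random
from datetime import date, timedelta

WINDOW_DAYS = 120  # window length stated in the rules above


def failed_drive_window(failure_day: date) -> tuple[date, date]:
    """Return (start, end) of the 120 days preceding the failure date.

    Assumption: the failure day itself is excluded from the window.
    """
    start = failure_day - timedelta(days=WINDOW_DAYS)
    end = failure_day - timedelta(days=1)
    return start, end


def healthy_drive_window(year: int, rng: random.Random) -> tuple[date, date]:
    """Return a random 120-day (start, end) slice lying entirely within `year`."""
    first = date(year, 1, 1)
    last = date(year, 12, 31)
    # Latest offset from January 1 at which a full window still fits in the year.
    latest_offset = (last - first).days - (WINDOW_DAYS - 1)
    start = first + timedelta(days=rng.randint(0, latest_offset))
    return start, start + timedelta(days=WINDOW_DAYS - 1)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    print(failed_drive_window(date(2018, 5, 1)))
    print(healthy_drive_window(2017, rng))
```

Both functions return inclusive date ranges, so each window covers exactly 120 calendar days.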

You can find more details here: https://github.com/awant/sd_failure_predictions

Acknowledgements

The original BackBlaze data: https://www.backblaze.com/b2/hard-drive-test-data.html. You may use this dataset for your own purposes, but you must cite BackBlaze as the source and must not sell the data.

Inspiration

Is it possible to identify which hard drives will fail in the near future?

Is it possible to predict the day on which a hard drive will fail?

Is it possible to generalise the approach to predict failures of other models?

For solutions to be comparable, I suggest using the 2018 data as the test set and the other years as the training set.
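
The suggested split can be sketched as a simple partition on the year of each record's date. This is an illustrative sketch only: the record layout, the serial numbers and the `failure` field are assumptions made for the example, not the dataset's documented schema.

```python
from datetime import date

TEST_YEAR = 2018  # the year suggested above for the test set


def split_by_year(rows, test_year=TEST_YEAR):
    """Split records into (train, test) by the calendar year of their 'date' field."""
    train, test = [], []
    for row in rows:
        (test if row["date"].year == test_year else train).append(row)
    return train, test


# Hypothetical records for illustration; the values are made up.
rows = [
    {"serial_number": "A1", "date": date(2015, 6, 1), "failure": 0},
    {"serial_number": "B2", "date": date(2017, 11, 20), "failure": 1},
    {"serial_number": "C3", "date": date(2018, 2, 14), "failure": 0},
]

train, test = split_by_year(rows)
print(len(train), len(test))
```

Splitting on the calendar year rather than at random keeps every 2018 observation out of training, which is what makes results from different solutions comparable.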
AI-Generated Computer Build Reviews (indoneisan)
kaggle.com
zip
Updated Aug 31, 2024
Cite
Itsnatt (2024). AI-Generated Computer Build Reviews (indoneisan) [Dataset]. https://www.kaggle.com/datasets/yaemico/ai-generated-computer-build-reviews-indoneisan
Dataset updated
Aug 31, 2024
Authors
Itsnatt
Description
Roast-PC Dataset: AI-Generated PC Build Reviews

Description:

This dataset is sourced from the "Roast-PC by Gemini" website, a platform that provides AI-powered roasting (critical feedback) on custom PC builds. Users input the components of their PC build, including CPU, GPU, motherboard, RAM, PSU, disk, and intended use case. The dataset captures the logs of these submissions, along with the roasting comments generated by Gemini AI, Google's AI model.

Dataset Overview:

Number of Columns: 9

Number of Rows: 1285

Column Names and Descriptions:

Time: Date and time of the request.

cpu: The CPU model specified by the user (e.g., "AMD Ryzen 5 5500", "Intel i7 1200K").

gpu: The GPU model specified by the user (e.g., "NVIDIA RTX 3080", "AMD Radeon RX 6800").

motherboard: The motherboard model specified by the user (e.g., "ASUS ROG Strix B550-F", "MSI B450 TOMAHAWK").

ram: The RAM configuration specified by the user, including size and speed (e.g., "16GB DDR4 3200MHz").

psu: The PSU (Power Supply Unit) model specified by the user, including wattage (e.g., "Corsair RM750x 750W").

disk: The storage devices specified by the user, including type and capacity (e.g., "1TB NVMe SSD", "500GB SATA HDD").

use_case: The intended use of the PC as specified by the user (e.g., "gaming", "video editing", "general use").

roast_comments: The AI-generated feedback or roasting comments provided by Gemini AI, critiquing the PC build based on the components and use case (written in Indonesian).
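
As a rough sketch of how the nine columns above could be loaded and summarised, the example below parses a small in-memory CSV and counts GPU choices. The sample rows, the timestamp format and the placeholder comments are hypothetical; only the column names and the example component strings come from the description.

```python
import csv
import io
from collections import Counter

# The nine column names from the list above.
COLUMNS = ["Time", "cpu", "gpu", "motherboard", "ram",
           "psu", "disk", "use_case", "roast_comments"]

# Hypothetical rows for illustration; the real roast_comments are in Indonesian.
SAMPLE_CSV = """Time,cpu,gpu,motherboard,ram,psu,disk,use_case,roast_comments
2024-08-01 10:00:00,AMD Ryzen 5 5500,NVIDIA RTX 3080,MSI B450 TOMAHAWK,16GB DDR4 3200MHz,Corsair RM750x 750W,1TB NVMe SSD,gaming,(placeholder comment)
2024-08-01 10:05:00,AMD Ryzen 5 5500,AMD Radeon RX 6800,ASUS ROG Strix B550-F,16GB DDR4 3200MHz,Corsair RM750x 750W,500GB SATA HDD,video editing,(placeholder comment)
2024-08-01 10:09:00,AMD Ryzen 5 5500,NVIDIA RTX 3080,MSI B450 TOMAHAWK,16GB DDR4 3200MHz,Corsair RM750x 750W,1TB NVMe SSD,general use,(placeholder comment)
"""


def count_values(rows, column):
    """Count how often each distinct value appears in one column."""
    return Counter(row[column].strip() for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
gpu_counts = count_values(rows, "gpu")
print(gpu_counts.most_common(1))  # [('NVIDIA RTX 3080', 2)]
```

The same helper works for any of the component columns, for example `count_values(rows, "use_case")`.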

Functionality:

This dataset serves multiple purposes:

Component Analysis: Allows for analysis of popular PC component choices and configurations.

AI Feedback Insights: Provides insights into how AI evaluates and critiques different PC builds.

Data Mining: Can be used for exploring trends in PC building preferences, identifying common mistakes, and understanding user behavior in custom PC setups.

Machine Learning Applications: Useful for training models in natural language processing (NLP), particularly in generating or understanding feedback for hardware configurations.

This dataset is ideal for those interested in PC building, hardware analysis, AI-generated content, or anyone curious about trends in custom PC configurations.
‘VertebralColumnDataSet’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 30, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘VertebralColumnDataSet’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-vertebralcolumndataset-2c81/c5652518/?iid=002-762&v=presentation
Dataset updated
Sep 30, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘VertebralColumnDataSet’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/caesarlupum/vertebralcolumndataset on 30 September 2021.

--- Dataset description provided by original source is as follows ---

Vertebral Column Data Set

Download: Data Folder, http://archive.ics.uci.edu/ml/machine-learning-databases/00212/

Data Set Description, http://archive.ics.uci.edu/ml/machine-learning-databases/00212/

Abstract: Data set containing values for six biomechanical features used to classify orthopaedic patients into 3 classes (normal, disk hernia or spondylolisthesis) or 2 classes (normal or abnormal).

Data Set Characteristics: Multivariate

Attribute Characteristics: Real

Associated Tasks: Classification

Number of Instances: 310

Number of Attributes: 6

Missing Values? N/A

Area: N/A

Date Donated: 2011-08-09

Source:

Guilherme de Alencar Barreto (guilherme '@' deti.ufc.br) & Ajalmar Rêgo da Rocha Neto (ajalmar '@' ifce.edu.br), Department of Teleinformatics Engineering, Federal University of Ceará, Fortaleza, Ceará, Brazil.

Henrique Antonio Fonseca da Mota Filho (hdamota '@' gmail.com), Hospital Monte Klinikum, Fortaleza, Ceará, Brazil.

Data Set Information:

Biomedical data set built by Dr. Henrique da Mota during a medical residence period in the Group of Applied Research in Orthopaedics (GARO) of the Centre Médico-Chirurgical de Réadaptation des Massues, Lyon, France. The data have been organized in two different but related classification tasks. The first task consists in classifying patients as belonging to one out of three categories: Normal (100 patients), Disk Hernia (60 patients) or Spondylolisthesis (150 patients). For the second task, the categories Disk Hernia and Spondylolisthesis were merged into a single category labelled as 'abnormal'. Thus, the second task consists in classifying patients as belonging to one out of two categories: Normal (100 patients) or Abnormal (210 patients). We provide files also for use within the WEKA environment.

Attribute Information:

Each patient is represented in the data set by six biomechanical attributes derived from the shape and orientation of the pelvis and lumbar spine (in this order): pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius and grade of spondylolisthesis. The following convention is used for the class labels: DH (Disk Hernia), SL (Spondylolisthesis), NO (Normal) and AB (Abnormal).
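
The relationship between the two classification tasks can be illustrated with a short sketch that maps the three-class labels onto the two-class scheme and checks the resulting class sizes. The function name and the synthetic label list are illustrative assumptions; only the label codes and the patient counts come from the description above.

```python
from collections import Counter

# Class sizes stated in the description above.
THREE_CLASS_COUNTS = {"NO": 100, "DH": 60, "SL": 150}


def to_binary_label(label: str) -> str:
    """Map a three-class label (NO, DH, SL) to the two-class scheme (NO, AB)."""
    if label not in THREE_CLASS_COUNTS:
        raise ValueError(f"unknown class label: {label!r}")
    return "NO" if label == "NO" else "AB"


# Synthetic label list with the stated class sizes (no real patient data).
labels = [label for label, n in THREE_CLASS_COUNTS.items() for _ in range(n)]

binary_counts = Counter(to_binary_label(label) for label in labels)
print(len(labels))          # 310
print(dict(binary_counts))  # {'NO': 100, 'AB': 210}
```

The totals agree with the 310 instances listed for the data set: 60 Disk Hernia plus 150 Spondylolisthesis patients give the 210 Abnormal cases.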

Relevant Papers:

(1) Berthonnaud, E., Dimnet, J., Roussouly, P. & Labelle, H. (2005). 'Analysis of the sagittal balance of the spine and pelvis using shape and orientation parameters', Journal of Spinal Disorders & Techniques, 18(1):40–47.

(2) Rocha Neto, A. R. & Barreto, G. A. (2009). 'On the Application of Ensembles of Classifiers to the Diagnosis of Pathologies of the Vertebral Column: A Comparative Analysis', IEEE Latin America Transactions, 7(4):487-496.

(3) Rocha Neto, A. R., Sousa, R., Barreto, G. A. & Cardoso, J. S. (2011). 'Diagnostic of Pathology on the Vertebral Column with Embedded Reject Option', Proceedings of the 5th Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA'2011), Gran Canaria, Spain, Lecture Notes on Computer Science, vol. 6669, p. 588-595.

--- Original source retains full ownership of the source dataset ---
Secchi Depth
kaggle.com
Updated Mar 24, 2023
Cite
Jacob Sharples (2023). Secchi Depth [Dataset]. https://www.kaggle.com/datasets/jacobsharples/secchi-depth
Dataset updated
Mar 24, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jacob Sharples
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
A Secchi disk is a circular disk with alternating black and white quadrants that is used to measure the clarity or transparency of water. It is typically lowered into the water using a line, and the depth at which it disappears from view is measured. The depth at which the disk disappears is called the "Secchi depth" and it provides an indication of the water clarity.

Measuring water transparency is important for a few reasons. First, it can indicate the health of aquatic ecosystems. Clear water allows sunlight to penetrate deeper, which is important for photosynthesis by aquatic plants and algae. If the water becomes cloudy or turbid, it can indicate that there is too much sediment or other particles in the water, which can have negative impacts on the ecosystem.

Second, water transparency can also impact water quality for human use. For example, if the water is too turbid, it can make it difficult to treat for drinking water or for use in industrial processes. Additionally, if the water is too cloudy or turbid, it can impact recreational uses such as swimming or fishing.

Overall, measuring water transparency using a Secchi disk can provide important information about the health and quality of aquatic ecosystems, as well as the suitability of water for human use.

This dataset was sampled at 135 locations along the eastern coast of Georgian Bay in Ontario, Canada, from 2003 to 2005. It provides data on the Secchi depth alongside the characteristics of the water.

[Image: secchi_demonstration.jpeg]
VertebralColumnDataSet
kaggle.com
Updated Jan 9, 2020
Cite
Caesar Lupum (2020). VertebralColumnDataSet [Dataset]. https://www.kaggle.com/caesarlupum/vertebralcolumndataset/activity
Dataset updated
Jan 9, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Caesar Lupum
Description
Vertebral Column Data Set

Download: Data Folder, http://archive.ics.uci.edu/ml/machine-learning-databases/00212/

Data Set Description, http://archive.ics.uci.edu/ml/machine-learning-databases/00212/

Abstract: Data set containing values for six biomechanical features used to classify orthopaedic patients into 3 classes (normal, disk hernia or spondylolisthesis) or 2 classes (normal or abnormal).

Data Set Characteristics: Multivariate

Attribute Characteristics: Real

Associated Tasks: Classification

Number of Instances: 310

Number of Attributes: 6

Missing Values? N/A

Area: N/A

Date Donated: 2011-08-09

Source:

Guilherme de Alencar Barreto (guilherme '@' deti.ufc.br) & Ajalmar Rêgo da Rocha Neto (ajalmar '@' ifce.edu.br), Department of Teleinformatics Engineering, Federal University of Ceará, Fortaleza, Ceará, Brazil.

Henrique Antonio Fonseca da Mota Filho (hdamota '@' gmail.com), Hospital Monte Klinikum, Fortaleza, Ceará, Brazil.

Data Set Information:

Biomedical data set built by Dr. Henrique da Mota during a medical residence period in the Group of Applied Research in Orthopaedics (GARO) of the Centre Médico-Chirurgical de Réadaptation des Massues, Lyon, France. The data have been organized in two different but related classification tasks. The first task consists in classifying patients as belonging to one out of three categories: Normal (100 patients), Disk Hernia (60 patients) or Spondylolisthesis (150 patients). For the second task, the categories Disk Hernia and Spondylolisthesis were merged into a single category labelled as 'abnormal'. Thus, the second task consists in classifying patients as belonging to one out of two categories: Normal (100 patients) or Abnormal (210 patients). We provide files also for use within the WEKA environment.

Attribute Information:

Each patient is represented in the data set by six biomechanical attributes derived from the shape and orientation of the pelvis and lumbar spine (in this order): pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius and grade of spondylolisthesis. The following convention is used for the class labels: DH (Disk Hernia), SL (Spondylolisthesis), NO (Normal) and AB (Abnormal).

Relevant Papers:

(1) Berthonnaud, E., Dimnet, J., Roussouly, P. & Labelle, H. (2005). 'Analysis of the sagittal balance of the spine and pelvis using shape and orientation parameters', Journal of Spinal Disorders & Techniques, 18(1):40–47.

(2) Rocha Neto, A. R. & Barreto, G. A. (2009). 'On the Application of Ensembles of Classifiers to the Diagnosis of Pathologies of the Vertebral Column: A Comparative Analysis', IEEE Latin America Transactions, 7(4):487-496.

(3) Rocha Neto, A. R., Sousa, R., Barreto, G. A. & Cardoso, J. S. (2011). 'Diagnostic of Pathology on the Vertebral Column with Embedded Reject Option', Proceedings of the 5th Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA'2011), Gran Canaria, Spain, Lecture Notes on Computer Science, vol. 6669, p. 588-595.