5 datasets found
  1. Hard Drive Failure Prediction ST4000DM000

    • kaggle.com
    Updated May 20, 2019
    Cite
    Awant (2019). Hard Drive Failure Prediction ST4000DM000 [Dataset]. https://www.kaggle.com/awant08/hard-drive-failure-prediction-st4000dm000/code
    Dataset updated
    May 20, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Awant
    Description

    Context

    The dataset contains S.M.A.R.T. attributes of ST4000DM000 hard drives in the BackBlaze data center, covering 2015 to 2018. The data has been preprocessed and is ready to use.

    Content

    The dataset includes hard drive S.M.A.R.T. attributes along with model, serial number, date and capacity. The dataset was heavily preprocessed.

    First, this specific model was chosen because it has the greatest number of failures. Also, because healthy drives far outnumber failed ones, all failed drives and only 10k healthy drives were taken from each year.

    Data was processed according to the following rules:

    1. For failed drives, the 120 days before failure were taken.
    2. For healthy drives, a random 120-day slice within a year was taken.
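
    The two slicing rules above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the author's preprocessing code. The function names are hypothetical, and whether the failure day itself counts as the last of the 120 days is an assumption made here.

```python
import random
from datetime import date, timedelta

WINDOW_DAYS = 120  # window length stated in the dataset description


def failed_drive_window(failure_day: date) -> list[date]:
    """Return the 120 days leading up to a failure.

    Assumption: the failure day itself is the last day of the window.
    """
    start = failure_day - timedelta(days=WINDOW_DAYS - 1)
    return [start + timedelta(days=i) for i in range(WINDOW_DAYS)]


def healthy_drive_window(year: int, rng: random.Random) -> list[date]:
    """Return a random contiguous 120-day slice lying fully inside `year`."""
    first = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - first).days
    # Latest valid start leaves room for all 120 days before the year ends.
    offset = rng.randrange(days_in_year - WINDOW_DAYS + 1)
    start = first + timedelta(days=offset)
    return [start + timedelta(days=i) for i in range(WINDOW_DAYS)]


failed = failed_drive_window(date(2017, 6, 30))
healthy = healthy_drive_window(2017, random.Random(0))
print(failed[0], failed[-1])
print(healthy[0], healthy[-1])
```

    Seeding the random generator keeps the healthy-drive slice reproducible between runs.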

    You can find more details here: https://github.com/awant/sd_failure_predictions

    Acknowledgements

    The original BackBlaze data: https://www.backblaze.com/b2/hard-drive-test-data.html. You may use this dataset for your own purposes, but you must cite BackBlaze as the source and must not sell the data.

    Inspiration

    1. Is it possible to identify which hard drives will fail in the near future?
    2. Is it possible to predict the day on which a hard drive will fail?
    3. Is it possible to generalise the approach and predict failures of other models?

    So that solutions are comparable, I suggest using 2018 as the test data and the other years as the training data.
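
    A minimal sketch of the suggested year-based split, assuming each row carries a `date` field. The row layout, the `failure` field name, and the serial numbers are made up for illustration and are not taken from the dataset files.

```python
from datetime import date

TEST_YEAR = 2018  # held-out year suggested in the dataset description


def split_by_year(rows):
    """Split rows into (train, test) by the calendar year of each row's date."""
    train, test = [], []
    for row in rows:
        (test if row["date"].year == TEST_YEAR else train).append(row)
    return train, test


# Illustrative rows only; serial numbers and the `failure` key are hypothetical.
rows = [
    {"serial_number": "DRIVE-A", "date": date(2015, 3, 1), "failure": 0},
    {"serial_number": "DRIVE-B", "date": date(2017, 11, 20), "failure": 1},
    {"serial_number": "DRIVE-C", "date": date(2018, 2, 14), "failure": 0},
]
train, test = split_by_year(rows)
print(len(train), len(test))
```

    Splitting on the calendar year rather than at random keeps every 2018 observation out of training, which is what makes results from different solutions comparable.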

  2. AI-Generated Computer Build Reviews (indoneisan)

    • kaggle.com
    Updated Aug 31, 2024
    Cite
    Itsnatt (2024). AI-Generated Computer Build Reviews (indoneisan) [Dataset]. https://www.kaggle.com/datasets/yaemico/ai-generated-computer-build-reviews-indoneisan
    Dataset updated
    Aug 31, 2024
    Authors
    Itsnatt
    Description

    Roast-PC Dataset: AI-Generated PC Build Reviews

    Description:

    This dataset is sourced from the "Roast-PC by Gemini" website, a platform that provides AI-powered roasting (critical feedback) on custom PC builds. Users input the components of their PC build, including CPU, GPU, motherboard, RAM, PSU, disk, and intended use case. The dataset captures the logs of these submissions, along with the roasting comments generated by Gemini AI, Google's AI model.

    Dataset Overview:

    • Number of Columns: 9
    • Number of Rows: 1285

    Column Names and Descriptions:

    1. Time: Date and Time of request.
    2. cpu: The CPU model specified by the user (e.g., "AMD Ryzen 5 5500", "Intel i7 1200K").
    3. gpu: The GPU model specified by the user (e.g., "NVIDIA RTX 3080", "AMD Radeon RX 6800").
    4. motherboard: The motherboard model specified by the user (e.g., "ASUS ROG Strix B550-F", "MSI B450 TOMAHAWK").
    5. ram: The RAM configuration specified by the user, including size and speed (e.g., "16GB DDR4 3200MHz").
    6. psu: The PSU (Power Supply Unit) model specified by the user, including wattage (e.g., "Corsair RM750x 750W").
    7. disk: The storage devices specified by the user, including type and capacity (e.g., "1TB NVMe SSD", "500GB SATA HDD").
    8. use_case: The intended use of the PC as specified by the user (e.g., "gaming", "video editing", "general use").
    9. roast_comments: The AI-generated feedback or roasting comments provided by Gemini AI, critiquing the PC build based on the components and use case (written in Indonesian).

    Functionality:

    This dataset serves multiple purposes:

    • Component Analysis: Allows for analysis of popular PC component choices and configurations.
    • AI Feedback Insights: Provides insights into how AI evaluates and critiques different PC builds.
    • Data Mining: Can be used for exploring trends in PC building preferences, identifying common mistakes, and understanding user behavior in custom PC setups.
    • Machine Learning Applications: Useful for training models in natural language processing (NLP), particularly in generating or understanding feedback for hardware configurations.
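
    As a small illustration of the component-analysis use, the sketch below counts the most frequent values in one column. It relies only on the nine column names listed above; the sample rows, the timestamp format, and the `top_values` helper are made up for the example and are not real dataset records.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample with the nine documented columns; not real dataset rows.
SAMPLE = """Time,cpu,gpu,motherboard,ram,psu,disk,use_case,roast_comments
2024-08-01 10:00,AMD Ryzen 5 5500,NVIDIA RTX 3080,MSI B450 TOMAHAWK,16GB DDR4 3200MHz,Corsair RM750x 750W,1TB NVMe SSD,gaming,(comment text)
2024-08-01 10:05,AMD Ryzen 5 5500,AMD Radeon RX 6800,ASUS ROG Strix B550-F,16GB DDR4 3200MHz,Corsair RM750x 750W,500GB SATA HDD,gaming,(comment text)
2024-08-01 10:09,Intel i7 1200K,NVIDIA RTX 3080,MSI B450 TOMAHAWK,16GB DDR4 3200MHz,Corsair RM750x 750W,1TB NVMe SSD,video editing,(comment text)
"""


def top_values(csv_text, column, n=3):
    """Return the n most common non-empty values of `column` with their counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(
        row[column].strip() for row in reader if row[column].strip()
    )
    return counts.most_common(n)


print(top_values(SAMPLE, "cpu", n=1))
print(top_values(SAMPLE, "use_case"))
```

    Because users type component names freely, real submissions would likely need normalisation (case, spacing, vendor prefixes) before counts like these are meaningful.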

    This dataset is ideal for those interested in PC building, hardware analysis, AI-generated content, or anyone curious about trends in custom PC configurations.

  3. ‘VertebralColumnDataSet’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘VertebralColumnDataSet’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-vertebralcolumndataset-2c81/c5652518/?iid=002-762&v=presentation
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘VertebralColumnDataSet’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/caesarlupum/vertebralcolumndataset on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Vertebral Column Data Set

    Download: Data Folder, http://archive.ics.uci.edu/ml/machine-learning-databases/00212/

    Data Set Description, http://archive.ics.uci.edu/ml/machine-learning-databases/00212/

    Abstract: Data set containing values for six biomechanical features used to classify orthopaedic patients into 3 classes (normal, disk hernia or spondylolisthesis) or 2 classes (normal or abnormal).

    • Data Set Characteristics: Multivariate
    • Attribute Characteristics: Real
    • Associated Tasks: Classification
    • Number of Instances: 310
    • Number of Attributes: 6
    • Missing Values? N/A
    • Area: N/A
    • Date Donated: 2011-08-09

    Source:

    Guilherme de Alencar Barreto (guilherme '@' deti.ufc.br) & Ajalmar Rêgo da Rocha Neto (ajalmar '@' ifce.edu.br), Department of Teleinformatics Engineering, Federal University of Ceará, Fortaleza, Ceará, Brazil.

    Henrique Antonio Fonseca da Mota Filho (hdamota '@' gmail.com), Hospital Monte Klinikum, Fortaleza, Ceará, Brazil.

    Data Set Information:

    Biomedical data set built by Dr. Henrique da Mota during a medical residence period in the Group of Applied Research in Orthopaedics (GARO) of the Centre Médico-Chirurgical de Réadaptation des Massues, Lyon, France. The data have been organized in two different but related classification tasks. The first task consists in classifying patients as belonging to one out of three categories: Normal (100 patients), Disk Hernia (60 patients) or Spondylolisthesis (150 patients). For the second task, the categories Disk Hernia and Spondylolisthesis were merged into a single category labelled as 'abnormal'. Thus, the second task consists in classifying patients as belonging to one out of two categories: Normal (100 patients) or Abnormal (210 patients). We provide files also for use within the WEKA environment.
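
    The relation between the two tasks can be sketched as a simple label mapping. The class counts come from the description above; the two-letter codes follow the dataset's stated label convention (DH, SL, NO, AB), and the helper name `merge_labels` is hypothetical.

```python
from collections import Counter

# Class counts stated in the dataset description (three-class task).
THREE_CLASS_COUNTS = {"NO": 100, "DH": 60, "SL": 150}

# Second task: Disk Hernia and Spondylolisthesis are merged into 'abnormal'.
TO_BINARY = {"NO": "NO", "DH": "AB", "SL": "AB"}


def merge_labels(labels):
    """Map three-class labels (NO, DH, SL) to two-class labels (NO, AB)."""
    return [TO_BINARY[label] for label in labels]


# Expand the counts into one label per patient, then merge.
labels = [label for label, n in THREE_CLASS_COUNTS.items() for _ in range(n)]
binary_counts = Counter(merge_labels(labels))
print(dict(binary_counts))
```

    The merged counts show the class imbalance of the binary task: abnormal cases outnumber normal ones by roughly two to one.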

    Attribute Information:

    Each patient is represented in the data set by six biomechanical attributes derived from the shape and orientation of the pelvis and lumbar spine (in this order): pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius and grade of spondylolisthesis. The following convention is used for the class labels: DH (Disk Hernia), SL (Spondylolisthesis), NO (Normal) and AB (Abnormal).

    Relevant Papers:

    (1) Berthonnaud, E., Dimnet, J., Roussouly, P. & Labelle, H. (2005). 'Analysis of the sagittal balance of the spine and pelvis using shape and orientation parameters', Journal of Spinal Disorders & Techniques, 18(1):40–47.

    (2) Rocha Neto, A. R. & Barreto, G. A. (2009). 'On the Application of Ensembles of Classifiers to the Diagnosis of Pathologies of the Vertebral Column: A Comparative Analysis', IEEE Latin America Transactions, 7(4):487–496.

    (3) Rocha Neto, A. R., Sousa, R., Barreto, G. A. & Cardoso, J. S. (2011). 'Diagnostic of Pathology on the Vertebral Column with Embedded Reject Option', Proceedings of the 5th Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA'2011), Gran Canaria, Spain, Lecture Notes in Computer Science, vol. 6669, pp. 588–595.

    --- Original source retains full ownership of the source dataset ---

  4. Secchi Depth

    • kaggle.com
    Updated Mar 24, 2023
    Cite
    Jacob Sharples (2023). Secchi Depth [Dataset]. https://www.kaggle.com/datasets/jacobsharples/secchi-depth
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jacob Sharples
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A Secchi disk is a circular disk with alternating black and white quadrants that is used to measure the clarity or transparency of water. It is typically lowered into the water using a line, and the depth at which it disappears from view is measured. The depth at which the disk disappears is called the "Secchi depth" and it provides an indication of the water clarity.

    Measuring water transparency is important for a few reasons. First, it can indicate the health of aquatic ecosystems. Clear water allows sunlight to penetrate deeper, which is important for photosynthesis by aquatic plants and algae. If the water becomes cloudy or turbid, it can indicate that there is too much sediment or other particles in the water, which can have negative impacts on the ecosystem.

    Second, water transparency can also impact water quality for human use. For example, if the water is too turbid, it can make it difficult to treat for drinking water or for use in industrial processes. Additionally, if the water is too cloudy or turbid, it can impact recreational uses such as swimming or fishing.

    Overall, measuring water transparency using a Secchi disk can provide important information about the health and quality of aquatic ecosystems, as well as the suitability of water for human use.

    This dataset was sampled at 135 locations along the eastern coast of Georgian Bay in Ontario, Canada, from 2003 to 2005. It provides data on the Secchi depth alongside the characteristics of the water.

    Image (Secchi disk demonstration): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F6247135%2F33db54e2761af367ead97b0aa070190f%2Fsecchi_demonstration.jpeg?generation=1679696033632439&alt=media

  5. VertebralColumnDataSet

    • kaggle.com
    Updated Jan 9, 2020
    Cite
    Caesar Lupum (2020). VertebralColumnDataSet [Dataset]. https://www.kaggle.com/caesarlupum/vertebralcolumndataset/activity
    Dataset updated
    Jan 9, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Caesar Lupum
    Description

    Vertebral Column Data Set

    Download: Data Folder, http://archive.ics.uci.edu/ml/machine-learning-databases/00212/

    Data Set Description, http://archive.ics.uci.edu/ml/machine-learning-databases/00212/

    Abstract: Data set containing values for six biomechanical features used to classify orthopaedic patients into 3 classes (normal, disk hernia or spondylolisthesis) or 2 classes (normal or abnormal).

    • Data Set Characteristics: Multivariate
    • Attribute Characteristics: Real
    • Associated Tasks: Classification
    • Number of Instances: 310
    • Number of Attributes: 6
    • Missing Values? N/A
    • Area: N/A
    • Date Donated: 2011-08-09

    Source:

    Guilherme de Alencar Barreto (guilherme '@' deti.ufc.br) & Ajalmar Rêgo da Rocha Neto (ajalmar '@' ifce.edu.br), Department of Teleinformatics Engineering, Federal University of Ceará, Fortaleza, Ceará, Brazil.

    Henrique Antonio Fonseca da Mota Filho (hdamota '@' gmail.com), Hospital Monte Klinikum, Fortaleza, Ceará, Brazil.

    Data Set Information:

    Biomedical data set built by Dr. Henrique da Mota during a medical residence period in the Group of Applied Research in Orthopaedics (GARO) of the Centre Médico-Chirurgical de Réadaptation des Massues, Lyon, France. The data have been organized in two different but related classification tasks. The first task consists in classifying patients as belonging to one out of three categories: Normal (100 patients), Disk Hernia (60 patients) or Spondylolisthesis (150 patients). For the second task, the categories Disk Hernia and Spondylolisthesis were merged into a single category labelled as 'abnormal'. Thus, the second task consists in classifying patients as belonging to one out of two categories: Normal (100 patients) or Abnormal (210 patients). We provide files also for use within the WEKA environment.

    Attribute Information:

    Each patient is represented in the data set by six biomechanical attributes derived from the shape and orientation of the pelvis and lumbar spine (in this order): pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius and grade of spondylolisthesis. The following convention is used for the class labels: DH (Disk Hernia), SL (Spondylolisthesis), NO (Normal) and AB (Abnormal).
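
    One way to hold a patient record in the attribute order given above is a small typed structure. This is a sketch only: the whitespace-separated line layout, the field names, and the sample values are assumptions for illustration and are not taken from the data files.

```python
from dataclasses import dataclass

VALID_LABELS = {"DH", "SL", "NO", "AB"}  # label convention from the description


@dataclass(frozen=True)
class Patient:
    # Six biomechanical attributes, in the order stated in the description.
    pelvic_incidence: float
    pelvic_tilt: float
    lumbar_lordosis_angle: float
    sacral_slope: float
    pelvic_radius: float
    grade_of_spondylolisthesis: float
    label: str


def parse_line(line: str) -> Patient:
    """Parse 'v1 v2 v3 v4 v5 v6 LABEL' (assumed layout) into a Patient."""
    *values, label = line.split()
    if len(values) != 6:
        raise ValueError(f"expected 6 attributes, got {len(values)}")
    if label not in VALID_LABELS:
        raise ValueError(f"unknown class label: {label!r}")
    return Patient(*map(float, values), label)


# Made-up values, used only to exercise the parser.
patient = parse_line("50.0 10.0 40.0 40.0 120.0 0.5 NO")
print(patient)
```

    Validating the label and the attribute count at parse time surfaces malformed rows early instead of letting them reach a classifier.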

    Relevant Papers:

    (1) Berthonnaud, E., Dimnet, J., Roussouly, P. & Labelle, H. (2005). 'Analysis of the sagittal balance of the spine and pelvis using shape and orientation parameters', Journal of Spinal Disorders & Techniques, 18(1):40–47.

    (2) Rocha Neto, A. R. & Barreto, G. A. (2009). 'On the Application of Ensembles of Classifiers to the Diagnosis of Pathologies of the Vertebral Column: A Comparative Analysis', IEEE Latin America Transactions, 7(4):487–496.

    (3) Rocha Neto, A. R., Sousa, R., Barreto, G. A. & Cardoso, J. S. (2011). 'Diagnostic of Pathology on the Vertebral Column with Embedded Reject Option', Proceedings of the 5th Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA'2011), Gran Canaria, Spain, Lecture Notes in Computer Science, vol. 6669, pp. 588–595.
