14 datasets found
  1. Diabetes.csv and arff

    • kaggle.com
    zip
    Updated Aug 1, 2021
    Cite
    amrikkatoch308 (2021). Diabetes.csv and arff [Dataset]. https://www.kaggle.com/amrikkatoch308/diabetescsv-and-arff
    Available download formats: zip (22933 bytes)
    Dataset updated
    Aug 1, 2021
    Authors
    amrikkatoch308
    Description

    Dataset

    This dataset was created by amrikkatoch308


  2. US census 1990

    • kaggle.com
    zip
    Updated Jul 30, 2021
    Cite
    Toby Anderson (2021). US census 1990 [Dataset]. https://www.kaggle.com/tobyanderson/us-census-1990
    Available download formats: zip (404935 bytes)
    Dataset updated
    Jul 30, 2021
    Authors
    Toby Anderson
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    1990 US Census Data

    Abstract: The USCensus1990 data set is a discretized version of the USCensus1990raw data set. Many of the less useful attributes in the original data set have been dropped, the few continuous variables have been discretized, and the few discrete variables that have a large number of possible values have been collapsed to have fewer possible values.

    Sources: The USCensus1990raw data set was obtained from the (U.S. Department of Commerce) Census Bureau website using the Data Extraction System. This system can be found at http://www.census.gov/DES/www/des.html.

    Donors of database: Chris Meek, Bo Thiesson, David Heckerman

    Data Characteristics: The data was collected as part of the 1990 census.

    There are 68 categorical attributes. This data set was derived from the USCensus1990raw data set. The attributes are listed in the file USCensus1990.attributes.txt (repeated below) and the coding for the values is described below. Many of the less useful attributes in the original data set have been dropped, the few continuous variables have been discretized and the few discrete variables that have a large number of possible values have been collapsed to have fewer possible values.

    More specifically, the USCensus1990 data set was obtained from the USCensus1990raw data set by the following sequence of operations:

    • Randomization: The order of the cases in the original USCensus1990raw data set was randomly permuted.
    • Selection of attributes: The 68 attributes included in the data set are given below. In the USCensus1990 data set a single-letter prefix has been added to each original name: 'i' indicates that the original attribute values are used, and 'd' indicates that the original attribute values for each case have been mapped to new values (the precise mapping is described below).
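
The prefix convention makes it easy to separate untouched attributes from remapped ones. The following is a minimal sketch, assuming the attribute names are already available as a list of strings; the function name `split_by_prefix` and the sample names are illustrative only and are not taken from the data set documentation.

```python
def split_by_prefix(attributes):
    """Group attribute names by their single-letter prefix ('i' or 'd')."""
    groups = {"i": [], "d": []}
    for name in attributes:
        prefix = name[0]
        if prefix in groups:
            groups[prefix].append(name)
    return groups


# Illustrative names only; the real list is in USCensus1990.attributes.txt.
names = ["dAge", "iSex", "dIncome1", "iCitizen"]
groups = split_by_prefix(names)
print("original values:", groups["i"])
print("remapped values:", groups["d"])
```

Names with any other first letter (for example the case identifier) are simply left out of both groups.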

    Other Relevant Information: Hierarchies of values are provided in the file USCensus1990raw.coding.htm, and the mapping functions used to transform the USCensus1990raw data set into the USCensus1990 data set are given in the file USCensus1990.mapping.sql.

    Data Format: The data is contained in a file called USCensus1990.data.txt. The first row contains the list of attributes. The first attribute is a caseid and should be ignored during analysis. The data is comma delimited with one case per row.
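
The layout described above (header row, leading caseid, one comma-delimited case per row) can be read with the standard library alone. This is a minimal sketch under two assumptions: every attribute value is coded as an integer, which the description implies but does not state outright, and the sample below is invented for illustration rather than taken from USCensus1990.data.txt. The function name `read_cases` is hypothetical.

```python
import csv
import io


def read_cases(handle):
    """Read comma-delimited cases and drop the leading caseid column."""
    reader = csv.reader(handle)
    header = next(reader)        # first row lists the attributes
    attributes = header[1:]      # caseid is an identifier, not a feature
    # Assumption: all remaining values are integer category codes.
    rows = [[int(value) for value in row[1:]] for row in reader if row]
    return attributes, rows


# Invented sample in the described layout (not real census records).
sample = io.StringIO("caseid,dAge,iSex\n10000,5,0\n10001,6,1\n")
attributes, rows = read_cases(sample)
print(attributes)
print(rows)
```

For the real file, pass an open file handle instead of the in-memory sample.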

    References & Further Information:

    • The U.S. Department of Commerce Bureau of Census website
    • Data Extraction System
    • Meek, Thiesson, and Heckerman (2001), "The Learning Curve Method Applied to Clustering", to appear in The Journal of Machine Learning Research. MSR-TR-2001-34

    The UCI KDD Archive, Information and Computer Science, University of California, Irvine, Irvine, CA 92697-3425. Last modified: 6 Nov 2001

  3. Data from: Automatic composition of descriptive music: A case study of the...

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Lucía Martín-Gómez (2023). Automatic composition of descriptive music: A case study of the relationship between image and sound [Dataset]. http://doi.org/10.6084/m9.figshare.6682998.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lucía Martín-Gómez
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    FANTASIA

    This repository contains the data related to image descriptors and sound associated with a selection of frames of the films Fantasia and Fantasia 2000 produced by Disney.

    About

    This repository contains the data used in the article Automatic composition of descriptive music: A case study of the relationship between image and sound, published in the 6th International Workshop on Computational Creativity, Concept Invention, and General Intelligence (C3GI). Data structure is explained in detail in the article.

    Abstract

    Human beings establish relationships with the environment mainly through sight and hearing. This work focuses on the concept of descriptive music, which makes use of sound resources to narrate a story. The Fantasia film, produced by Walt Disney, was used in the case study. One of its musical pieces is analyzed in order to obtain the relationship between image and music. This connection is subsequently used to create a descriptive musical composition from a new video. Naive Bayes, Support Vector Machine and Random Forest are the three classifiers studied for the model induction process. After an analysis of their performance, it was concluded that Random Forest provided the best solution; the produced musical composition had a considerably high descriptive quality.

    Data

    • Nutcracker_data.arff: Image descriptors and the most important sound of each frame from the fragment "The Nutcracker Suite" in film Fantasia. Data stored in ARFF format.
    • Firebird_data.arff: Image descriptors of each frame from the fragment "The Firebird" in film Fantasia 2000. Data stored in ARFF format.
    • Firebird_midi_prediction.csv: Frame number of the fragment "The Firebird" in film Fantasia 2000 and the sound predicted by the system encoded in MIDI. Data stored in CSV format.
    • Firebird_prediction.mp3: Audio file with the synthesis of the prediction data for the fragment "The Firebird" of film Fantasia 2000.

    License

    Data is available under MIT License. To make use of the data the article must be cited.

  4. Data from: Machine Learning Models and New Computational Tool for the...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jun 22, 2022
    Cite
    Pulgar-Sánchez; Marrero-Ponce; Hernández-Lambraño; Garcia-Jacas; Martinez-Rios (2022). Machine Learning Models and New Computational Tool for the Discovery of Insect Repellents that Interfere with Olfaction [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_6677764
    Dataset updated
    Jun 22, 2022
    Dataset provided by
    Cátedras Conacyt – Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, México
    Escuela de Ciencias Biológicas e Ingeniería, Universidad Yachay Tech, Hacienda San José, Proyecto Yachay, Urcuquí, Ecuador
    Universidad de Salamanca, Facultad de Farmacia, Departamento de Botánica, 4th Piso, Avenida Licenciado Méndez Nieto s/n, 37007 Salamanca, España
    Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Av. Interoceánica Km 12 ½—Cumbayá, Quito 170157, Ecuador & Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador.
    Universidad Panamericana, Facultad de Ingeniería, Ciudad de México, México.
    Authors
    Pulgar-Sánchez; Marrero-Ponce; Hernández-Lambraño; Garcia-Jacas; Martinez-Rios
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    SI1_Supporting Information file (docx) brings together detailed information on the outstanding models obtained for each dataset analyzed in this study, such as statistical and training parameters and outliers. It also reports the responses, in spikes/s, of the mosquito Culex quinquefasciatus to the 50 IRs, and presents a full table of the up-to-date studies related to QSAR and insect repellency.

    SI2_EXP1_50IRs from Liu et al (2013) SDF file presents the structures of each of the 50 IRs analyzed.

    SI3_EXP2_Datasets gathers the four datasets as SDF files from Oliferenko et al. (2013), Gaudin et al. (2008), Omolo et al. (2004), and Paluch et al. (2009) used for the repellency modeling in EXP2.

    SI4_EXP3_Prospective analysis provides the Malaria Box Library (400 compounds) as an SDF file; these compounds were analyzed in our virtual screening to prospect for potential virtual hits.

    SI5_QuBiLS-MIDAS MDs lists contains three TXT lists of 3D molecular descriptors used in QuBiLS-MIDAS to describe the molecules used in the present study.

    SI6_EXP1_Sensillar Modeling comprises two subfolders: Classification and Regression models for each of the six sensilla. The models were built to predict the physiological interaction experimentally obtained from Liu et al. (2013). All of the models are implemented in the software SiLiS-PAPACS. Every folder contains a DOCX file with the detailed description of the model, an XLSX file with the output obtained from the training in Weka 3.9.4, ARFF and CSV files with the MDs for each molecule, and the SDF of the study dataset.

    SI7_EXP2_Repellency Modeling encompasses the four datasets in the study: Oliferenko et al. (2013), Gaudin et al. (2008), Omolo et al. (2004), and Paluch et al. (2009). Inside the subfolders, there are three models per type of MDs (duplex, triple, generic, and mix), selected as those that best predict each dataset. As in the SI6 folder, each model includes six files: DOCX, XLSX, ARFF, CSV, and an SDF.

    SI8_Virtual Hits includes the cluster analysis results and physico-chemical properties of new IR virtual leads.

  5. ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...

    • zenodo.org
    • elki-project.github.io
    • +1 more
    application/gzip
    Updated May 2, 2024
    Cite
    Erich Schubert; Arthur Zimek (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. http://doi.org/10.5281/zenodo.6355684
    Available download formats: application/gzip
    Dataset updated
    May 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Erich Schubert; Arthur Zimek
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    2022
    Description

    These data sets were originally created for the following publications:

    M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek
    Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
    In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

    H.-P. Kriegel, E. Schubert, A. Zimek
    Evaluation of Multiple Clustering Solutions
    In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.

    The outlier data set versions were introduced in:

    E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel
    On Evaluation of Outlier Rankings and Outlier Scores
    In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

    They are derived from the original image data available at https://aloi.science.uva.nl/

    The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, The Amsterdam library of object images, Int. J. Comput. Vision, 61(1), 103-112, January, 2005

    Additional information is available at: https://elki-project.github.io/datasets/multi_view

    The following views are currently available:

    • Feature type: Object number
      Description: Sparse 1000 dimensional vectors that give the true object assignment
      Files: objs.arff.gz
    • Feature type: RGB color histograms
      Description: Standard RGB color histograms (uniform binning)
      Files: aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz
    • Feature type: HSV color histograms
      Description: Standard HSV/HSB color histograms in various binnings
      Files: aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz
    • Feature type: Color similarity
      Description: Average similarity to 77 reference colors (not histograms): 18 colors x 2 sat x 2 bri + 5 grey values (incl. white, black)
      Files: aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)
    • Feature type: Haralick features
      Description: First 13 Haralick features (radius 1 pixel)
      Files: aloi-haralick-1.csv.gz
    • Feature type: Front to back
      Description: Vectors representing front face vs. back faces of individual objects
      Files: front.arff.gz
    • Feature type: Basic light
      Description: Vectors indicating basic light situations
      Files: light.arff.gz
    • Feature type: Manual annotations
      Description: Manually annotated object groups of semantically related objects such as cups
      Files: manual1.arff.gz

    Outlier Detection Versions

    Additionally, we generated a number of subsets for outlier detection:

    • Feature type: RGB Histograms
      Description: Downsampled to 100000 objects (553 outliers)
      Files: aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz
    • Feature type: RGB Histograms
      Description: Downsampled to 75000 objects (717 outliers)
      Files: aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz
    • Feature type: RGB Histograms
      Description: Downsampled to 50000 objects (1508 outliers)
      Files: aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz

  6. NSL-KDD dataset

    • impactcybertrust.org
    • kaggle.com
    Updated Jan 1, 2009
    Cite
    External Data Source (2009). NSL-KDD dataset [Dataset]. http://doi.org/10.23721/100/1478792
    Dataset updated
    Jan 1, 2009
    Authors
    External Data Source
    Time period covered
    Jan 1, 2009
    Description

    NSL-KDD is a data set suggested to solve some of the inherent problems of the KDD'99 data set. Although this new version of the KDD data set still suffers from some of the problems discussed by McHugh and may not be a perfect representative of existing real networks, we believe that, because of the lack of public data sets for network-based IDSs, it can still be applied as an effective benchmark data set to help researchers compare different intrusion detection methods.

    Furthermore, the number of records in the NSL-KDD train and test sets is reasonable. This advantage makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.

    Data files

    KDDTrain+.ARFF: The full NSL-KDD train set with binary labels in ARFF format
    KDDTrain+.TXT: The full NSL-KDD train set including attack-type labels and difficulty level in CSV format
    KDDTrain+_20Percent.ARFF: A 20% subset of the KDDTrain+.arff file
    KDDTrain+_20Percent.TXT: A 20% subset of the KDDTrain+.txt file
    KDDTest+.ARFF: The full NSL-KDD test set with binary labels in ARFF format
    KDDTest+.TXT: The full NSL-KDD test set including attack-type labels and difficulty level in CSV format
    KDDTest-21.ARFF: A subset of the KDDTest+.arff file which does not include records with difficulty level of 21 out of 21
    KDDTest-21.TXT: A subset of the KDDTest+.txt file which does not include records with difficulty level of 21 out of 21
    Contact: cic@unb.ca
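
The .TXT files are described as CSV with attack-type labels and a difficulty level. The sketch below shows one way to split such records, assuming the attack type and the difficulty level are the last two comma-separated fields, in that order, and that the files have no header row. Both assumptions should be checked against the actual files. The function name `parse_records` is hypothetical, and the sample records are invented and shortened.

```python
import csv
import io


def parse_records(handle):
    """Yield (features, attack_type, difficulty) for each CSV record."""
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        # Assumption: the last two fields are attack type and difficulty level.
        yield row[:-2], row[-2], int(row[-1])


# Invented, shortened records (real records carry many more feature fields).
sample = io.StringIO("0,tcp,http,SF,normal,20\n0,udp,private,SF,neptune,18\n")
records = list(parse_records(sample))
for features, attack_type, difficulty in records:
    print(attack_type, difficulty, features)
```

Feature values are left as strings here; converting them to numeric types depends on the column in question.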

  7. NSL-KDD

    • huggingface.co
    Updated Jul 31, 2023
    Cite
    Mireu Lab (2023). NSL-KDD [Dataset]. https://huggingface.co/datasets/Mireu-Lab/NSL-KDD
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 31, 2023
    Authors
    Mireu Lab
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    NSL-KDD

    This data set is a CSV conversion of the ARFF files provided at the link. The converted data are stored as float64. The original files are kept in the Original directory of the repo.

    Labels

    The columns of the data set are as follows.

    #  Column         Non-Null Count   Dtype
    0  duration       151165 non-null  int64
    1  protocol_type  151165 non-null  object
    2  service        151165 non-null…

    See the full description on the dataset page: https://huggingface.co/datasets/Mireu-Lab/NSL-KDD.

  8. Image and sound data from film Fantasia produced by Walt Disney

    • figshare.com
    mpga
    Updated Mar 19, 2018
    Cite
    Lucía Martín-Gómez; Javier Pérez-Marcos (2018). Image and sound data from film Fantasia produced by Walt Disney [Dataset]. http://doi.org/10.6084/m9.figshare.5999207.v3
    Available download formats: mpga
    Dataset updated
    Mar 19, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lucía Martín-Gómez; Javier Pérez-Marcos
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This repository contains the data used in the article Convolutional neural networks and transfer learning applied to automatic composition of descriptive music, published in the 15th International Conference on Distributed Computing and Artificial Intelligence (DCAI). Data structure is explained in detail in the article. This proposal is the continuation of an earlier work whose data are available in a GitHub repository.

    Abstract

    Visual and musical arts have been strongly interconnected throughout history. The aim of this work is to compose music on the basis of the visual characteristics of a video. For this purpose, descriptive music is used as a link between image and sound, and a video fragment of the film Fantasia is analyzed in depth. Specifically, convolutional neural networks in combination with transfer learning are applied in the process of extracting image descriptors. In order to establish a relationship between the visual and musical information, Naive Bayes, Support Vector Machine and Random Forest classifiers are applied. The obtained model is subsequently employed to compose descriptive music from a new video. The results of this proposal are compared with those of an antecedent work in order to evaluate the performance of the classifiers and the quality of the descriptive musical composition.

    DATA

    • train_data.arff: Image descriptors and the most important sound of each frame from the fragment "The Nutcracker Suite" in film Fantasia obtained by means of CNNs. Data stored in ARFF format.
    • test_data.arff: Image descriptors of each frame from the fragment "The Firebird" in film Fantasia 2000 obtained by means of CNNs. Data stored in ARFF format.
    • midi.csv: Frame number of the fragment "The Firebird" in film Fantasia 2000 and the sound predicted by the system encoded in MIDI. Data stored in CSV format.
    • firebird_prediction.mp3: Audio file with the synthesis of the prediction data for the fragment "The Firebird" of film Fantasia 2000.

    LICENSE

    Data is available under MIT License. To make use of the data the article must be cited.

  9. Phishing Dataset UCI ML CSV

    • kaggle.com
    zip
    Updated Sep 27, 2020
    Cite
    Satish Yadav (2020). Phishing Dataset UCI ML CSV [Dataset]. https://www.kaggle.com/datasets/isatish/phishing-dataset-uci-ml-csv
    Available download formats: zip (112567 bytes)
    Dataset updated
    Sep 27, 2020
    Authors
    Satish Yadav
    Description

    Context

    This dataset is taken from the UCI Phishing Dataset, originally in ARFF format, and converted into CSV. It can be used to train and validate phishing detection machine learning projects.
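
An ARFF-to-CSV conversion of this kind can be sketched with the standard library. This is not the uploader's script: it is a minimal illustration that handles only simple dense ARFF files (unquoted attribute names without spaces, no sparse or relational data). The function name `arff_to_csv` is hypothetical and the sample attribute names and values are illustrative.

```python
import csv
import io


def arff_to_csv(arff_text):
    """Convert a simple dense ARFF document to CSV text with a header row."""
    names, rows, in_data = [], [], False
    for line in arff_text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if in_data:
            rows.append(next(csv.reader([line])))
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()


# Illustrative input only, not the actual phishing data.
sample = """
@relation phishing_demo
@attribute having_IP_Address {-1,1}
@attribute URL_Length {1,0,-1}
@attribute Result {-1,1}
@data
-1,1,-1
1,0,1
"""
csv_text = arff_to_csv(sample)
print(csv_text)
```

For files with quoted names, sparse rows, or relational attributes, a dedicated ARFF reader is the safer choice.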

  10. NSL-KDD

    • kaggle.com
    zip
    Updated Apr 25, 2019
    Cite
    M Hassan Zaib (2019). NSL-KDD [Dataset]. https://www.kaggle.com/datasets/hassan06/nslkdd/code
    Available download formats: zip (14529600 bytes)
    Dataset updated
    Apr 25, 2019
    Authors
    M Hassan Zaib
    Description

    Dataset Information

    KDDTrain+.ARFF: The full NSL-KDD train set with binary labels in ARFF format

    KDDTrain+.TXT: The full NSL-KDD train set including attack-type labels and difficulty level in CSV format

    KDDTrain+_20Percent.ARFF: A 20% subset of the KDDTrain+.arff file

    KDDTrain+_20Percent.TXT: A 20% subset of the KDDTrain+.txt file

    KDDTest+.ARFF: The full NSL-KDD test set with binary labels in ARFF format

    KDDTest+.TXT: The full NSL-KDD test set including attack-type labels and difficulty level in CSV format

    KDDTest-21.ARFF: A subset of the KDDTest+.arff file which does not include records with difficulty level of 21 out of 21

    KDDTest-21.TXT: A subset of the KDDTest+.txt file which does not include records with difficulty level of 21 out of 21

    Improvements to the KDD'99 data set

    The NSL-KDD data set has the following advantages over the original KDD data set:

    • It does not include redundant records in the train set, so the classifiers will not be biased towards more frequent records.
    • There are no duplicate records in the proposed test sets; therefore, the performance of the learners is not biased by the methods which have better detection rates on the frequent records.
    • The number of selected records from each difficulty level group is inversely proportional to the percentage of records in the original KDD data set. As a result, the classification rates of distinct machine learning methods vary in a wider range, which makes it more efficient to have an accurate evaluation of different learning techniques.
    • The number of records in the train and test sets is reasonable, which makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.

  11. credit-g

    • kaggle.com
    zip
    Updated Aug 17, 2023
    Cite
    tarek yahia (2023). credit-g [Dataset]. https://www.kaggle.com/datasets/tarekyahia/credit-g
    Available download formats: zip (20672 bytes)
    Dataset updated
    Aug 17, 2023
    Authors
    tarek yahia
    Description

    Dataset

    This dataset was created by tarek yahia


  12. Musk v2 (Multiple Instance Learning Data)

    • kaggle.com
    zip
    Updated Jan 28, 2024
    Cite
    arturo-bandini-jr (2024). Musk v2 (Multiple Instance Learning Data) [Dataset]. https://www.kaggle.com/datasets/banddaniel/musk-v2-multiple-instance-learning-data
    Available download formats: zip (32117111 bytes)
    Dataset updated
    Jan 28, 2024
    Authors
    arturo-bandini-jr
    Description

    Musk2 is a molecule target activity dataset for benchmarking binary classification and multiple instance learning.

    • Data is divided into 10-fold training and testing datasets. You can use just one pair of train and test files.
    • The dataset contains 166 features (dubbed f1, f2 ...).
    • Target is 0 (non-activity) or 1 (activity)

    Original source: https://www.uco.es/grupos/kdis/momil/musk2.html

    My script for converting .arff file to .csv file is below.

    from scipy.io.arff import loadarff
    import numpy as np
    import pandas as pd


    def arff_to_pd(path):
        """Flatten a multiple-instance ARFF file into one DataFrame row per instance."""
        # Each record is a bag: (molecule name, relational array of instances, label).
        raw_data, _meta_data = loadarff(path)

        # Total number of instances across all bags.
        count = sum(len(bag[1]) for bag in raw_data)

        # Feature columns f1..fN, where N is the number of fields per instance.
        cols = ["f{0}".format(i) for i in range(1, len(raw_data[0][1][0]) + 1)]

        mol_names = []
        labels = []
        data2d = np.zeros([count, len(cols)])
        m = 0

        for bag in raw_data:
            for instance in bag[1]:
                # Every instance inherits its bag's molecule name and label.
                mol_names.append(bag[0].decode())
                labels.append(int(bag[2].decode()))
                data2d[m] = [instance[col_name] for col_name in cols]
                m += 1

        df = pd.DataFrame(data2d, columns=cols)
        # Molecule name goes first, label goes last.
        df.insert(0, "molecul_name", mol_names)
        df = df.assign(label=labels)

        return df
    
  13. Absenteeism records of a Brazilian courier company from July 2007 to July 2010 - Dataset - 海数据

    • haidatas.com
    Updated Jul 15, 2007
    Cite
    (2007). Absenteeism records of a Brazilian courier company from July 2007 to July 2010 - Dataset - 海数据 [Dataset]. https://haidatas.com/dataset/baxiyijiakuaidigongsi2007nian7yuezhi2010ni_c52716b4
    Dataset updated
    Jul 15, 2007
    Description

    Dataset name: Absenteeism records of a Brazilian courier company from July 2007 to July 2010
    Data count: 5
    Dataset keywords: 2007, 2010
    Dataset formats: docx, csv, xls, arff

  14. Cleaned Amazon Commerce Reviews Dataset

    • kaggle.com
    zip
    Updated Oct 31, 2025
    Cite
    Nurhayat Yılmaz (2025). Cleaned Amazon Commerce Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/nurhayatylmaz/cleaned-amazon-commerce-reviews-dataset
    Available download formats: zip (2813012 bytes)
    Dataset updated
    Oct 31, 2025
    Authors
    Nurhayat Yılmaz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains customer reviews from the Amazon Commerce dataset. The original data was provided in ARFF format and included mixed or improperly encoded values.

    I cleaned and converted it into a clean CSV file, fixing encoding issues (UTF-8), removing extra symbols, and organizing columns for better readability and usability.
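
The description does not say exactly how the cleaning was done. As one possible reading, the sketch below decodes raw bytes to text with a fallback encoding, drops control and formatting characters, and collapses runs of whitespace. The function name `clean_text`, the Latin-1 fallback, and the choice of which characters count as "extra symbols" are all assumptions made for illustration.

```python
import unicodedata


def clean_text(raw: bytes) -> str:
    """Decode bytes to text, drop control characters, and tidy whitespace."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: badly encoded input is a legacy single-byte encoding.
        text = raw.decode("latin-1")
    # Keep ordinary whitespace; drop other characters in Unicode "C" categories.
    kept = (
        ch for ch in text
        if ch in "\t\n " or unicodedata.category(ch)[0] != "C"
    )
    # Collapse any run of whitespace into a single space.
    return " ".join("".join(kept).split())


print(clean_text(b"caf\xe9  ok\x00"))
print(clean_text("na\u00efve\u200b text".encode("utf-8")))
```

The cleaned strings can then be written out with the `csv` module using `encoding="utf-8"` when opening the output file.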
