License: CSIRO Data Licence, https://research.csiro.au/dap/licences/csiro-data-licence/
A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.
This dataset was created by lkj9487.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This CSV is a dummy dataset used to test the functionality of trusted-repository search capabilities and of research data governance practices. The associated dummy dissertation is entitled "Financial Econometrics Dummy Dissertation". The dummy file is a 7 KB CSV containing 5000 rows of notional demographic tabular data.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset consists of a single CSV file containing around 32000 Twitter tweets. 100 CSV files were created from this single file, each containing 320 tweets; these 100 files are used to validate and test (performance/load testing) the data pipeline components, as sketched below.
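A minimal sketch of this splitting step with pandas, assuming the source file is named tweets.csv (the actual file names are not given in the description):

import pandas as pd

df = pd.read_csv("tweets.csv")  # single file with ~32000 tweets
chunk = 320  # tweets per output file -> 100 files in total
for i in range(0, len(df), chunk):
    df.iloc[i:i + chunk].to_csv(f"tweets_{i // chunk:03d}.csv", index=False)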
This dataset was created by Shaik Mohammad Nazeer.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Data collected using an Android Pixel 7 with the MarcusLion Whitbox2 app. It is a time series that includes latitude, longitude and altitude, as well as sensor data from the accelerometer and phone rotation. This dataset contains two sets of CSV files: sensors and yolo. Both contain headers. {YYYYMMdd_HHmmss}_sensors.csv:
timestamp: epoch time in milliseconds (since 1970) at which the event happened
latitude, longitude: rounded to 1.1 meters
altitude: in meters
distance: distance in meters between the last two readings
… See the full description on the dataset page: https://huggingface.co/datasets/marcus-lion/whitebox-test-csv-files.
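A minimal sketch of loading one of the sensors files with pandas; the file name below is a hypothetical example, and the timestamp conversion follows the epoch-milliseconds format described above:

import pandas as pd

# "20240101_120000_sensors.csv" is a hypothetical example file name
df = pd.read_csv("20240101_120000_sensors.csv")
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")  # epoch ms -> datetime
print(df[["timestamp", "latitude", "longitude", "altitude"]].head())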
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The methodology is the core component of any research-related work: it describes the methods used to obtain the results. Here, the whole research implementation is done in Python. The work involves the following steps:
1. Acquire Personality Dataset
Kaggle hosts a collection of machine learning datasets and data generators used by the machine learning community for analysis. The personality prediction dataset was acquired from the Kaggle website. It was collected (2016-2018) through an interactive online personality test constructed from the IPIP. The dataset can be downloaded as a zip file by clicking the link provided. It consists of two CSV files (test.csv & train.csv). The test.csv file has 0 missing values, 7 attributes, and a final label output; the dataset also has multivariate characteristics. Data preprocessing is then performed to check for inconsistent behaviors or trends.
2. Data preprocessing
After data acquisition, the next step is to clean and preprocess the data. The available dataset has numerical features. The target value is a five-level personality consisting of serious, lively, responsible, dependable & extraverted. The preprocessed dataset is further split into training and testing sets by passing the feature values, target values, and test size to the train_test_split method of the scikit-learn package. After the split, the training data is used to fit the Logistic Regression & SVM models, and the test data is used to estimate the accuracy of the trained models, as sketched below.
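A minimal sketch of this split, assuming train.csv has a label column named "label" (the actual column name is not stated here); the 80/20 ratio follows the Train/Test step below:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")
X = df.drop(columns=["label"])  # "label" is a hypothetical column name
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)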
3. Feature Extraction
The following items were presented on one page, and each was rated on a five-point scale using radio buttons. The order on the page was EXT1, AGR1, CSN1, EST1, OPN1, EXT2, etc. The scale was labeled 1=Disagree, 3=Neutral, 5=Agree.
EXT1 I am the life of the party.
EXT2 I don't talk a lot.
EXT3 I feel comfortable around people.
EXT4 I am quiet around strangers.
EST1 I get stressed out easily.
EST2 I get irritated easily.
EST3 I worry about things.
EST4 I change my mood a lot.
AGR1 I have a soft heart.
AGR2 I am interested in people.
AGR3 I insult people.
AGR4 I am not really interested in others.
CSN1 I am always prepared.
CSN2 I leave my belongings around.
CSN3 I follow a schedule.
CSN4 I make a mess of things.
OPN1 I have a rich vocabulary.
OPN2 I have difficulty understanding abstract ideas.
OPN3 I do not have a good imagination.
OPN4 I use difficult words.
4. Training the Model
Train/Test is a method to measure the accuracy of your model. It is called Train/Test because you split the data set into two sets: a training set and a testing set, here 80% for training and 20% for testing. You train the model using the training set. In this work we trained on our dataset using linear_model.LogisticRegression() & svm.SVC() from the scikit-learn package.
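Continuing the split sketch above, a minimal training sketch using the two estimators named here:

from sklearn import linear_model, svm

log_reg = linear_model.LogisticRegression(max_iter=1000)  # max_iter raised to help convergence
clf_svm = svm.SVC()
log_reg.fit(X_train, y_train)
clf_svm.fit(X_train, y_train)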
5. Personality Prediction Output
After training the designed models, the testing of Logistic Regression & SVM is performed using cohen_kappa_score & accuracy_score.
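Continuing the sketches above, a minimal evaluation sketch with the two metrics named here:

from sklearn.metrics import accuracy_score, cohen_kappa_score

for name, model in [("LogisticRegression", log_reg), ("SVM", clf_svm)]:
    y_pred = model.predict(X_test)
    print(name, accuracy_score(y_test, y_pred), cohen_kappa_score(y_test, y_pred))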
Repo of csv files for testing CoW.
This dataset was created by Alamak_jang.
Example of a csv file exported from the database.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
The benchmarking datasets used for deepBlink. The npz files contain train/valid/test splits inside and can be used directly. The files belong to the following challenges / classes:
- ISBI Particle tracking challenge: microtubule, vesicle, receptor
- Custom synthetic (based on http://smal.ws): particle
- Custom fixed cell: smfish
- Custom live cell: suntag
The csv files are to determine which image in the test splits corresponds to which original image, SNR, and density.
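A minimal sketch of inspecting one of the npz files with NumPy; the file name and the exact array keys are assumptions, since the description does not list them:

import numpy as np

with np.load("microtubule.npz", allow_pickle=True) as data:
    print(data.files)  # lists the stored arrays, e.g. the train/valid/test splits mentioned above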
Images as CSV files for applying "classical" machine learning methods. These datasets are used for Pascal Yim's Machine Learning course at Centrale Lille. A loading sketch follows the list below.
Recognition of handwritten digit images
"mnist_small.csv" version with less data, which can also serve as a test set
Source: https://www.kaggle.com/datasets/oddrationale/mnist-in-csv
Recognition of sign language gesture images
"sign_mnist_small.csv" version with less data, which can also serve as a test set
Source: https://www.kaggle.com/datasets/datamunge/sign-language-mnist
Recognition of clothing and shoes (Zalando)
Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000
Recognition of skin tumors (color images, three R,G,B values per pixel)
Other versions with smaller images and/or in grayscale
Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000
Recognition of small color images in 10 categories. CSV version of the CIFAR10 dataset
Source: https://www.kaggle.com/datasets/fedesoriano/cifar10-python-in-csv?select=train.csv
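A minimal sketch of loading one of these CSV image files, assuming the MNIST-in-CSV layout (first column = label, remaining 784 columns = the 28x28 pixel values):

import pandas as pd

df = pd.read_csv("mnist_small.csv")
labels = df.iloc[:, 0].to_numpy()
images = df.iloc[:, 1:].to_numpy().reshape(-1, 28, 28)  # one 28x28 image per row
print(labels[0], images.shape)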
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cleaned_Dataset.csv – The combined CSV files of all scraped documents from DABI, e-LiS, o-bib and Springer.
Data_Cleaning.ipynb – The Jupyter Notebook with Python code for the analysis and cleaning of the original dataset.
ger_train.csv – The German training set as CSV file.
ger_validation.csv – The German validation set as CSV file.
en_test.csv – The English test set as CSV file.
en_train.csv – The English training set as CSV file.
en_validation.csv – The English validation set as CSV file.
splitting.py – The Python code for splitting a dataset into train, test and validation sets (see the sketch after this list).
DataSetTrans_de.csv – The final German dataset as a CSV file.
DataSetTrans_en.csv – The final English dataset as a CSV file.
translation.py – The Python code for translating the cleaned dataset.
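A minimal sketch of the kind of split performed by splitting.py; the 80/10/10 ratios are assumptions, since they are not stated here:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("Cleaned_Dataset.csv")
train, rest = train_test_split(df, test_size=0.2, random_state=42)       # 80% train
validation, test = train_test_split(rest, test_size=0.5, random_state=42)  # 10% / 10%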
This dataset was created randomly using NumPy's random module. The whole point, however, is to provide a dataset for classification (Logistic Regression, Neural Networks, etc.).
The training dataset is a CSV file containing 300 scores from a language test ("Reading", "Listening", "Speaking", "Writing"). The values are floating-point numbers between 0 and 1. The results are simply categorized according to the average of the scores from the 4 main parts.
The test dataset is a CSV file with 44 scores.
The name of the first user will be written.
I hope this dataset will encourage all newbies to enter the world of machine learning.
Obviously, the data is free.
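A minimal sketch of generating similar data with NumPy; the column names follow the four test parts above, and the categorization threshold is an assumption:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
scores = rng.random((300, 4))  # 300 samples, floats in [0, 1)
df = pd.DataFrame(scores, columns=["Reading", "Listening", "Speaking", "Writing"])
df["category"] = (df.mean(axis=1) > 0.5).astype(int)  # assumed threshold on the average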
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
can-csv: This dataset contains controller area network (CAN) traffic for the 2017 Subaru Forester, the 2016 Chevrolet Silverado, the 2011 Chevrolet Traverse, and the 2011 Chevrolet Impala. For each vehicle, there are samples of attack-free traffic (that is, normal traffic) as well as samples of various types of attacks. The spoofing attacks, such as RPM spoofing, speed spoofing, etc., have an observable effect on the vehicle under test. This repository contains only .csv files. It is a subset of the can-dataset repository.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ransomware has been considered a significant threat to most enterprises for the past few years. In scenarios where users can access all files on a shared server, one infected host is capable of locking access to all shared files. In the article related to this repository, we detect ransomware infection based on file-sharing traffic analysis, even in the case of encrypted traffic. We compare three machine learning models and choose the best for validation. We train and test the detection model using more than 70 ransomware binaries from 26 different families and more than 2500 hours of 'not infected' traffic from real users. The results reveal that the proposed tool can detect all ransomware binaries, including those not used in the training phase (zero-days). This paper provides a validation of the algorithm by studying the false positive rate and the amount of information from user files that the ransomware could encrypt before being detected.
This dataset directory contains the 'infected' and 'not infected' samples and the models used for each T configuration, each in a separate folder.
The folders are named NxSy, where x is the number of 1-second intervals per sample and y is the sliding step in seconds.
Each folder (for example N10S10/) contains:
- tree.py -> Python script with the Tree model.
- ensemble.json -> JSON file with the information about the Ensemble model.
- NN_XhiddenLayer.json -> JSON file with the information about the NN model with X hidden layers (1, 2 or 3).
- N10S10.csv -> All samples used for training each model in this folder, in CSV format for use in the BigML application.
- zeroDays.csv -> All zero-day samples used for testing each model in this folder, in CSV format for use in the BigML application.
- userSamples_test -> All samples used for validating each model in this folder, in CSV format for use in the BigML application.
- userSamples_train -> User samples used for training the models.
- ransomware_train -> Ransomware samples used for training the models.
- scaler.scaler -> Standard Scaler from the Python library, used to scale the samples.
- zeroDays_notFiltered -> Folder with the zero-day samples.
In the case of the N30S30 folder, there is an additional folder (SMBv2SMBv3NFS) with the samples extracted from the SMBv2, SMBv3 and NFS traffic traces. There are more binaries than the ones presented in the article, because some of them are not "unseen" binaries (their families are present in the training set).
The files containing samples (NxSy.csv, zeroDays.csv and userSamples_test.csv) are structured as follows (see the loading sketch after this list):
- Each line is one sample.
- Each sample has 3*T features and the label (1 if it is an 'infected' sample and 0 if it is not).
- The features are separated by ',' because it is a csv file.
- The last column is the label of the sample.
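A minimal sketch of loading one of the sample files with pandas, following the structure above; the file name is an example, and the absence of a header row is an assumption:

import pandas as pd

samples = pd.read_csv("N10S10.csv", header=None)  # assumed: no header row
X = samples.iloc[:, :-1]  # the 3*T features
y = samples.iloc[:, -1]   # label: 1 = 'infected', 0 = 'not infected'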
Additionally, we have placed two pcap files in the root directory. These are the traces used to compare both versions of SMB.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials
Background
This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels; one iron-based shape memory alloy is also included. Summary files provide an overview of the database, and data from the individual experiments is also included.
The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).
Usage
Included Files
File Format: Downsampled Data
These are the "LP_
These data files can be easily loaded using the pandas library in Python through:
import pandas
# data_file is the path to one of the downsampled CSV data files
data = pandas.read_csv(data_file, index_col=0)
The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.
File Format: Unreduced Data
These are the "LP_
The data can be loaded and used similarly to the downsampled data.
File Format: Overall_Summary
The overall summary file provides data on all the test specimens in the database. The columns include:
File Format: Summarized_Mechanical_Props_Campaign
Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,
import pandas as pd
# date and version are strings identifying the summary file to load
tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
                   index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
                   keep_default_na=False, na_values='')
Caveats
[doc] file names and splits 2
This dataset contains three csv files at the root: train.csv, test.csv, validation.csv.
This data set comes from data held by the Driver and Vehicle Standards Agency (DVSA).
It is not classed as an ‘official statistic’. This means it’s not subject to scrutiny and assessment by the UK Statistics Authority.
The MOT test checks that your vehicle meets road safety and environmental standards. Different types of vehicles (for example, cars and motorcycles) fall into different ‘classes’.
This data table shows the number of initial tests. It does not include abandoned tests, aborted tests, or retests.
The initial fail rate is the rate for vehicles as they were brought for the MOT. The final fail rate excludes vehicles that pass the test after rectification of minor defects at the time of the test.
This data table is updated every 3 months.
Ref: DVSA/MOT/01. Download CSV (16.1 KB): https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1060287/dvsa-mot-01-mot-test-results-by-class-of-vehicle1.csv
These tables give data for the following classes of vehicles:
All figures are for vehicles as they were brought in for the MOT.
A failed test usually has multiple failure items.
The percentage of tests is worked out as the number of tests with one or more failure items in the defect category as a percentage of total tests.
The percentage of defects is worked out as the total defects in the category as a percentage of total defects for all categories.
The average defects per initial test failure is worked out as the total failure items divided by the total tests failed plus tests that passed after rectification of a minor defect at the time of the test.
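A minimal sketch of these calculations in Python, with made-up figures for illustration only:

# All figures below are invented for illustration
tests_with_failure_items = 250
total_tests = 1000
pct_tests = 100 * tests_with_failure_items / total_tests  # 25.0%

total_failure_items = 900
tests_failed = 300
passed_after_rectification = 60
avg_defects = total_failure_items / (tests_failed + passed_after_rectification)  # 2.5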
These data tables are updated every 3 months.
Ref: DVSA/MOT/02. Download CSV (19.1 KB): https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1060255/dvsa-mot-02-mot-class-1-and-2-vehicles-initial-failures-by-defect-category-.csv
This dataset contains .wav files with audio in the Urdu language. It also has a CSV file that contains the paths and sentences of the audio files. There are 2 folders, a train set and a test set; both contain CSV files that give the audio path and the sentence spoken in each recording.