Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset contains two files in h5 format:
1. test_catvnoncat.h5: contains 50 test examples of cat and non-cat images
2. train_catvnoncat.h5: contains 209 training examples of cat and non-cat images
Each image is 64x64 pixels. The task is to classify an image as a cat (1) or not a cat (0). I am going to publish a series of notebooks for this dataset that demonstrate neural networks starting from a very basic level. Each notebook will build upon the previous one. Stay tuned to learn neural networks with the help of those notebooks!
You can use the code snippet below to load and visualize the dataset.

```python
import os

import h5py
import matplotlib.pyplot as plt
import numpy as np

# List every file available under the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

DATA_DIR = '/kaggle/input/cat-images-classification-dataset'


def load_data():
    # Read the training images and labels; the file is closed on exit
    with h5py.File(os.path.join(DATA_DIR, 'train_catvnoncat.h5'), "r") as train_dataset:
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # train set features
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # train set labels

    # Read the test images, labels, and the list of class names
    with h5py.File(os.path.join(DATA_DIR, 'test_catvnoncat.h5'), "r") as test_dataset:
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # test set features
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # test set labels
        classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    # Reshape the label vectors from (m,) to (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes


train_x_orig, train_y, test_x_orig, test_y, classes = load_data()

# Show one training image together with its label
index = 10
plt.imshow(train_x_orig[index])
plt.show()
label = train_y[0, index]
print(f"y = {label}. It's a {classes[label].decode('utf-8')} picture.")
```
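A common next step before training a basic neural network on these images is to flatten each one into a column vector and scale the pixel values to [0, 1]. The sketch below is one way this might look, not part of the dataset's own code. The helper name `flatten_and_scale` is hypothetical, and the stand-in array assumes 3 colour channels and 8-bit pixels, neither of which is stated in the description above. The (features, examples) layout is chosen to match the (1, m) label shape returned by `load_data`.

```python
import numpy as np


def flatten_and_scale(images):
    """Reshape (m, h, w, c) images to (h*w*c, m) floats in [0, 1].

    Assumes 8-bit pixel values (0..255); this is an assumption,
    not something the dataset description confirms.
    """
    m = images.shape[0]
    flat = images.reshape(m, -1).T  # one column per example
    return flat.astype(np.float64) / 255.0


# Stand-in data with the stated sizes (209 examples, 64x64 pixels).
# The 3 colour channels are an assumption made for illustration.
rng = np.random.default_rng(0)
fake_train = rng.integers(0, 256, size=(209, 64, 64, 3), dtype=np.uint8)

train_x = flatten_and_scale(fake_train)
print(train_x.shape)  # (12288, 209)
```

With the real data, `flatten_and_scale(train_x_orig)` would take the place of the stand-in array.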
This dataset contains materials from the Coalition for Community-Supported Affordable Geothermal Energy Systems (C2SAGES) project, which evaluated the techno-economic feasibility of a community geothermal system for a residential development in Hinesburg, VT. The dataset includes detailed soil conductivity test reports, energy models, borehole design reports, hourly energy loads for heating, cooling, and hot water, and design layouts. EnergyPlus was used to model building energy loads, and Modelica software was applied for geothermal loop sizing based on these loads and soil conductivity results. Python scripts for network design further refined the models. Key files include PDF reports on borehole design (with projections for 1-year, 15-year, and 30-year systems), soil conductivity test results, EnergyPlus modeling outputs, and 2D/3D design drawings in PDF, DWG, and DXF formats. Python notebooks for network design and OnePipe model files are also provided, with Modelica required for viewing certain files. Outputs and modeling data are in various formats including CSV, JPG, HTML, and IDF, with units and data clearly labeled to support understanding of system design and performance for the proposed geothermal solution.
NOTE: It's easier to download this dataset from pyrfume. Here's how:
import pyrfume
molecules = pyrfume.load_data('leffingwell/molecules.csv', remote=True)
behavior = pyrfume.load_data('leffingwell/behavior.csv', remote=True)
behavior.sum().sort_values(ascending=False).astype(int)
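To make the last line concrete: if `behavior` is a table with one row per molecule and one 0/1 column per odor descriptor (an assumption about its layout, not something stated here), the chain counts how many molecules carry each descriptor and ranks the descriptors by frequency. A minimal sketch on a made-up toy table, with hypothetical descriptor names:

```python
import pandas as pd

# Toy stand-in for `behavior`: rows are molecules, columns are descriptors.
# The column names and values are invented for illustration.
toy_behavior = pd.DataFrame(
    {
        "fruity": [1, 1, 0],
        "green": [0, 1, 0],
        "woody": [0, 0, 0],
    }
)

# Column sums give the number of molecules per descriptor;
# sorting in descending order puts the most common descriptor first.
counts = toy_behavior.sum().sort_values(ascending=False).astype(int)
print(counts)
```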
Predicting properties of molecules is an area of growing research in machine learning, particularly as models for learning from graph-valued inputs improve in sophistication and robustness. A molecular property prediction problem that has received comparatively little attention during this surge in research activity is building Structure-Odor Relationships (SOR) models (as opposed to Quantitative Structure-Activity Relationships, a term from medicinal chemistry). This is a 70+ year-old problem straddling chemistry, physics, neuroscience, and machine learning.
To spur development on the SOR problem, we curated and cleaned a dataset of 3523 molecules associated with expert-labeled odor descriptors from the Leffingwell PMP 2001 database. We provide featurizations of all molecules in the dataset using bit-based and count-based fingerprints, Mordred molecular descriptors, and the embeddings from our trained GNN model (Sanchez-Lengeling et al., 2019). The dataset comprises the following files:
leffingwell_data.csv: this contains molecular structures, and what they smell like, along with train, test, and cross-validation splits. More detail on the file structure is found in leffingwell_readme.pdf.
leffingwell_embeddings.npz: this contains several featurizations of the molecules in the dataset.
leffingwell_readme.pdf: a more detailed description of the data and its provenance, including expected performance metrics.
LICENSE: a copy of the CC-BY-NC license language.
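The names of the arrays stored in leffingwell_embeddings.npz are not listed here, so a cautious first step might be to enumerate them rather than guess. The sketch below uses NumPy's standard `.npz` reader. The helper `summarize_npz` is hypothetical, and the demo runs on a small stand-in archive with invented array names so that it works without the real file.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file."""
    summary = {}
    with np.load(path) as archive:
        for name in archive.files:
            arr = archive[name]
            summary[name] = (arr.shape, str(arr.dtype))
    return summary


# Demo on a stand-in archive; "feat_a" and "feat_b" are made-up names,
# not the actual contents of leffingwell_embeddings.npz.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_embeddings.npz")
    np.savez(
        demo_path,
        feat_a=np.zeros((5, 8)),
        feat_b=np.ones((5, 3), dtype=np.int32),
    )
    summary = summarize_npz(demo_path)

for name, (shape, dtype) in summary.items():
    print(name, shape, dtype)
```

Pointing `summarize_npz` at the downloaded leffingwell_embeddings.npz would list the featurizations it actually contains. leffingwell_readme.pdf remains the authoritative description.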
The dataset, and all associated features, is freely available for research use under the CC-BY-NC license.
If you use the data in a publication, please cite:
@article{sanchez2019machine,
  title={Machine learning for scent: Learning generalizable perceptual representations of small molecules},
  author={Sanchez-Lengeling, Benjamin and Wei, Jennifer N and Lee, Brian K and Gerkin, Richard C and Aspuru-Guzik, Al{\'a}n and Wiltschko, Alexander B},
  journal={arXiv preprint arXiv:1910.10685},
  year={2019}
}