4 datasets found
  1. 1200 pixels spectral datasets

    • zenodo.org
    zip
    Updated May 21, 2024
    Cite
    Hui Zhang (2024). 1200 pixels spectral datasets [Dataset]. http://doi.org/10.5281/zenodo.11082600
    Available download formats
    zip
    Dataset updated
    May 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hui Zhang
    License

    Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the ZIP archive, spectral.npy contains the average spectral data of red ginseng, mycotoxins, and interfering impurities, and label.npy contains the corresponding labels. The spectral data has shape [1200, 510] and the label data has shape [1200, 1]. An example of using the data (building a classification model with the Python library scikit-learn) follows:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import classification_report, accuracy_score

    # Load spectral data and labels
    x = np.load('.../spectral.npy')[:, 1:-1]
    y = np.load('.../label.npy').ravel()  # flatten [1200, 1] to 1-D, as sklearn expects

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

    # Data standardization
    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)

    # Train the KNN model
    knn_model = KNeighborsClassifier(n_neighbors=5)
    knn_model.fit(x_train, y_train)

    # Predict
    y_pred = knn_model.predict(x_test)

    # Print the classification report and accuracy score
    print("Classification Report:")
    print(classification_report(y_test, y_pred))
    print("Accuracy Score:")
    print(accuracy_score(y_test, y_pred))
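
    The example above fixes n_neighbors=5. As an optional extension (not part of the original example), a minimal sketch of choosing k by 5-fold cross-validation on the training set:

    from sklearn.model_selection import cross_val_score

    # Compare a few candidate k values by mean cross-validated accuracy.
    for k in (3, 5, 7, 9, 11):
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), x_train, y_train, cv=5)
        print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")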

  2. What you see is what you get: Delineating the urban jobs-housing spatial...

    • figshare.com
    zip
    Updated Feb 12, 2021
    Cite
    Yao Yao; Jiaqi Zhang; Chen Qian; Yu Wang; Shuliang Ren; Zehao Yuan; Qingfeng Guan (2021). What you see is what you get: Delineating the urban jobs-housing spatial distribution at a parcel scale by using street view imagery [Dataset]. http://doi.org/10.6084/m9.figshare.12960212.v1
    Available download formats
    zip
    Dataset updated
    Feb 12, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yao Yao; Jiaqi Zhang; Chen Qian; Yu Wang; Shuliang Ren; Zehao Yuan; Qingfeng Guan
    License

    GNU General Public License (GPL): https://www.gnu.org/copyleft/gpl.html

    Description

    The compressed package Study_code.zip contains the code files implementing a paper under review ("What you see is what you get: Delineating urban jobs-housing spatial distribution at a parcel scale by using street view imagery based on deep learning technique").

    The compressed package input_land_parcel_with_attributes.zip contains the sampled mixed jobs-housing attribute data of the study area, with multiple probability attributes (only working, only living, working and living) at the land-parcel scale.

    The compressed package input_street_view_images.zip contains the street view images surrounding the sampled land parcels, each 240*160 pixels, obtained from Tencent Map (https://map.qq.com/).

    The compressed package output_results.zip contains the resulting vector files (jobs-housing pattern distribution and error distribution) and a file description (Readme.txt).

    This project uses several open-source Python libraries and complies with the GPL license:

    NumPy (https://numpy.org/) is an open-source numerical computing library developed by Travis Oliphant, used in this project for matrix operations. It complies with the BSD license.
    Pandas (https://pandas.pydata.org/) is an open-source library providing high-performance, easy-to-use data structures and data analysis tools. It complies with the BSD license.
    Selenium (https://www.selenium.dev/) is a suite of tools for automating web browsers, used in this project to obtain street view images. It complies with the BSD license.
    GDAL (https://gdal.org/) is a translator library for raster and vector geospatial data formats, used in this project to process geospatial data. It complies with the BSD license.
    PyTorch (https://pytorch.org/) is an open-source machine learning framework that accelerates the path from research prototyping to production deployment, used in this project for deep learning. It complies with the BSD license.
    sklearn (https://scikit-learn.org/) is an open-source machine learning toolkit for Python, used in this project to compare precision metrics. It complies with the BSD license.
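
    As an illustration (not part of the dataset's own code), a minimal sketch of opening one of the result vector files with the GDAL/OGR Python bindings; the path 'output_results/jobs_housing_pattern.shp' is a hypothetical placeholder:

    # Minimal sketch: open a result vector file and inspect its attributes.
    # The path below is a hypothetical placeholder for a file inside output_results.zip.
    from osgeo import ogr

    ds = ogr.Open('output_results/jobs_housing_pattern.shp')
    layer = ds.GetLayer(0)
    print('Feature count:', layer.GetFeatureCount())
    for feature in layer:
        print(feature.items())  # attribute fields of the first parcel
        break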

  3. Spatial distribution of particulate matter, collected using low cost...

    • zenodo.org
    bin
    Updated Apr 24, 2025
    Cite
    Janani Venkatraman Jagatha; Christoph Schneider; Sebastian Schubert; Luxi Jin (2025). Spatial distribution of particulate matter, collected using low cost sensors, in Downtown-Singapore [Dataset]. http://doi.org/10.5281/zenodo.14280847
    Available download formats
    bin
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Janani Venkatraman Jagatha; Christoph Schneider; Sebastian Schubert; Luxi Jin
    License

    Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Singapore
    Description

    The dataset consists of particulate matter concentrations and meteorological data measured in Singapore's Chinatown and Central Business District from March 13 to March 16, 2018. The data collectors walked from the Outram district (Chinatown) to the Central Business District. The measurements were carried out using a hand-held air quality sensor ensemble (URBMOBI 3.0).

    The dataset contains information from two URBMOBI 3.0 devices and one reference-grade device (Grimm 1.109). Their data are denoted by the subscripts 's1', 's2', and 'gr', respectively.

    singapore_all_pm_25.geojson: the observed PM concentrations and meteorology, aggregated using a 25 m buffer around the measurement points.

    Information on working with GeoJSON files can be found under GeoJSON.

    Units:
    PM: µg/m³
    Scaled_PM_MM: dimensionless, scaled with the Min-Max scaler (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)
    Scaled_PM_SS: dimensionless, scaled with the Standard scaler (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
    Air temperature: °C
    Relative humidity: %
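
    A minimal sketch of loading the file and recomputing the scaled columns with scikit-learn (assuming geopandas is installed; the raw column name 'PM' is an assumption, not documented here):

    # Minimal sketch: load the GeoJSON and recompute the scaled PM columns.
    import geopandas as gpd
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    gdf = gpd.read_file('singapore_all_pm_25.geojson')
    pm = gdf[['PM']]  # the column name 'PM' is an assumption
    gdf['Scaled_PM_MM'] = MinMaxScaler().fit_transform(pm).ravel()
    gdf['Scaled_PM_SS'] = StandardScaler().fit_transform(pm).ravel()
    print(gdf[['PM', 'Scaled_PM_MM', 'Scaled_PM_SS']].head())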

    The measurements are part of the project "Effects of heavy precipitation events on near-surface climate and particulate matter concentrations in Singapore", funded by Humboldt-Universität zu Berlin seed funding for collaborative projects between the National University of Singapore and Humboldt-Universität zu Berlin.

  4. Clustering Exercises

    • kaggle.com
    Updated Apr 29, 2022
    Cite
    Joonas (2022). Clustering Exercises [Dataset]. https://www.kaggle.com/datasets/joonasyoon/clustering-exercises
    Available download formats
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 29, 2022
    Dataset provided by
    Kaggle
    Authors
    Joonas
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview

    [Overview figure: https://i.imgur.com/ZUX61cD.png]

    Context

    The method of grouping similar data is called clustering. You can create dummy data for classifying clusters with functions from the sklearn package, but that takes some effort; a minimal sketch follows at the end of this section.

    This dataset is meant to help users who need hard test cases for clustering examples.

    Try to select a meaningful number of clusters and divide the data into them. Here are exercises for you.
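
    As mentioned above, a minimal sketch of generating such dummy data yourself with sklearn (an illustration, not part of the dataset):

    # Minimal sketch: generate 2D dummy clustering data with sklearn.
    from sklearn.datasets import make_blobs

    # 300 points in 2D around 4 cluster centers; 'color' plays the role of the cluster label.
    x, color = make_blobs(n_samples=300, centers=4, n_features=2, random_state=42)
    print(x[:5], color[:5])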

    Dataset

    Each CSV file contains many rows of x, y, and color, which you can see in the figures above.

    If you want to use the positions as integers, scale and round them, e.g. x = round(x * 100). A sketch of this, together with a simple clustering run, follows below.
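
    A minimal sketch of loading one exercise file and clustering it with KMeans (the file name 'basic1.csv' and the column names are assumptions about the file layout):

    # Minimal sketch: load an exercise CSV and cluster it with KMeans.
    import pandas as pd
    from sklearn.cluster import KMeans

    df = pd.read_csv('basic1.csv')  # hypothetical file name
    df['x'] = (df['x'] * 100).round().astype(int)  # optional integer scaling, as described above
    df['y'] = (df['y'] * 100).round().astype(int)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(df[['x', 'y']])
    print(labels[:10])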

    Furthermore, there is a GUI tool to generate 2D points for clustering; you can make your own dataset with it: https://www.joonas.io/cluster-paint

    Stay tuned for further updates! If you have any ideas, feel free to leave a comment.

