100+ datasets found
  1. Dataset of books series that contain Building machine learning systems with Python : master the art of machine learning with Python and build effective machine learning sytems with this intensive hands-on guide

    • workwithdata.com
    Updated Nov 25, 2024
    Cite
    Work With Data (2024). Dataset of books series that contain Building machine learning systems with Python : master the art of machine learning with Python and build effective machine learning sytems with this intensive hands-on guide [Dataset]. https://www.workwithdata.com/datasets/book-series?f=1&fcol0=j0-book&fop0=%3D&fval0=Building+machine+learning+systems+with+Python+:+master+the+art+of+machine+learning+with+Python+and+build+effective+machine+learning+sytems+with+this+intensive+hands-on+guide&j=1&j0=books
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book series. It has 1 row and is filtered where the book is Building machine learning systems with Python : master the art of machine learning with Python and build effective machine learning sytems with this intensive hands-on guide. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  2. House Dummy Dataset Maxwell

    • kaggle.com
    zip
    Updated Jan 17, 2025
    Cite
    Maxwell Massie (2025). House Dummy Dataset Maxwell [Dataset]. https://www.kaggle.com/datasets/maxwellmassie/dummy-house-dataset-maxwell
    Available download formats: zip (8958 bytes)
    Dataset updated
    Jan 17, 2025
    Authors
    Maxwell Massie
    Description

    This dataset was generated with Python's random module. It was created solely to study the concept of linear regression and how to use a linear regression machine learning model. Although the dataset was generated with a formula similar to real-world house prices, it should not be used as a primary reference for predicting house prices in any particular region. Thank you.

    Maxwell Massie
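
    A minimal sketch (an assumption, not the author's actual script) of generating a similar dummy house-price table with Python's random module and fitting a linear regression to it:

    import random
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    rows = []
    for _ in range(500):
        area = random.uniform(50, 300)          # floor area in square metres
        bedrooms = random.randint(1, 5)
        noise = random.gauss(0, 20_000)
        price = 50_000 + 2_000 * area + 15_000 * bedrooms + noise   # made-up pricing formula
        rows.append({"area": area, "bedrooms": bedrooms, "price": price})

    df = pd.DataFrame(rows)
    model = LinearRegression().fit(df[["area", "bedrooms"]], df["price"])
    print(model.coef_, model.intercept_)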

  3. Research data underpinning "Investigating Reinforcement Learning Approaches In Stock Market Trading"

    • data.ncl.ac.uk
    application/csv
    Updated Aug 13, 2024
    Cite
    Zheng Luo (2024). Research data underpinning "Investigating Reinforcement Learning Approaches In Stock Market Trading" [Dataset]. http://doi.org/10.25405/data.ncl.26539735.v1
    Available download formats: application/csv
    Dataset updated
    Aug 13, 2024
    Dataset provided by
    Newcastle University
    Authors
    Zheng Luo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The final dataset utilised for the publication "Investigating Reinforcement Learning Approaches In Stock Market Trading" was produced by downloading and combining data from multiple reputable sources to suit the specific needs of this project. Raw data were retrieved using a Python finance API; Python and NumPy were then used to combine and normalise the data into the final dataset. The raw data were sourced as follows: stock prices of NVIDIA and AMD, financial indexes, and commodity prices were retrieved from Yahoo Finance, and economic indicators were collected from the US Federal Reserve. The dataset was normalised to minute intervals, and the stock prices were adjusted to account for stock splits. This dataset was used to explore the application of reinforcement learning in stock market trading: after creation, it was used in a reinforcement learning environment to train several reinforcement learning algorithms, including deep Q-learning, policy networks, policy networks with baselines, actor-critic methods, and time series incorporation. The performance of these algorithms was then compared based on profit made and other financial evaluation metrics. The attached 'README.txt' contains methodological information and a glossary of all the variables in the .csv file.
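
    A minimal sketch of this kind of retrieval and normalisation, assuming the yfinance package as the Python finance API (the publication does not name the specific library):

    import yfinance as yf
    import numpy as np

    # Download recent minute-interval closing prices for NVIDIA and AMD
    # (Yahoo Finance only serves 1m data for the last few days).
    prices = yf.download(["NVDA", "AMD"], period="5d", interval="1m")["Close"].dropna()

    # Min-max normalise each series to a common 0-1 scale.
    arr = prices.to_numpy(dtype=float)
    normalised = (arr - arr.min(axis=0)) / (arr.max(axis=0) - arr.min(axis=0))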

  4. Python Codebase and Jupyter Notebooks - Applications of Machine Learning Techniques to Geothermal Play Fairway Analysis in the Great Basin Region, Nevada

    • data.openei.org
    • gdr.openei.org
    • +3more
    archive
    Updated Jun 30, 2022
    Cite
    Steve Brown; Connor Smith (2022). Python Codebase and Jupyter Notebooks - Applications of Machine Learning Techniques to Geothermal Play Fairway Analysis in the Great Basin Region, Nevada [Dataset]. http://doi.org/10.15121/1897035
    Available download formats: archive
    Dataset updated
    Jun 30, 2022
    Dataset provided by
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
    Nevada Bureau of Mines and Geology
    Open Energy Data Initiative (OEDI)
    Authors
    Steve Brown; Connor Smith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Great Basin, Nevada
    Description

    Git archive containing Python modules and resources used to generate machine-learning models used in the "Applications of Machine Learning Techniques to Geothermal Play Fairway Analysis in the Great Basin Region, Nevada" project. This software is licensed as free to use, modify, and distribute with attribution. Full license details are included within the archive. See "documentation.zip" for setup instructions and file trees annotated with module descriptions.

  5. Date-Dataset

    • kaggle.com
    zip
    Updated Aug 18, 2021
    Cite
    nishtha kukreti (2021). Date-Dataset [Dataset]. https://www.kaggle.com/nishthakukreti/datedataset
    Available download formats: zip (38657 bytes)
    Dataset updated
    Aug 18, 2021
    Authors
    nishtha kukreti
    Description

    Context

    This is a random date dataset that I generated with a Python script, for building a machine learning model that tags dates in any given document.

    Content

    This dataset labels whether a given word or sequence of words is a date or not.

    Inspiration

    Implement a machine learning or deep learning model, or train a custom spaCy pipeline, to tag dates and other parts of speech.
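
    A minimal sketch of how such a labelled date CSV could be generated with the faker package (an assumption; the author's original script is not included):

    import csv
    import random
    from faker import Faker

    fake = Faker()
    rows = []
    for _ in range(1000):
        if random.random() < 0.5:
            rows.append((fake.date(pattern="%d %B %Y"), "DATE"))   # e.g. "07 March 2021"
        else:
            rows.append((fake.word(), "NOT_DATE"))                  # a random non-date token

    with open("date_dataset.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        writer.writerows(rows)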

  6. Demo dataset for: SPACEc, a streamlined, interactive Python workflow for multiplexed image processing and analysis

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jul 8, 2024
    Cite
    Yuqi Tan; Tim Kempchen (2024). Demo dataset for: SPACEc, a streamlined, interactive Python workflow for multiplexed image processing and analysis [Dataset]. http://doi.org/10.5061/dryad.brv15dvj1
    Available download formats: zip
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Stanford University School of Medicine
    Authors
    Yuqi Tan; Tim Kempchen
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Multiplexed imaging technologies provide insights into complex tissue architectures. However, challenges arise due to software fragmentation with cumbersome data handoffs, inefficiencies in processing large images (8 to 40 gigabytes per image), and limited spatial analysis capabilities. To efficiently analyze multiplexed imaging data, we developed SPACEc, a scalable end-to-end Python solution that handles image extraction, cell segmentation, and data preprocessing and incorporates machine-learning-enabled, multi-scaled, spatial analysis, operated through a user-friendly and interactive interface. The demonstration dataset was derived from a previous analysis and contains TMA cores from a human tonsil and tonsillitis sample that were acquired with the Akoya PhenoCycler-Fusion platform. The dataset can be used to test the workflow and establish it on a user's system or to familiarize oneself with the pipeline.

    Methods

    Tissue samples: Tonsil cores were extracted from a larger multi-tumor tissue microarray (TMA), which included a total of 66 unique tissues (51 malignant and semi-malignant tissues, as well as 15 non-malignant tissues). Representative tissue regions were annotated on corresponding hematoxylin and eosin (H&E)-stained sections by a board-certified surgical pathologist (S.Z.). Annotations were used to generate the 66 cores, each with a diameter of 1 mm. FFPE tissue blocks were retrieved from the tissue archives of the Institute of Pathology, University Medical Center Mainz, Germany, and the Department of Dermatology, University Medical Center Mainz, Germany. The multi-tumor TMA block was sectioned at 3 µm thickness onto SuperFrost Plus microscopy slides before being processed for CODEX multiplex imaging as previously described.

    CODEX multiplexed imaging and processing: To run the CODEX machine, the slide was taken from the storage buffer and placed in PBS for 10 minutes to equilibrate. After drying the PBS with a tissue, a flow cell was sealed onto the tissue slide. The assembled slide and flow cell were then placed in a PhenoCycler Buffer made from 10X PhenoCycler Buffer & Additive for at least 10 minutes before starting the experiment. A 96-well reporter plate was prepared with each reporter corresponding to the correct barcoded antibody for each cycle, with up to 3 reporters per cycle per well. The fluorescence reporters were mixed with 1X PhenoCycler Buffer, Additive, nuclear-staining reagent, and assay reagent according to the manufacturer's instructions. With the reporter plate and assembled slide and flow cell placed into the CODEX machine, the automated multiplexed imaging experiment was initiated. Each imaging cycle included steps for reporter binding, imaging of three fluorescent channels, and reporter stripping to prepare for the next cycle and set of markers. This was repeated until all markers were imaged. After the experiment, a .qptiff image file containing individual antibody channels and the DAPI channel was obtained. Image stitching, drift compensation, deconvolution, and cycle concatenation are performed within the Akoya PhenoCycler software. The raw imaging data output (tiff, 377.442 nm per pixel for 20x CODEX) is first examined with QuPath software (https://qupath.github.io/) for inspection of staining quality. Any markers that produce unexpected patterns or low signal-to-noise ratios should be excluded from the ensuing analysis. The qptiff files must be converted into tiff files for input into SPACEc.

    Data preprocessing includes image stitching, drift compensation, deconvolution, and cycle concatenation performed using the Akoya PhenoCycler software. The raw imaging data (qptiff, 377.442 nm/pixel for 20x CODEX) files from the Akoya PhenoCycler technology were first examined with QuPath software (https://qupath.github.io/) to inspect staining quality. Markers with untenable patterns or low signal-to-noise ratios were excluded from further analysis. A custom CODEX analysis pipeline was used to process all acquired CODEX data (scripts available upon request). The qptiff files were converted into tiff files for tissue detection (watershed algorithm) and cell segmentation.
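
    A minimal sketch of the qptiff-to-tiff conversion step mentioned above, assuming the tifffile package can read the PhenoCycler .qptiff container (file names are placeholders):

    import tifffile
    import numpy as np

    # Read the multichannel PhenoCycler output and rewrite it as a plain multi-page BigTIFF.
    stack = tifffile.imread("sample.qptiff")
    tifffile.imwrite("sample.tif", np.asarray(stack), bigtiff=True)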

  7. Data from: FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Apr 2, 2024
    Cite
    Mais, Lisa; Hirsch, Peter; Managan, Claire; Kandarpa, Ramya; Rumberger, Josef Lorenz; Reinke, Annika; Maier-Hein, Lena; Ihrke, Gudrun; Kainmueller, Dagmar (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10875062
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    German Cancer Research Center
    Max Delbrück Center for Molecular Medicine
    Max Delbrück Center
    Howard Hughes Medical Institute - Janelia Research Campus
    Authors
    Mais, Lisa; Hirsch, Peter; Managan, Claire; Kandarpa, Ramya; Rumberger, Josef Lorenz; Reinke, Annika; Maier-Hein, Lena; Ihrke, Gudrun; Kainmueller, Dagmar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General

    For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

    Summary

    A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains

    30 completely labeled (segmented) images

    71 partly labeled images

    altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes between 30-60 min on average, yet a difficult one can take up to 4 hours)

    To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects

    A set of metrics and a novel ranking score for respective meaningful method benchmarking

    An evaluation of three baseline methods in terms of the above metrics and score

    Abstract

    Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

    Dataset documentation:

    We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

    FISBe Datasheet

    Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

    Files

    fisbe_v1.0_{completely,partly}.zip

    contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.

    fisbe_v1.0_mips.zip

    maximum intensity projections of all samples, for convenience.

    sample_list_per_split.txt

    a simple list of all samples and the subset they are in, for convenience.

    view_data.py

    a simple python script to visualize samples, see below for more information on how to use it.

    dim_neurons_val_and_test_sets.json

    a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.

    Readme.md

    general information

    How to work with the image files

    Each sample consists of a single 3d MCFO image of neurons of the fruit fly. For each image, we provide a pixel-wise instance segmentation for all separable neurons. Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification). The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file. The segmentation mask for each neuron is stored in a separate channel. The order of dimensions is CZYX.

    We recommend working in a virtual environment, e.g., by using conda:

    conda create -y -n flylight-env -c conda-forge python=3.9
    conda activate flylight-env

    How to open zarr files

    Install the python zarr package:

    pip install zarr

    Open a zarr file with:

    import zarr

    raw = zarr.open("<sample>.zarr", mode='r', path="volumes/raw")
    seg = zarr.open("<sample>.zarr", mode='r', path="volumes/gt_instances")

    Optional:

    import numpy as np
    raw_np = np.array(raw)

    Zarr arrays are read lazily on-demand. Many functions that expect numpy arrays also work with zarr arrays. Optionally, the arrays can also explicitly be converted to numpy arrays.

    How to view zarr image files

    We recommend using napari to view the image data.

    Install napari:

    pip install "napari[all]"

    Save the following Python script:

    import zarr, sys, napari

    raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")
    gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")

    viewer = napari.Viewer(ndisplay=3)
    for idx, gt in enumerate(gts):
        viewer.add_labels(gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
    viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
    viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
    viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
    napari.run()

    Execute:

    python view_data.py /R9F03-20181030_62_B5.zarr

    Metrics

    S: Average of avF1 and C

    avF1: Average F1 Score

    C: Average ground truth coverage

    clDice_TP: Average true positives clDice

    FS: Number of false splits

    FM: Number of false merges

    tp: Relative number of true positives

    For more information on our selected metrics and formal definitions please see our paper.

    Baseline

    To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al. For detailed information on the methods and the quantitative results please see our paper.

    License

    The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Citation

    If you use FISBe in your research, please use the following BibTeX entry:

    @misc{mais2024fisbe,
      title         = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures},
      author        = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller},
      year          = 2024,
      eprint        = {2404.00130},
      archivePrefix = {arXiv},
      primaryClass  = {cs.CV}
    }

    Acknowledgments

    We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable discussions. P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program. This work was co-funded by Helmholtz Imaging.

    Changelog

    There have been no changes to the dataset so far. All future changes will be listed on the changelog page.

    Contributing

    If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

    All contributions are welcome!

  8. Weather prediction dataset

    • zenodo.org
    • kaggle.com
    csv, png, txt
    Updated Jul 19, 2024
    Cite
    Florian Huber (2024). Weather prediction dataset [Dataset]. http://doi.org/10.5281/zenodo.4770937
    Available download formats: csv, png, txt
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Florian Huber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset created for machine learning and deep learning training and teaching purposes.
    Can for instance be used for classification, regression, and forecasting tasks.
    Complex enough to demonstrate realistic issues such as overfitting and unbalanced data, while still remaining intuitively accessible.

    ORIGINAL DATA TAKEN FROM:

    EUROPEAN CLIMATE ASSESSMENT & DATASET (ECA&D), file created on 22-04-2021
    THESE DATA CAN BE USED FREELY PROVIDED THAT THE FOLLOWING SOURCE IS ACKNOWLEDGED:

    Klein Tank, A.M.G. and Coauthors, 2002. Daily dataset of 20th-century surface
    air temperature and precipitation series for the European Climate Assessment.
    Int. J. of Climatol., 22, 1441-1453.
    Data and metadata available at http://www.ecad.eu

    For more information see metadata.txt file.

    The Python code used to create the weather prediction dataset from the ECA&D data can be found on GitHub: https://github.com/florian-huber/weather_prediction_dataset
    (this repository also contains Jupyter notebooks with teaching examples)

  9. Dataset for twitter Sentiment Analysis using Roberta and Vader

    • data.mendeley.com
    Updated May 14, 2023
    + more versions
    Cite
    Jannatul Ferdoshi (2023). Dataset for twitter Sentiment Analysis using Roberta and Vader [Dataset]. http://doi.org/10.17632/2sjt22sb55.1
    Dataset updated
    May 14, 2023
    Authors
    Jannatul Ferdoshi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our dataset comprises 1000 tweets, which were taken from Twitter using the Python programming language. The dataset was stored in a CSV file and generated using various modules. The random module was used to generate random IDs and text, while the faker module was used to generate random user names and dates. Additionally, the textblob module was used to assign a random sentiment to each tweet.
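
    A minimal sketch of that generation approach (assumed, not the authors' exact script), using faker for user names and dates, random for IDs, and TextBlob for the sentiment label:

    import csv
    import random
    from faker import Faker
    from textblob import TextBlob

    fake = Faker()
    with open("tweets.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "user", "date", "text", "sentiment"])
        for _ in range(1000):
            text = fake.sentence()
            polarity = TextBlob(text).sentiment.polarity  # -1 (negative) .. 1 (positive)
            label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
            writer.writerow([random.randint(1, 10**9), fake.user_name(), fake.date(), text, label])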

    This systematic approach ensures that the dataset is well-balanced and represents different types of tweets, user behavior, and sentiment. It is essential to have a balanced dataset to ensure that the analysis and visualization of the dataset are accurate and reliable. By generating tweets with a range of sentiments, we have created a diverse dataset that can be used to analyze and visualize sentiment trends and patterns.

    In addition to generating the tweets, we have also prepared a visual representation of the data sets. This visualization provides an overview of the key features of the dataset, such as the frequency distribution of the different sentiment categories, the distribution of tweets over time, and the user names associated with the tweets. This visualization will aid in the initial exploration of the dataset and enable us to identify any patterns or trends that may be present.

  10. LandCoverPT Dataset

    • data.niaid.nih.gov
    Updated Nov 1, 2023
    Cite
    Antonio Esteves (2023). LandCoverPT Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7580260
    Dataset updated
    Nov 1, 2023
    Dataset provided by
    University of Minho
    Authors
    Antonio Esteves
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LandCoverPT - a dataset for training machine learning models to classify the land cover of the Portuguese territory. The creation of the dataset used 26 Sentinel-2 products, captured in June and August 2019 and covering the Portuguese mainland; the products were divided into 153347 patches of 120×120 pixels each. The Sentinel-2 data was complemented with Corine Land Cover 2018 data. Each patch includes the B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, and B12 Sentinel-2 bands, plus the CLC 2018 layer. The dataset is stored in 3 TFRecord files, for training, validation, and test: the training set contains 98141 patches, the validation set 24536 patches, and the test set 30670 patches. Code files are also provided to experiment with the dataset: two configuration files in JSON format, a Python file with the U-Net model, a Python file with a class that creates a tf.data.Dataset from the TFRecord files, a notebook for training the U-Net model on the LandCoverPT dataset, a notebook for evaluating the trained U-Net model, and a notebook for making predictions with the trained U-Net model.
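
    A minimal sketch of opening one of the provided TFRecord splits with tf.data (the file name below is a placeholder, and no feature schema is assumed):

    import tensorflow as tf

    train_ds = tf.data.TFRecordDataset("landcoverpt_train.tfrecord")  # placeholder file name

    # Count the serialized patches and list the feature keys stored in the first example.
    print("training patches:", sum(1 for _ in train_ds))
    first = next(iter(train_ds))
    example = tf.train.Example.FromString(first.numpy())
    print(sorted(example.features.feature.keys()))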

  11. Data from: Public supply water use reanalysis for the 2000-2020 period by HUC12, month, and year for the conterminous United States (ver. 2.0, August 2024)

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 19, 2025
    Cite
    U.S. Geological Survey (2025). Public supply water use reanalysis for the 2000-2020 period by HUC12, month, and year for the conterminous United States (ver. 2.0, August 2024) [Dataset]. https://catalog.data.gov/dataset/public-supply-water-use-reanalysis-for-the-2000-2020-period-by-huc12-month-and-year-for-th
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Contiguous United States, United States
    Description

    The U.S. Geological Survey is developing national water-use models to support water resources management in the United States. Model benefits include a nationally consistent estimation approach, greater temporal and spatial resolution of estimates, efficient and automated updates of results, and capabilities to forecast water use into the future and assess model uncertainty. The term "reanalysis" refers to the process of reevaluating and recalculating water-use data using updated or refined methods, data sources, models, or assumptions. In this data release, water use refers to water that is withdrawn by public and private water suppliers and includes water provided for domestic, commercial, industrial, thermoelectric power, and public water uses, as well as water that is consumed or lost within the public supply system. Consumptive use refers to water withdrawn by the public supply system that is evaporated, transpired, incorporated into products or crops, or consumed by humans or livestock. This data release contains data used in a machine learning model (child item 2) to estimate monthly water use for communities that are supplied by public-supply water systems in the conterminous United States for 2000-2020. This data release also contains associated scripts used to produce input features (child items 4 - 8) as well as model water use estimates by 12-digit hydrologic unit code (HUC12) and public supply water service area (WSA). HUC12 boundaries are in child item 3. Public supply delivery and consumptive use estimates are in child items 1 and 9, respectively.

    First posted: November 1, 2023
    Revised: August 8, 2024

    This version replaces the previous version of the data release: Luukkonen, C.L., Alzraiee, A.H., Larsen, J.D., Martin, D.J., Herbert, D.M., Buchwald, C.A., Houston, N.A., Valseth, K.J., Paulinski, S., Miller, L.D., Niswonger, R.G., Stewart, J.S., and Dieter, C.A., 2023, Public supply water use reanalysis for the 2000-2020 period by HUC12, month, and year for the conterminous United States: U.S. Geological Survey data release, https://doi.org/10.5066/P9FUL880

    Version 2.0: This data release has been updated as of 8/8/2024. The previous version has been replaced because some fractions used for downscaling WSA estimates to HUC12 did not sum to one for some WSAs in Virginia. Updated model water use estimates by HUC12 are included in this version. A change was made in two scripts to check for this condition. Output files have also been updated to preserve the leading zero in the HUC12 codes. Additional files are also included to provide information about mapping the WSAs and groundwater and surface water fractions to HUC12 and to provide public supply water-use estimates by WSA. The 'Machine learning model that estimates total monthly and annual per capita public supply water use' child item has been updated with these corrections and additional files. A new child item 'R code used to estimate public supply consumptive water use' has been added to provide estimates of public supply consumptive use.
    This page includes the following files:

    PS_HUC12_Tot_2000_2020.csv - a csv file with estimated monthly public supply total water use from 2000-2020 by HUC12, in million gallons per day
    PS_HUC12_GW_2000_2020.csv - a csv file with estimated monthly public supply groundwater use for 2000-2020 by HUC12, in million gallons per day
    PS_HUC12_SW_2000_2020.csv - a csv file with estimated monthly public supply surface water use for 2000-2020 by HUC12, in million gallons per day
    PS_WSA_Tot_2000_2020.csv - a csv file with estimated monthly public supply total water use from 2000-2020 by WSA, in million gallons per day
    PS_WSA_GW_2000_2020.csv - a csv file with estimated monthly public supply groundwater use for 2000-2020 by WSA, in million gallons per day
    PS_WSA_SW_2000_2020.csv - a csv file with estimated monthly public supply surface water use for 2000-2020 by WSA, in million gallons per day
    change_files_format.py - a Python script used to change the water use estimates by WSA and HUC12 files from wide format to the thin and long format
    version_history.txt - a txt file describing changes in this version

    Note: 1) Groundwater and surface water fractions were determined using source counts as described in the 'R code that determines groundwater and surface water source fractions for public-supply water service areas, counties, and 12-digit hydrologic units' child item. 2) Some HUC12s have estimated water use of zero because no public-supply water service areas were modeled within the HUC.

    The data release is organized into these items:

    1. Machine learning model that estimates public supply deliveries for domestic and other use types - The public supply delivery model estimates total delivery of domestic, commercial, industrial, institutional, and irrigation (CII) water use for public supply water service areas within the conterminous United States. This item contains model input datasets, code used to build the delivery machine learning model, and output predictions.
    2. Machine learning model that estimates total monthly and annual per capita public supply water use - The public supply water use model estimates total monthly water use for 12-digit hydrologic units within the conterminous United States. This item contains model input datasets, code used to build the water use machine learning model, and output predictions.
    3. National watershed boundary (HUC12) dataset for the conterminous United States, retrieved 10/26/2020 - Spatial data consisting of a shapefile with 12-digit hydrologic units for the conterminous United States, retrieved 10/26/2020.
    4. Python code used to determine average yearly and monthly tourism per 1000 residents for public-supply water service areas - This code was used to create a feature for the public supply model that provides information for areas affected by population increases due to tourism.
    5. Python code used to download gridMET climate data for public-supply water service areas - The climate data collector is a tool used to query climate data which are used as input features in the public supply models.
    6. Python code used to download U.S. Census Bureau data for public-supply water service areas - The census data collector is a geographic based tool to query census data which are used as input features in the public supply models.
    7. R code that determines buying and selling of water by public-supply water service areas - This code was used to create a feature for the public supply model that indicates whether public-supply systems buy water, sell water, or neither buy nor sell water.
    8. R code that determines groundwater and surface water source fractions for public-supply water service areas, counties, and 12-digit hydrologic units - This code was used to determine source water fractions (groundwater and/or surface water) for public supply systems and HUC12s.
    9. R code used to estimate public supply consumptive water use - This code was used to estimate public supply consumptive water use using an assumed fraction of deliveries for outdoor irrigation and estimates of evaporative demand. This item contains estimated monthly public supply consumptive use datasets by HUC12 and WSA.
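
    A minimal sketch of the wide-to-long reshaping that change_files_format.py performs, using pandas (the column names below are assumptions, not taken from the files):

    import pandas as pd

    # Keep HUC12 codes as strings so the leading zeros are preserved.
    wide = pd.read_csv("PS_HUC12_Tot_2000_2020.csv", dtype={"HUC12": str})

    # Treat every non-identifier column as a year-month value column and melt to long format.
    long = wide.melt(id_vars=["HUC12"], var_name="year_month", value_name="water_use_mgd")
    long.to_csv("PS_HUC12_Tot_2000_2020_long.csv", index=False)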

  12. Python code used to determine average yearly and monthly tourism per 1000 residents for public-supply water service areas

    • s.cnmilf.com
    • data.usgs.gov
    • +1more
    Updated Oct 29, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Python code used to determine average yearly and monthly tourism per 1000 residents for public-supply water service areas [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/python-code-used-to-determine-average-yearly-and-monthly-tourism-per-1000-residents-for-pu
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This child item describes Python code used to estimate average yearly and monthly tourism per 1000 residents within public-supply water service areas. Increases in population due to tourism may impact amounts of water used by public-supply water systems. This data release contains model input datasets, Python code used to develop the tourism information, and output estimates of tourism. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Output from this code was used as an input feature in the public supply delivery and water use machine learning models. This page includes the following files:

    tourism_input_data.zip - a zip file containing input data sets used by the tourism Python code
    tourism_output.zip - a zip file with output produced by the tourism Python code
    README.txt - a README file describing the data files and code requirements
    tourism_study_code.zip - a zip file containing the Python code used to create the tourism feature variable

  13. Titanic dataset

    • kaggle.com
    zip
    Updated Jan 16, 2024
    Cite
    Anakha A S (2024). Titanic dataset [Dataset]. https://www.kaggle.com/datasets/anakha27/titanic-dataset
    Available download formats: zip (22544 bytes)
    Dataset updated
    Jan 16, 2024
    Authors
    Anakha A S
    Description

    The training set should be used to build machine learning models. Here we use some Python packages and code to work on the dataset and derive useful information.
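
    A minimal sketch of that workflow, assuming the standard Kaggle Titanic columns (Survived, Pclass, Sex, Age, Fare) in a local train.csv:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("train.csv")
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df["Age"] = df["Age"].fillna(df["Age"].median())

    X = df[["Pclass", "Sex", "Age", "Fare"]]
    y = df["Survived"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("hold-out accuracy:", model.score(X_test, y_test))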

  14. Architectural Design Decisions for Machine Learning Deployment: Dataset and Code

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Oct 16, 2024
    Cite
    Warnett, Stephen John; Zdun, Uwe (2024). Architectural Design Decisions for Machine Learning Deployment: Dataset and Code [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_5823458
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    University of Vienna
    Authors
    Warnett, Stephen John; Zdun, Uwe
    License

    Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0

    Description

    Title: Architectural Design Decisions for Machine Learning Deployment: Dataset and Code

    Authors: Stephen John Warnett; Uwe Zdun

    About: This is the dataset and code artefact for the paper entitled "Architectural Design Decisions for Machine Learning Deployment".

    Contents: The "_generated" directory contains the generated results, including LaTeX files with tables for use in publications and the Architectural Design Decision model in textual and graphical form. "Generators" contains Python applications that can be run to generate the above. "Metamodels" contains a Python file with type definitions. "Sources_coding" contains our source codings and audit trail. "Add_models" contains the Python implementation of our model and source codings. Finally, "appendix" contains a detailed description of our research method.

    Paper Abstract: Deploying machine learning models to production is challenging, partially due to the misalignment between software engineering and machine learning disciplines but also due to potential practitioner knowledge gaps. To reduce this gap and guide decision-making, we conducted a qualitative investigation into the technical challenges faced by practitioners based on studying the grey literature and applying the Straussian Grounded Theory research method. We modelled current practices in machine learning, resulting in a UML-based architectural design decision model based on current practitioner understanding of the domain and a subset of the decision space and identified seven architectural design decisions, various relations between them, twenty-six decision options and forty-four decision drivers in thirty-five sources. Our results intend to help bridge the gap between science and practice, increase understanding of how practitioners approach the deployment of their solutions, and support practitioners in their decision-making.

    Objective: This paper aims to study current practitioner understanding of architectural concepts associated with machine learning deployment.

    Method: Applying Straussian Grounded Theory to gray literature sources containing practitioner views on machine learning practices, we studied methods and techniques currently applied by practitioners in the context of machine learning solution development and gained valuable insights into the software engineering and architectural state of the art as applied to ML.

    Results: Our study resulted in a model of Architectural Design Decisions, practitioner practices, and decision drivers in the field of software engineering and software architecture for machine learning.

    Conclusions: The resulting Architectural Design Decisions model can help researchers better understand practitioners' needs and the challenges they face, and guide their decisions based on existing practices. The study also opens new avenues for further research in the field, and the design guidance provided by our model can also help reduce design effort and risk. In future work, we plan on using our findings to provide automated design advice to machine learning engineers.

  15. Dataset of 842 Emojis

    • kaggle.com
    zip
    Updated Jan 8, 2023
    Cite
    Divyanshu Singh369 (2023). Dataset of 842 Emojis [Dataset]. https://www.kaggle.com/datasets/divyanshusingh369/dataset-of-843-emojis
    Available download formats: zip (12849 bytes)
    Dataset updated
    Jan 8, 2023
    Authors
    Divyanshu Singh369
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is a collection of 842 emoji. I am a Python developer and full stack developer. I wanted to create a machine learning program using image processing with OpenCV that would show which human facial feature is similar to which emoji. So I created this dataset using web scraping, taking the data from the Unicode emoji listings. I hope you enjoy it and learn something good from it. Since this is my first dataset, all suggestions are welcome. 😄😄😄

  16. Data from: Multi-task Deep Learning for Water Temperature and Streamflow Prediction (ver. 1.1, June 2022)

    • catalog.data.gov
    Updated Nov 11, 2025
    Cite
    U.S. Geological Survey (2025). Multi-task Deep Learning for Water Temperature and Streamflow Prediction (ver. 1.1, June 2022) [Dataset]. https://catalog.data.gov/dataset/multi-task-deep-learning-for-water-temperature-and-streamflow-prediction-ver-1-1-june-2022
    Dataset updated
    Nov 11, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This item contains data and code used in experiments that produced the results for Sadler et al. (2022) (see below for the full reference). We ran five experiments for the analysis: Experiment A, Experiment B, Experiment C, Experiment D, and Experiment AuxIn. Experiment A tested multi-task learning for predicting streamflow with 25 years of training data and using a different model for each of 101 sites. Experiment B tested multi-task learning for predicting streamflow with 25 years of training data and using a single model for all 101 sites. Experiment C tested multi-task learning for predicting streamflow with just 2 years of training data. Experiment D tested multi-task learning for predicting water temperature with over 25 years of training data. Experiment AuxIn used water temperature as an input variable for predicting streamflow. These experiments and their results are described in detail in the WRR paper. Data from a total of 101 sites across the US were used for the experiments. The model input data and streamflow data were from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset (Newman et al., 2014; Addor et al., 2017). The water temperature data were gathered from the National Water Information System (NWIS) (U.S. Geological Survey, 2016). The contents of this item are broken into 13 files or groups of files aggregated into zip files:

    1. input_data_processing.zip: A zip file containing the scripts used to collate the observations, input weather drivers, and catchment attributes for the multi-task modeling experiments
    2. flow_observations.zip: A zip file containing collated daily streamflow data for the sites used in multi-task modeling experiments. The streamflow data were originally accessed from the CAMELs dataset. The data are stored in csv and Zarr formats.
    3. temperature_observations.zip: A zip file containing collated daily water temperature data for the sites used in multi-task modeling experiments. The data were originally accessed via NWIS. The data are stored in csv and Zarr formats.
    4. temperature_sites.geojson: Geojson file of the locations of the water temperature and streamflow sites used in the analysis.
    5. model_drivers.zip: A zip file containing the daily input weather driver data for the multi-task deep learning models. These data are from the Daymet drivers and were collated from the CAMELS dataset. The data are stored in csv and Zarr formats.
    6. catchment_attrs.csv: Catchment attributes collated from the CAMELS dataset. These data are used for the Random Forest modeling. For full metadata regarding these data see the CAMELS dataset.
    7. experiment_workflow_files.zip: A zip file containing workflow definitions used to run multi-task deep learning experiments. These are Snakemake workflows. To run a given experiment, one would run (for experiment A) 'snakemake -s expA_Snakefile --configfile expA_config.yml'
    8. river-dl-paper_v0.zip: A zip file containing python code used to run multi-task deep learning experiments. This code was called by the Snakemake workflows contained in 'experiment_workflow_files.zip'.
    9. random_forest_scripts.zip: A zip file containing Python code and a Python Jupyter Notebook used to prepare data for, train, and visualize feature importance of a Random Forest model.
    10. plotting_code.zip: A zip file containing python code and Snakemake workflow used to produce figures showing the results of multi-task deep learning experiments.
    11. results.zip: A zip file containing results of multi-task deep learning experiments. The results are stored in csv and netcdf formats. The netcdf files were used by the plotting libraries in 'plotting_code.zip'. These files are for five experiments, 'A', 'B', 'C', 'D', and 'AuxIn'. These experiment names are shown in the file name.
    12. sample_scripts.zip: A zip file containing scripts for creating sample output to demonstrate how the modeling workflow was executed.
    13. sample_output.zip: A zip file containing sample output data. Similar files are created by running the sample scripts provided.
    A. Newman; K. Sampson; M. P. Clark; A. Bock; R. J. Viger; D. Blodgett, 2014. A large-sample watershed-scale hydrometeorological dataset for the contiguous USA. Boulder, CO: UCAR/NCAR. https://dx.doi.org/10.5065/D6MW2F4D

    N. Addor, A. Newman, M. Mizukami, and M. P. Clark, 2017. Catchment attributes for large-sample studies. Boulder, CO: UCAR/NCAR. https://doi.org/10.5065/D6G73C3Q

    Sadler, J. M., Appling, A. P., Read, J. S., Oliver, S. K., Jia, X., Zwart, J. A., & Kumar, V. (2022). Multi-Task Deep Learning of Daily Streamflow and Water Temperature. Water Resources Research, 58(4), e2021WR030138. https://doi.org/10.1029/2021WR030138

    U.S. Geological Survey, 2016, National Water Information System data available on the World Wide Web (USGS Water Data for the Nation), accessed Dec. 2020.

  17. Surrogate flood model comparison - Datasets and python code

    • figshare.unimelb.edu.au
    bin
    Updated Jan 19, 2024
    Cite
    Niels Fraehr (2024). Surrogate flood model comparison - Datasets and python code [Dataset]. http://doi.org/10.26188/24312658.v1
    Available download formats: bin
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    The University of Melbourne
    Authors
    Niels Fraehr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data used for the publication "Assessment of surrogate models for flood inundation: The physics-guided LSG model vs. state-of-the-art machine learning models". Five surrogate models for flood inundation are used to emulate the results of high-resolution hydrodynamic models. The surrogate models are compared based on accuracy and computational speed for three distinct case studies, namely Carlisle (United Kingdom), the Chowilla floodplain (Australia), and the Burnett River (Australia).

    The dataset is structured in 5 files - "Carlisle", "Chowilla", "BurnettRV", "Comparison_results", and "Python_data". As a minimum to run the models, the "Python_data" file and one of "Carlisle", "Chowilla", or "BurnettRV" are needed. We suggest using the "Carlisle" case study for initial testing given its small size and small data requirement.

    "Carlisle", "Chowilla", and "BurnettRV" files
    These files contain hydrodynamic modelling data for training and validation for each individual case study, as well as specific Python scripts for training and running the surrogate models in each case study. There are only small differences between each folder, depending on the hydrodynamic model being emulated and the input boundary conditions (input features). Each case study file has the following folders:

    Geometry_data: DEM files, .npz files containing the high-fidelity model's grid (XYZ-coordinates) and areas (the same data is available for the low-fidelity model used in the LSG model), and .shp files indicating the location of boundaries and main flow paths (mainly used in the LSTM-SRR model).
    XXX_modeldata: Folder to store trained model data for each XXX surrogate model. For example, GP_EOF_modeldata contains files used to store the trained GP-EOF model.
    HD_model_data: High-fidelity (and low-fidelity) simulation results for all flood events of that case study. This folder also contains all boundary input conditions.
    HF_EOF_analysis: Storage of data used in the EOF analysis. EOF analysis is applied for the LSG, GP-EOF, and LSTM-EOF surrogate models.
    Results_data: Storage of results from running the evaluation of the surrogate models.
    Train_test_split_data: The train-test-validation data split is the same for all surrogate models. The specific split for each cross-validation fold is stored in this folder.

    And Python files:

    YYY_event_summary, YYY_Extrap_event_summary: Files containing an overview of all events, and which events are connected between the low- and high-fidelity models, for each YYY case study.
    EOF_analysis_HFdata_preprocessing, EOF_analysis_HFdata: Preprocessing before EOF analysis and the EOF analysis of the high-fidelity data. This is used for the LSG, GP-EOF, and LSTM-EOF surrogate models.
    Evaluation, Evaluation_extrap: Scripts for evaluating the surrogate models for that case study and saving the results for each cross-validation fold.
    train_test_split: Script for splitting the flood datasets for each cross-validation fold, so all surrogate models train on the same data.
    XXX_training: Script for training each XXX surrogate model.
    XXX_preprocessing: Some surrogate models rely on information that needs to be generated before training; this is performed using these scripts.

    "Comparison_results" file
    Files used for comparing the surrogate models and generating the figures in the paper "Assessment of surrogate models for flood inundation: The physics-guided LSG model vs. state-of-the-art machine learning models". Figures are also included.

    "Python_data" file
    Folder containing Python scripts with utility functions for setting up, training, and running the surrogate models, as well as for evaluating them. This folder also contains a python_environment.yml file with all Python package versions and dependencies. It also contains two sub-folders:

    LSG_mods_and_func: Python scripts for using the LSG model. Some of these scripts are also utilized when working with the other surrogate models.
    SRR_method_master_Zhou2021: Scripts obtained from https://github.com/yuerongz/SRR-method. Small edits have been made for speed and use in this study.

  18. SalmonScan: A Novel Image Dataset for Fish Disease Detection in Salmon Aquaculture System

    • data.mendeley.com
    Updated Feb 27, 2024
    + more versions
    Cite
    Md Shoaib Ahmed (2024). SalmonScan: A Novel Image Dataset for Fish Disease Detection in Salmon Aquaculture System [Dataset]. http://doi.org/10.17632/x3fz2nfm4w.1
    Dataset updated
    Feb 27, 2024
    Authors
    Md Shoaib Ahmed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SalmonScan dataset is a collection of images of salmon fish, including healthy fish and infected fish. The dataset consists of two classes of images:

    Fresh salmon 🐟 Infected Salmon 🐠

    This dataset is ideal for various computer vision tasks in machine learning and deep learning applications. Whether you are a researcher, developer, or student, the SalmonScan dataset offers a rich and diverse data source to support your projects and experiments.

    So, dive in and explore the fascinating world of salmon health and disease!

    The SalmonScan dataset consists of approximately 1,208 images of salmon fish, classified into two classes:

    • Fresh salmon (healthy fish with no visible signs of disease), 456 images
    • Infected Salmon containing disease, 752 images

    Each class contains a representative and diverse collection of images, capturing a range of different perspectives, scales, and lighting conditions. The images have been carefully curated to ensure that they are of high quality and suitable for use in a variety of computer vision tasks.

    Data Preprocessing

    The input images were preprocessed to enhance their quality and suitability for further analysis. The following steps were taken:

    Resizing 📏: All the images were resized to a uniform size of 600 pixels in width and 250 pixels in height to ensure compatibility with the learning algorithm.

    Image Augmentation 📸: To overcome the small amount of images, various image augmentation techniques were applied to the input images. These included:

    Horizontal Flip ↩️: The images were horizontally flipped to create additional samples.
    Vertical Flip ⬆️: The images were vertically flipped to create additional samples.
    Rotation 🔄: The images were rotated to create additional samples.
    Cropping 🪓: A portion of the image was randomly cropped to create additional samples.
    Gaussian Noise 🌌: Gaussian noise was added to the images to create additional samples.
    Shearing 🌆: The images were sheared to create additional samples.
    Contrast Adjustment (Gamma) ⚖️: Gamma correction was applied to the images to adjust their contrast.
    Contrast Adjustment (Sigmoid) ⚖️: The sigmoid function was applied to the images to adjust their contrast.
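
    A minimal sketch of a few of these augmentations with Pillow and NumPy (an illustration only; the authors' augmentation code is not included in this dataset, and the file name is a placeholder):

    import numpy as np
    from PIL import Image

    img = Image.open("FreshFish/example.jpg").resize((600, 250))   # placeholder file name

    h_flip = img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)        # horizontal flip
    v_flip = img.transpose(Image.Transpose.FLIP_TOP_BOTTOM)        # vertical flip
    rotated = img.rotate(15, expand=True)                          # rotation

    arr = np.asarray(img, dtype=np.float32)
    noisy = np.clip(arr + np.random.normal(0, 10, arr.shape), 0, 255).astype(np.uint8)  # Gaussian noise
    gamma = (255 * (arr / 255) ** 0.8).astype(np.uint8)                                 # gamma contrast adjustment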

    Usage

    To use the salmon scan dataset in your ML and DL projects, follow these steps:

    • Clone or download the salmon scan dataset repository from GitHub.
    • Unzip the file to access the two folders (FreshFish and InfectedFish).
    • Load the images into your preferred programming environment, such as Python.
    • Use standard libraries such as numpy or pandas to convert the images into arrays, which can be input into a machine learning or deep learning model.
    • Split the dataset into training, validation, and test sets as per your requirement.
    • Preprocess the data as needed, such as resizing and normalizing the images.
    • Train your ML/DL model using the preprocessed training data.
    • Evaluate the model on the test set and make predictions on new, unseen data.

  19. Data from: Lithium observations, machine-learning predictions, and mass estimates from the Smackover Formation brines in southern Arkansas

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 29, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Lithium observations, machine-learning predictions, and mass estimates from the Smackover Formation brines in southern Arkansas [Dataset]. https://catalog.data.gov/dataset/lithium-observations-machine-learning-predictions-and-mass-estimates-from-the-smackover-fo
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Arkansas
    Description

    Global demand for lithium, the primary component of lithium-ion batteries, greatly exceeds known supplies and this imbalance is expected to increase as the world transitions away from fossil fuel energy sources. The goal of this work was to calculate the total lithium mass in brines of the Reynolds oolite unit of the Smackover Formation in southern Arkansas using predicted lithium concentrations from a machine-learning model. This research was completed collaboratively between the U.S. Geological Survey and the Arkansas Department of Energy and Environment—Office of the State Geologist. The Smackover Formation is a laterally extensive petroleum and brine system in the Gulf Coast region that includes locally high concentrations of bromide and lithium in southern Arkansas. This data release contains input files, Python scripts, and an R script used to prepare input files, create a random forest (RF) machine-learning model to predict lithium concentrations, and compute uncertainty in brines of the Reynolds oolite unit of the Smackover Formation in southern Arkansas. This data release also contains a Python script to calculate the total mass of lithium in brines of the Reynolds oolite unit of the Smackover Formation in southern Arkansas based on porosity. Knowledge of data-science and Python and R programming languages is a prerequisite for executing the workflow associated with this product. Users can execute the scripts to prepare input data, train a RF machine-learning model, compute uncertainty, and calculate lithium mass. Explanatory variables used to train the RF model included geologic, geochemical, and temperature data from either published datasets or created and documented in this data release and the associated companion publication (Knierim and others, 2024). See the associated metadata for details. This data release also includes output files (csvs [comma-delimited, plain-text] and rasters [geospatial grids]) of lithium concentration predictions from the RF model, uncertainty ranges, and lithium mass. The depth of prediction of lithium concentration represents the mid-point depth of the Reynolds oolite unit which varies between approximately 3,500 and 11,300 feet deep (below land-surface datum) and 0 and 400 feet thick across the model domain. For a full explanation of methods and results, see the companion manuscript Knierim and others (2024).
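
    A minimal sketch of the random forest workflow the description outlines, using scikit-learn; the file and column names below are placeholders, not the names used in the release:

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("smackover_training_data.csv")               # placeholder input file
    X = df[["depth_ft", "temperature_c", "bromide_mg_l"]]          # placeholder explanatory variables
    y = df["lithium_mg_l"]                                         # placeholder target column

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_train, y_train)
    print("hold-out R^2:", rf.score(X_test, y_test))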

  20. terraceDL: A geomorphology deep learning dataset of agricultural terraces in Iowa, USA

    • figshare.com
    bin
    Updated Mar 22, 2023
    Cite
    Aaron Maxwell (2023). terraceDL: A geomorphology deep learning dataset of agricultural terraces in Iowa, USA [Dataset]. http://doi.org/10.6084/m9.figshare.22320373.v2
    Available download formats: bin
    Dataset updated
    Mar 22, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aaron Maxwell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Iowa, United States
    Description

    scripts.zip

    arcgisTools.atbx: terrainDerivatives: make terrain derivatives from digital terrain model (Band 1 = TPI (50 m radius circle), Band 2 = square root of slope, Band 3 = TPI (annulus), Band 4 = hillshade, Band 5 = multidirectional hillshades, Band 6 = slopeshade). rasterizeFeatures: convert vector polygons to raster masks (1 = feature, 0 = background).

    makeChips.R: R function to break terrain derivatives and chips into image chips of a defined size.
    makeTerrainDerivatives.R: R function to generate 6-band terrain derivatives from digital terrain data (same as the ArcGIS Pro tool).
    merge_logs.R: R script to merge training logs into a single file.
    predictToExtents.ipynb: Python notebook to use a trained model to predict to new data.
    trainExperiments.ipynb: Python notebook used to train semantic segmentation models using PyTorch and the Segmentation Models package.
    assessmentExperiments.ipynb: Python code to generate assessment metrics using PyTorch and the torchmetrics library.
    graphs_results.R: R code to make graphs with ggplot2 to summarize results.
    makeChipsList.R: R code to generate lists of chips in a directory.
    makeMasks.R: R function to make raster masks from vector data (same as the rasterizeFeatures ArcGIS Pro tool).
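
    A minimal sketch of the kind of model trainExperiments.ipynb trains, using the Segmentation Models PyTorch package on 6-band terrain-derivative chips (the encoder choice and chip size below are assumptions):

    import torch
    import segmentation_models_pytorch as smp

    model = smp.Unet(
        encoder_name="resnet34",
        encoder_weights=None,   # 6-band inputs cannot reuse 3-band ImageNet weights directly
        in_channels=6,          # the six terrain derivatives described above
        classes=1,              # binary mask: terrace vs. background
    )

    chips = torch.randn(4, 6, 256, 256)   # a dummy batch of image chips
    logits = model(chips)                 # -> shape (4, 1, 256, 256)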

    terraceDL.zip

    dems: LiDAR DTM data partitioned into training, testing, and validation datasets based on HUC8 watershed boundaries. Original DTM data were provided by the Iowa BMP mapping project: https://www.gis.iastate.edu/BMPs.
    extents: extents of the training, testing, and validation areas as defined by HUC8 watershed boundaries.
    vectors: vector features representing agricultural terraces, partitioned into separate training, testing, and validation datasets. Original digitized features were provided by the Iowa BMP Mapping Project: https://www.gis.iastate.edu/BMPs.
