Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data package is associated with the publication “Prediction of Distributed River Sediment Respiration Rates using Community-Generated Data and Machine Learning” submitted to the Journal of Geophysical Research: Machine Learning and Computation (Scheibe et al. 2024). River sediment respiration observations are expensive and labor intensive to obtain, and there is no physical model for predicting this quantity. The Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) observational data set (Goldman et al., 2020) is used to train machine learning (ML) models to predict respiration rates at unsampled sites. This repository archives training data, ML models, predictions, and model evaluation results for the purposes of reproducibility of the results in the associated manuscript and community reuse of the ML models trained in this project. One of the key challenges in this work was to find an optimum configuration for machine learning models to work with this feature-rich (i.e., 100+ possible input variables) data set. Here, we used a two-tiered approach to managing the analysis of this complex data set: 1) a stacked ensemble of ML models that can automatically optimize hyperparameters to accelerate the process of model selection and tuning and 2) feature permutation importance to iteratively select the most important features (i.e., inputs) to the ML models. The major elements of this ML workflow are modular, portable, open, and cloud-based, thus making this implementation a potential template for other applications. This data package is associated with the GitHub repository found at https://github.com/parallelworks/sl-archive-whondrs. A static copy of the GitHub repository is included in this data package as an archived version at the time of publishing this data package (March 2023). However, we recommend accessing these files via GitHub for full functionality.
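The feature permutation importance step mentioned above can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the toy data, and the stand-in `toy_model` (used here in place of the stacked ensemble) are all assumptions made for illustration. The idea is to shuffle one input column at a time and measure how much the prediction error grows.

```python
import random
from typing import Callable, List, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Sequence[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled (hypothetical helper)."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data (assumption): the target depends only on the first feature.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]


def toy_model(row: Sequence[float]) -> float:
    """Stand-in for a trained model; it ignores the second feature."""
    return 3.0 * row[0]


scores = permutation_importance(toy_model, X, y)
```

In this toy case the second feature scores exactly zero because the model never reads it, while the first feature scores above zero. An iterative selection loop of the kind described would drop the lowest-scoring features, retrain, and repeat.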
Please see the file level metadata (flmd; “sl-archive-whondrs_flmd.csv”) for a list of all files contained in this data package and descriptions for each. Please see the data dictionary (dd; “sl-archive-whondrs_dd.csv”) for a list of all column headers contained within comma separated value (csv) files in this data package and descriptions for each. The GitHub repository is organized into five top-level directories: (1) “input_data” holds the training data for the ML models; (2) “ml_models” holds machine learning models trained on the data in “input_data”; (3) “scripts” contains data preprocessing and postprocessing scripts and intermediate results specific to this data set that bookend the ML workflow; (4) “examples” contains the visualization of the results in this repository including plotting scripts for the manuscript (e.g., model evaluation, FPI results) and scripts for running predictions with the ML models (i.e., reusing the trained ML models); (5) “output_data” holds the overall results of the ML model on that branch. Each trained ML model resides on its own branch in the repository; this means that inputs and outputs can differ from branch to branch. Furthermore, depending on the number of features used to train the ML models, the preprocessing and postprocessing scripts, and their intermediate results, can also differ from branch to branch. The “main-*” branches are meant to be starting points (i.e., trunks) for each model branch (i.e., sprouts). Please see the Branch Navigation section in the top-level README.md in the GitHub repository for more details. There is also one hidden directory, “.github/workflows”. This hidden directory contains information on how to run the ML workflow as an end-to-end automated GitHub Action, but it is not needed for reusing the ML models archived here. Please see the top-level README.md in the GitHub repository for more details on the automation.
Families of tax filers; Single-earner and dual-earner census families by number of children (final T1 Family File; T1FF).
https://www.incomebyzipcode.com/terms#TERMS
A dataset listing the richest zip codes in Puerto Rico per the most current US Census data, including information on rank and average income.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Captures full set of data compiled from sources including IMF Article IV Country Reports, Energy Information Administration, World Development Indicators and the International Country Risk Group (ICRG).
https://www.incomebyzipcode.com/terms#TERMS
A dataset listing the richest zip codes in North Carolina per the most current US Census data, including information on rank and average income.