This dataset was created by Monis Ahmad.
https://www.usa.gov/government-works/
Data visualization using Python (Pandas, Plotly).
The data were used to visualize the infection rate and the death rate from 01/20 to 04/22.
The data was made available on Github: https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv
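A minimal sketch of the visualization described, assuming the published column names (Date, Country, Confirmed, Deaths) in the CSV:

```python
# Load the countries-aggregated CSV from GitHub and plot confirmed cases
# and deaths for one country with Plotly. The country value "US" follows
# the dataset's naming; pick any country in the file.
import pandas as pd
import plotly.express as px

URL = ("https://raw.githubusercontent.com/datasets/covid-19/master/"
       "data/countries-aggregated.csv")
df = pd.read_csv(URL, parse_dates=["Date"])

us = df[df["Country"] == "US"]
fig = px.line(us, x="Date", y=["Confirmed", "Deaths"],
              title="COVID-19 confirmed cases and deaths (US)")
fig.show()
```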
https://creativecommons.org/publicdomain/zero/1.0/
What is Pandas?
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" refers to both "panel data" and "Python data analysis"; the library was created by Wes McKinney in 2008.
Why Use Pandas?
Pandas allows us to analyze big data and draw conclusions based on statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.
What Can Pandas Do?
Pandas gives you answers about the data, such as:
Is there a correlation between two or more columns?
What is the average value?
What is the max value?
What is the min value?
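For example, a small sketch of those questions in pandas, using a made-up DataFrame:

```python
# Answer the questions above on a tiny illustrative dataset.
import pandas as pd

df = pd.DataFrame({
    "duration": [60, 45, 45, 60, 30],
    "calories": [420, 300, 280, 400, 150],
})

print(df.corr())              # correlation between columns
print(df["calories"].mean())  # average value
print(df["calories"].max())   # max value
print(df["calories"].min())   # min value
```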
This child item describes Python code used to retrieve gridMET climate data for a specific area and time period. Climate data were retrieved for public-supply water service areas, but the climate data collector could be used to retrieve data for other areas of interest. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Data retrieved by the climate data collector code were used as input feature variables in the public supply delivery and water use machine learning models. This page includes the following file: climate_data_collector.zip - a zip file containing the climate data collector Python code used to retrieve climate data and a README file.
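A minimal sketch, not the released collector, of retrieving gridMET data with xarray over OPeNDAP; the THREDDS URL, variable name, and dimension names below are assumptions about the gridMET service, and the coordinates are example values:

```python
# Subset gridMET precipitation for a bounding box and date range, then
# save an area-averaged daily series to CSV.
import xarray as xr

URL = ("http://thredds.northwestknowledge.net:8080/thredds/dodsC/"
       "agg_met_pr_1979_CurrentYear_CONUS.nc")  # assumed aggregation endpoint

ds = xr.open_dataset(URL)
subset = ds.sel(
    lon=slice(-112.0, -111.0),            # area of interest (example values)
    lat=slice(42.0, 41.0),                # gridMET latitudes run north to south
    day=slice("2000-01-01", "2020-12-31"),
)
precip = subset["precipitation_amount"].mean(dim=["lat", "lon"])
precip.to_dataframe().to_csv("gridmet_precip_daily.csv")
```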
The EPA GitHub repository PAU4Chem, as described in its README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flow transfers, estimating releases, and identifying potential occupational exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The folder datasets contains the outputs for each framework step. The Chemicals_in_categories.csv file contains the chemicals for the TRI chemical categories. The EPA GitHub repository PAU_case_study, as described in its readme.md, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and tracking flow transfers at the end-of-life stage. The data were obtained by means of data engineering using different publicly available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset was built using the repository PAU4Chem. Finally, the EPA GitHub repository Properties_Scraper contains a Python script to massively gather information about exposure limits and physical properties from different publicly available sources: EPA, NOAA, OSHA, and the Institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). All GitHub repositories also describe the Python libraries required for running their code, how to use them, the output files obtained after running the Python script modules, and the corresponding EPA disclaimer. This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 10/29/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis has evolved.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.
Code information:
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a role:
"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).
"02_functions.R": This script contains custom functions. Load this using the
`source()` function in the 01_start.R script.
"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
`source()` function in the 01_start.R script.
"04_figures_tables.R": This is the main workhouse for figure/table production and
supporting analyses. This script generates the key figures and summary statistics
used in the study that then get saved in the manuscript_figures folder. Note that all
maps were produced using Python code found in the "supporting_code"" folder.
"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.
"supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.
This child item describes Python code used to query census data from the TigerWeb Representational State Transfer (REST) services and the U.S. Census Bureau Application Programming Interface (API). These data were needed as input feature variables for a machine learning model to predict public supply water use for the conterminous United States. Census data were retrieved for public-supply water service areas, but the census data collector could be used to retrieve data for other areas of interest. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Data retrieved by the census data collector code were used as input features in the public supply delivery and water use machine learning models. This page includes the following file: census_data_collector.zip - a zip file containing the census data collector Python code used to retrieve data from the U.S. Census Bureau and a README file.
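A minimal sketch, not the released collector, of the kind of Census Bureau API request such code automates; the dataset year, variable, and state are illustrative choices:

```python
# Pull total population (ACS 5-year variable B01003_001E) for every county
# in one state from the Census Bureau API, then load it into pandas.
import requests
import pandas as pd

URL = "https://api.census.gov/data/2019/acs/acs5"
params = {
    "get": "NAME,B01003_001E",  # B01003_001E = total population (example)
    "for": "county:*",
    "in": "state:49",           # Utah, as an example
}
rows = requests.get(URL, params=params, timeout=60).json()

# The API returns a list of lists; the first row holds the column names.
df = pd.DataFrame(rows[1:], columns=rows[0])
df["B01003_001E"] = df["B01003_001E"].astype(int)
print(df.head())
```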
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The file called analysis.pdf provides the code we used to analyze data. The file called module.pdf is the module we created to access Numina data. The module requires the Python packages gql and pandas. The code requires a password from Numina.
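A minimal sketch, assuming a GraphQL endpoint and query shape for Numina (both hypothetical here; the actual module is documented in module.pdf):

```python
# Query a GraphQL API with gql and load the result into pandas.
import pandas as pd
from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport

transport = RequestsHTTPTransport(
    url="https://api.numina.co/graphql",                  # assumed endpoint
    headers={"Authorization": "Token <NUMINA_PASSWORD>"}  # credential from Numina
)
client = Client(transport=transport, fetch_schema_from_transport=True)

query = gql("""
query {
  feedCountMetrics(feedIds: ["<FEED_ID>"], objClasses: ["pedestrian"]) {
    edges { node { time result } }
  }
}
""")  # hypothetical query shape

data = client.execute(query)
df = pd.DataFrame(e["node"] for e in data["feedCountMetrics"]["edges"])
```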
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cadastral data from PDOK, used to illustrate the use of geopandas and shapely, geospatial Python packages for manipulating vector data. The brpgewaspercelen_definitief_2020.gpkg file has been subsetted to make the download manageable for workshops. The other datasets are copies of those available from PDOK.
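A minimal sketch of reading the subsetted parcel file with geopandas and selecting features with a shapely geometry; the bounding-box coordinates are example values in the layer's CRS:

```python
# Read the GeoPackage, inspect it, and clip parcels to a bounding box.
import geopandas as gpd
from shapely.geometry import box

parcels = gpd.read_file("brpgewaspercelen_definitief_2020.gpkg")
print(parcels.crs, len(parcels))

aoi = box(120000, 485000, 135000, 500000)  # example area of interest
subset = parcels[parcels.intersects(aoi)]
subset.plot()
```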
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS) (i.e., a Python equivalent of USGS’ R dataRetrieval package), loading data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.
This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.
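As a small illustration, here is a sketch of the kind of NWIS retrieval these packages streamline, using the USGS dataretrieval Python package; the site number and parameter code are examples:

```python
# Retrieve daily discharge (parameter 00060) from NWIS for one gage.
from dataretrieval import nwis

df, meta = nwis.get_dv(sites="10109000", parameterCd="00060",
                       start="2016-01-01", end="2020-12-31")
print(df.head())
```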
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The mainstem Logan River is a suitable habitat for cold-water fishes such as native populations of cutthroat trout (Budy & Gaeta, 2018). On the other hand, high water temperatures can harm cold-water fish populations by creating physiological stresses, intensifying metabolic demands, and limiting suitable habitats (Williams et al., 2015). In this regard, the State of Utah Department of Environmental Quality (UDEQ) has identified the Logan River as a suitable habitat for cold-water species, which can become unsuitable when the water temperature rises above 20 degrees Celsius (Rule R317-2, 2022). However, the UDEQ does not provide any details on how to evaluate violations of the standard. One way to evaluate violations is to look at water temperature distributions (i.e., histograms) along the river from high elevations to low elevations at different locations. In this report, I used three different Python libraries to manipulate, extract, and explore the water temperature data of the Logan River from 2014 to 2021 obtained from the Logan River Observatory website. The results (i.e., the histograms generated by executing the Jupyter Notebook in the HydroShare environment) show that the Logan River tends to experience higher water temperatures as its elevation drops, regardless of the season. This can provide some insights for the UDEQ to simultaneously consider space and time in assessing violations of the standard.
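A minimal sketch of the kind of analysis described; the file, site codes, and column names are assumptions, not the resource's actual notebook:

```python
# Plot per-site histograms of Logan River water temperature and mark the
# 20 degree C cold-water threshold.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("logan_river_temperature_2014_2021.csv",  # assumed export
                 parse_dates=["DateTime"])

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, (site, grp) in zip(axes, df.groupby("SiteCode")):
    grp["WaterTemp_C"].hist(ax=ax, bins=40)
    ax.axvline(20, color="red", linestyle="--")  # UDEQ threshold
    ax.set_title(site)
    ax.set_xlabel("Water temperature (deg C)")
plt.tight_layout()
plt.show()
```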
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This code retrieves stream information from NWIS using the dataretrieval tool in Python. You can input sites, parameters, and dates at the top. The code pulls daily measurements, annual_stats, and daily_stats. It calculates 30-year normals and plots annual average flows; annual minimum, maximum, and mean flows; and percentile flows. This resource pulls only from the USGS ftp site and doesn't have or require any local storage.
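A minimal sketch of the annual-statistics pull and a 30-year normal calculation, assuming the dataretrieval statistics-service wrapper and its column names (year_nu, mean_va); the site and parameter are examples:

```python
# Pull annual statistics from the NWIS statistics service and compute a
# 30-year normal of annual mean flow with pandas.
from dataretrieval import nwis

SITE, PARAM = "10109000", "00060"  # example site and discharge parameter

annual, _ = nwis.get_stats(sites=SITE, parameterCd=PARAM,
                           statReportType="annual")

recent = annual[annual["year_nu"].astype(int).between(1991, 2020)]
normal = recent["mean_va"].mean()
print(f"30-year normal annual flow: {normal:.1f} cfs")
```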
https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.34810/data2322
CatCrops_identification is a Python library developed for the early classification of crop types using remote sensing data (Sentinel-2) and ancillary information. It is based on a Transformer model adapted for the analysis of spectral time series with variable length, and it allows the integration of auxiliary data such as the previous year’s crop, irrigation system, cloud cover, elevation, and other geographic features. The library provides tools to download and prepare datasets, train deep learning models, and generate vector maps with plot-level classification. CatCrops_identification includes scripts to automate the entire workflow and offers a public dataset that combines declared and inspected information on crop types in the Lleida region. This approach improves classification accuracy in the early stages of the agricultural season, offering a robust and efficient tool for agricultural planning and water resource management.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This presentation gives an overview of using Python programming to optimize CO2 storage simulations conducted with Computer Modelling Group (CMG) software. Solutions to two problems are discussed. First, spatially representing data from CMG simulation results (e.g., plume outlines) is addressed. Second, a streamlined process for optimizing well placement in the simulation model is given. Presented at the 2014 Rocky Mountain Section AAPG Annual Meeting.
The Method of Uncertainty Minimization using Polynomial Chaos Expansions (MUM-PCE) was developed as a software tool to constrain physical models against experimental measurements. These models contain parameters that cannot be easily determined from first principles and so must be measured, and some which cannot even be easily measured. In such cases, the models are validated and tuned against a set of global experiments which may depend on the underlying physical parameters in a complex way. The measurement uncertainty will affect the uncertainty in the parameter values.
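For context, a polynomial chaos expansion represents each predicted observable as a polynomial series in the uncertain model parameters. A generic second-order form (a standard textbook expression, not taken from the MUM-PCE source) is

$$\eta_e(\mathbf{x}) \approx a_e^{(0)} + \sum_{i=1}^{n} a_e^{(i)} x_i + \sum_{i=1}^{n} \sum_{j=1}^{i} a_e^{(ij)} x_i x_j,$$

where the $x_i$ are normalized parameter values and the coefficients are fit from model evaluations; constraining the $x_i$ against the measured observables and their uncertainties yields optimized parameter values with a posterior covariance.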
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio extension for xarray; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save GeoTIFF as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4, and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
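As a small illustration of the GeoTIFF-to-NetCDF step described above (file and variable names are assumptions, not the released notebooks):

```python
# Read a GeoTIFF with rioxarray, add simple metadata, and save as NetCDF.
import rioxarray

da = rioxarray.open_rasterio("state_les_dataset.tif")  # read GeoTIFF
da = da.squeeze("band", drop=True)                     # drop singleton band dim
da.name = "les"                                        # variable name in NetCDF
da.attrs["units"] = "unitless"                         # example metadata
da.to_netcdf("state_les_dataset.nc")                   # write NetCDF
```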
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource serves as a template for creating a curve number grid raster file, which can be used to create corresponding maps or for further analysis; soil data and reclassified land-use raster files are created along the way. The user has to provide or connect to a set of shapefiles, including the watershed boundary, soil data and land use covering the watershed, a land-use reclassification table, and a curve number lookup table. The script contained in this resource mainly uses PyQGIS through a Jupyter Notebook for the majority of the processing, with a touch of Pandas for data manipulation. A detailed description of the procedure is commented in the script.
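As a small illustration of the curve-number lookup step (the resource itself uses PyQGIS raster processing; this pandas sketch with made-up classes and CN values only shows the table-join idea):

```python
# Join land-use class and hydrologic soil group to a curve number (CN)
# lookup table; classes and values are illustrative.
import pandas as pd

units = pd.DataFrame({
    "landuse": ["urban", "row_crop", "forest"],
    "soil_group": ["B", "C", "B"],
})
cn_lookup = pd.DataFrame({
    "landuse": ["urban", "urban", "row_crop", "row_crop", "forest"],
    "soil_group": ["B", "C", "B", "C", "B"],
    "CN": [85, 90, 78, 85, 60],
})

units = units.merge(cn_lookup, on=["landuse", "soil_group"], how="left")
print(units)
```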
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Machine learning is a set of tools that are increasingly used in the field of chemistry. The introduction of potential uses of machine learning to undergraduate chemistry students should help to increase their comprehension of and interest in machine learning processes and can help support them in their transition into graduate research and industrial environments that use such tools. Herein we present an exercise aimed at introducing machine learning alongside improving students’ general Python coding abilities. The exercise aims to identify the regioisomerism of disubstituted benzene systems solely from infrared spectra, a simple and ubiquitous undergraduate technique. The exercise culminates in students collecting their own spectra of compounds with unknown regioisomerism and predicting the results, allowing them to take ownership of their results and creating a larger database of information to draw upon for machine learning in the future.
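A minimal sketch, not the published exercise, of the classification idea: train a scikit-learn model on IR spectra labeled by regioisomer; the data here are random placeholders standing in for baseline-corrected spectra:

```python
# Classify disubstituted benzene regioisomers (ortho/meta/para) from
# IR spectra with a random forest; X holds one spectrum per row.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((150, 600))                      # placeholder spectra (absorbance)
y = rng.choice(["ortho", "meta", "para"], 150)  # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```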
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 3. An archived version of Scoria, derived from the main Scoria branch, that includes MDAnalysis support.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource collects teaching materials that are originally created for the in-person course 'GEOSC/GEOG 497 – Data Mining in Environmental Sciences' at Penn State University (co-taught by Tao Wen, Susan Brantley, and Alan Taylor) and then refined/revised by Tao Wen to be used in the online teaching module 'Data Science in Earth and Environmental Sciences' hosted on the NSF-sponsored HydroLearn platform.
This resource includes both R Notebooks and Python Jupyter Notebooks to teach the basics of R and Python coding, data analysis and data visualization, as well as building machine learning models in both programming languages by using authentic research data and questions. All of these R/Python scripts can be executed either on the CUAHSI JupyterHub or on your local machine.
This resource is shared under the CC-BY license. Please contact the creator Tao Wen at Syracuse University (twen08@syr.edu) for any questions you have about this resource. If you identify any errors in the files, please contact the creator.