This dataset was created by Monis Ahmad.
https://www.usa.gov/government-works/
Data visualization using Python (Pandas, Plotly).
The data were used to visualize the infection rate and the death rate from 01/20 to 04/22.
The data was made available on Github: https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv
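A minimal sketch of the visualization described, assuming the published column names (Date, Country, Confirmed, Deaths) in the CSV:

```python
# Load the countries-aggregated CSV from GitHub and plot confirmed cases
# and deaths for one country with Plotly. The country value "US" follows
# the dataset's naming; pick any country in the file.
import pandas as pd
import plotly.express as px

URL = ("https://raw.githubusercontent.com/datasets/covid-19/master/"
       "data/countries-aggregated.csv")
df = pd.read_csv(URL, parse_dates=["Date"])

us = df[df["Country"] == "US"]
fig = px.line(us, x="Date", y=["Confirmed", "Deaths"],
              title="COVID-19 confirmed cases and deaths (US)")
fig.show()
```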
https://creativecommons.org/publicdomain/zero/1.0/
What is Pandas?
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" refers to both "panel data" and "Python data analysis"; the library was created by Wes McKinney in 2008.
Why Use Pandas?
Pandas allows us to analyze big data and draw conclusions based on statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.
What Can Pandas Do?
Pandas gives you answers about the data, such as:
Is there a correlation between two or more columns?
What is the average value?
What is the max value?
What is the min value?
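For example, a small sketch of those questions in pandas, using a made-up DataFrame:

```python
# Answer the questions above on a tiny illustrative dataset.
import pandas as pd

df = pd.DataFrame({
    "duration": [60, 45, 45, 60, 30],
    "calories": [420, 300, 280, 400, 150],
})

print(df.corr())              # correlation between columns
print(df["calories"].mean())  # average value
print(df["calories"].max())   # max value
print(df["calories"].min())   # min value
```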
This child item describes Python code used to retrieve gridMET climate data for a specific area and time period. Climate data were retrieved for public-supply water service areas, but the climate data collector could be used to retrieve data for other areas of interest. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Data retrieved by the climate data collector code were used as input feature variables in the public supply delivery and water use machine learning models. This page includes the following file: climate_data_collector.zip - a zip file containing the climate data collector Python code used to retrieve climate data and a README file.
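A minimal sketch, not the released collector, of retrieving gridMET data with xarray over OPeNDAP; the THREDDS URL, variable name, and dimension names below are assumptions about the gridMET service, and the coordinates are example values:

```python
# Subset gridMET precipitation for a bounding box and date range, then
# save an area-averaged daily series to CSV.
import xarray as xr

URL = ("http://thredds.northwestknowledge.net:8080/thredds/dodsC/"
       "agg_met_pr_1979_CurrentYear_CONUS.nc")  # assumed aggregation endpoint

ds = xr.open_dataset(URL)
subset = ds.sel(
    lon=slice(-112.0, -111.0),            # area of interest (example values)
    lat=slice(42.0, 41.0),                # gridMET latitudes run north to south
    day=slice("2000-01-01", "2020-12-31"),
)
precip = subset["precipitation_amount"].mean(dim=["lat", "lon"])
precip.to_dataframe().to_csv("gridmet_precip_daily.csv")
```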
The EPA GitHub repository PAU4Chem, as described in its README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flow transfers, estimating releases, and identifying potential occupational exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The folder datasets contains the outputs for each framework step. The Chemicals_in_categories.csv file contains the chemicals for the TRI chemical categories. The EPA GitHub repository PAU_case_study, as described in its readme.md, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and tracking flow transfers at the end-of-life stage. The data were obtained by means of data engineering using different publicly available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset was built using the repository PAU4Chem. Finally, the EPA GitHub repository Properties_Scraper contains a Python script to massively gather information about exposure limits and physical properties from different publicly available sources: EPA, NOAA, OSHA, and the Institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). All GitHub repositories also describe the Python libraries required for running their code, how to use them, the output files obtained after running the Python script modules, and the corresponding EPA disclaimer. This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 10/29/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis has evolved.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.
Code information:
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a role:
"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).
"02_functions.R": This script contains custom functions. Load this using the
`source()` function in the 01_start.R script.
"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
`source()` function in the 01_start.R script.
"04_figures_tables.R": This is the main workhouse for figure/table production and
supporting analyses. This script generates the key figures and summary statistics
used in the study that then get saved in the manuscript_figures folder. Note that all
maps were produced using Python code found in the "supporting_code"" folder.
"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.
"supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.
This child item describes Python code used to query census data from the TigerWeb Representational State Transfer (REST) services and the U.S. Census Bureau Application Programming Interface (API). These data were needed as input feature variables for a machine learning model to predict public supply water use for the conterminous United States. Census data were retrieved for public-supply water service areas, but the census data collector could be used to retrieve data for other areas of interest. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Data retrieved by the census data collector code were used as input features in the public supply delivery and water use machine learning models. This page includes the following file: census_data_collector.zip - a zip file containing the census data collector Python code used to retrieve data from the U.S. Census Bureau and a README file.
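A minimal sketch, not the released collector, of the kind of Census Bureau API request such code automates; the dataset year, variable, and state are illustrative choices:

```python
# Pull total population (ACS 5-year variable B01003_001E) for every county
# in one state from the Census Bureau API, then load it into pandas.
import requests
import pandas as pd

URL = "https://api.census.gov/data/2019/acs/acs5"
params = {
    "get": "NAME,B01003_001E",  # B01003_001E = total population (example)
    "for": "county:*",
    "in": "state:49",           # Utah, as an example
}
rows = requests.get(URL, params=params, timeout=60).json()

# The API returns a list of lists; the first row holds the column names.
df = pd.DataFrame(rows[1:], columns=rows[0])
df["B01003_001E"] = df["B01003_001E"].astype(int)
print(df.head())
```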
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The file called analysis.pdf provides the code we used to analyze data. The file called module.pdf is the module we created to access Numina data. The module requires the Python packages gql and pandas. The code requires a password from Numina.
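A minimal sketch, assuming a GraphQL endpoint and query shape for Numina (both hypothetical here; the actual module is documented in module.pdf):

```python
# Query a GraphQL API with gql and load the result into pandas.
import pandas as pd
from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport

transport = RequestsHTTPTransport(
    url="https://api.numina.co/graphql",                  # assumed endpoint
    headers={"Authorization": "Token <NUMINA_PASSWORD>"}  # credential from Numina
)
client = Client(transport=transport, fetch_schema_from_transport=True)

query = gql("""
query {
  feedCountMetrics(feedIds: ["<FEED_ID>"], objClasses: ["pedestrian"]) {
    edges { node { time result } }
  }
}
""")  # hypothetical query shape

data = client.execute(query)
df = pd.DataFrame(e["node"] for e in data["feedCountMetrics"]["edges"])
```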
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cadastral data from PDOK, used to illustrate the use of geopandas and shapely, geospatial Python packages for manipulating vector data. The brpgewaspercelen_definitief_2020.gpkg file has been subsetted to make the download manageable for workshops. The other datasets are copies of those available from PDOK.
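A minimal sketch of reading the subsetted parcel file with geopandas and selecting features with a shapely geometry; the bounding-box coordinates are example values in the layer's CRS:

```python
# Read the GeoPackage, inspect it, and clip parcels to a bounding box.
import geopandas as gpd
from shapely.geometry import box

parcels = gpd.read_file("brpgewaspercelen_definitief_2020.gpkg")
print(parcels.crs, len(parcels))

aoi = box(120000, 485000, 135000, 500000)  # example area of interest
subset = parcels[parcels.intersects(aoi)]
subset.plot()
```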
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS) (i.e., a Python equivalent of USGS’ R dataRetrieval package), loading data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.
This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.
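As a small illustration, here is a sketch of the kind of NWIS retrieval these packages streamline, using the USGS dataretrieval Python package; the site number and parameter code are examples:

```python
# Retrieve daily discharge (parameter 00060) from NWIS for one gage.
from dataretrieval import nwis

df, meta = nwis.get_dv(sites="10109000", parameterCd="00060",
                       start="2016-01-01", end="2020-12-31")
print(df.head())
```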
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The mainstem Logan River is a suitable habitat for cold-water fishes such as native populations of cutthroat trout (Budy & Gaeta, 2018). On the other hand, high water temperatures can harm cold-water fish populations by creating physiological stresses, intensifying metabolic demands, and limiting suitable habitats (Williams et al., 2015). In this regard, the State of Utah Department of Environmental Quality (UDEQ) has identified the Logan River as a suitable habitat for cold-water species, which can become unsuitable when the water temperature rises above 20 degrees Celsius (Rule R317-2, 2022). However, the UDEQ does not provide any details on how to evaluate violations of the standard. One way to evaluate violations is to look at water temperature distributions (i.e., histograms) along the river from high elevations to low elevations at different locations. In this report, I used three different Python libraries to manipulate, extract, and explore the water temperature data of the Logan River from 2014 to 2021 obtained from the Logan River Observatory website. The results (i.e., the histograms generated by executing the Jupyter Notebook in the HydroShare environment) show that the Logan River tends to experience higher water temperatures as its elevation drops, regardless of the season. This can provide some insights for the UDEQ to simultaneously consider space and time in assessing violations of the standard.
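A minimal sketch of the kind of analysis described; the file, site codes, and column names are assumptions, not the resource's actual notebook:

```python
# Plot per-site histograms of Logan River water temperature and mark the
# 20 degree C cold-water threshold.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("logan_river_temperature_2014_2021.csv",  # assumed export
                 parse_dates=["DateTime"])

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, (site, grp) in zip(axes, df.groupby("SiteCode")):
    grp["WaterTemp_C"].hist(ax=ax, bins=40)
    ax.axvline(20, color="red", linestyle="--")  # UDEQ threshold
    ax.set_title(site)
    ax.set_xlabel("Water temperature (deg C)")
plt.tight_layout()
plt.show()
```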
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This code retrieves stream information from NWIS using the dataretrieval tool in Python. You can input sites, parameters, and dates at the top. The code pulls daily measurements, annual_stats, and daily_stats. It calculates 30-year normals and plots annual average flows; annual minimum, maximum, and mean flows; and percentile flows. This resource pulls only from the USGS ftp site and doesn't have or require any local storage.
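A minimal sketch of the annual-statistics pull and a 30-year normal calculation, assuming the dataretrieval statistics-service wrapper and its column names (year_nu, mean_va); the site and parameter are examples:

```python
# Pull annual statistics from the NWIS statistics service and compute a
# 30-year normal of annual mean flow with pandas.
from dataretrieval import nwis

SITE, PARAM = "10109000", "00060"  # example site and discharge parameter

annual, _ = nwis.get_stats(sites=SITE, parameterCd=PARAM,
                           statReportType="annual")

recent = annual[annual["year_nu"].astype(int).between(1991, 2020)]
normal = recent["mean_va"].mean()
print(f"30-year normal annual flow: {normal:.1f} cfs")
```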
https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.34810/data2322
CatCrops_identification is a Python library developed for the early classification of crop types using remote sensing data (Sentinel-2) and ancillary information. It is based on a Transformer model adapted for the analysis of spectral time series with variable length, and it allows the integration of auxiliary data such as the previous year’s crop, irrigation system, cloud cover, elevation, and other geographic features. The library provides tools to download and prepare datasets, train deep learning models, and generate vector maps with plot-level classification. CatCrops_identification includes scripts to automate the entire workflow and offers a public dataset that combines declared and inspected information on crop types in the Lleida region. This approach improves classification accuracy in the early stages of the agricultural season, offering a robust and efficient tool for agricultural planning and water resource management.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This presentation gives an overview of using Python programming to optimize CO2 storage simulations conducted with Computer Modelling Group (CMG) software. Solutions to two problems are discussed. First, spatially representing data from CMG simulation results (e.g., plume outlines) is addressed. Second, a streamlined process for optimizing well placement in the simulation model is given. Presented at the 2014 Rocky Mountain Section AAPG Annual Meeting.
The Method of Uncertainty Minimization using Polynomial Chaos Expansions (MUM-PCE) was developed as a software tool to constrain physical models against experimental measurements. These models contain parameters that cannot be easily determined from first principles and so must be measured, and some which cannot even be easily measured. In such cases, the models are validated and tuned against a set of global experiments which may depend on the underlying physical parameters in a complex way. The measurement uncertainty will affect the uncertainty in the parameter values.
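For context, a polynomial chaos expansion represents each predicted observable as a polynomial series in the uncertain model parameters. A generic second-order form (a standard textbook expression, not taken from the MUM-PCE source) is

$$\eta_e(\mathbf{x}) \approx a_e^{(0)} + \sum_{i=1}^{n} a_e^{(i)} x_i + \sum_{i=1}^{n} \sum_{j=1}^{i} a_e^{(ij)} x_i x_j,$$

where the $x_i$ are normalized parameter values and the coefficients are fit from model evaluations; constraining the $x_i$ against the measured observables and their uncertainties yields optimized parameter values with a posterior covariance.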
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio extension for xarray; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save GeoTIFF as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4, and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
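As a small illustration of the GeoTIFF-to-NetCDF step described above (file and variable names are assumptions, not the released notebooks):

```python
# Read a GeoTIFF with rioxarray, add simple metadata, and save as NetCDF.
import rioxarray

da = rioxarray.open_rasterio("state_les_dataset.tif")  # read GeoTIFF
da = da.squeeze("band", drop=True)                     # drop singleton band dim
da.name = "les"                                        # variable name in NetCDF
da.attrs["units"] = "unitless"                         # example metadata
da.to_netcdf("state_les_dataset.nc")                   # write NetCDF
```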
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource serves as a template for creating a curve number grid raster file, which can be used to create corresponding maps or for further analysis; soil data and reclassified land-use raster files are created along the way. The user has to provide or connect to a set of shapefiles, including the watershed boundary, soil data and land use covering the watershed, a land-use reclassification table, and a curve number lookup table. The script contained in this resource mainly uses PyQGIS through a Jupyter Notebook for the majority of the processing, with a touch of Pandas for data manipulation. A detailed description of the procedure is commented in the script.
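As a small illustration of the curve-number lookup step (the resource itself uses PyQGIS raster processing; this pandas sketch with made-up classes and CN values only shows the table-join idea):

```python
# Join land-use class and hydrologic soil group to a curve number (CN)
# lookup table; classes and values are illustrative.
import pandas as pd

units = pd.DataFrame({
    "landuse": ["urban", "row_crop", "forest"],
    "soil_group": ["B", "C", "B"],
})
cn_lookup = pd.DataFrame({
    "landuse": ["urban", "urban", "row_crop", "row_crop", "forest"],
    "soil_group": ["B", "C", "B", "C", "B"],
    "CN": [85, 90, 78, 85, 60],
})

units = units.merge(cn_lookup, on=["landuse", "soil_group"], how="left")
print(units)
```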
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Machine learning is a set of tools that are increasingly used in the field of chemistry. The introduction of potential uses of machine learning to undergraduate chemistry students should help to increase their comprehension of and interest in machine learning processes and can help support them in their transition into graduate research and industrial environments that use such tools. Herein we present an exercise aimed at introducing machine learning alongside improving students’ general Python coding abilities. The exercise aims to identify the regioisomerism of disubstituted benzene systems solely from infrared spectra, a simple and ubiquitous undergraduate technique. The exercise culminates in students collecting their own spectra of compounds with unknown regioisomerism and predicting the results, allowing them to take ownership of their results and creating a larger database of information to draw upon for machine learning in the future.
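A minimal sketch, not the published exercise, of the classification idea: train a scikit-learn model on IR spectra labeled by regioisomer; the data here are random placeholders standing in for baseline-corrected spectra:

```python
# Classify disubstituted benzene regioisomers (ortho/meta/para) from
# IR spectra with a random forest; X holds one spectrum per row.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((150, 600))                      # placeholder spectra (absorbance)
y = rng.choice(["ortho", "meta", "para"], 150)  # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```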
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 3. An archived version of Scoria, derived from the main Scoria branch, that includes MDAnalysis support.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource collects teaching materials that are originally created for the in-person course 'GEOSC/GEOG 497 – Data Mining in Environmental Sciences' at Penn State University (co-taught by Tao Wen, Susan Brantley, and Alan Taylor) and then refined/revised by Tao Wen to be used in the online teaching module 'Data Science in Earth and Environmental Sciences' hosted on the NSF-sponsored HydroLearn platform.
This resource includes both R Notebooks and Python Jupyter Notebooks to teach the basics of R and Python coding, data analysis and data visualization, as well as building machine learning models in both programming languages by using authentic research data and questions. All of these R/Python scripts can be executed either on the CUAHSI JupyterHub or on your local machine.
This resource is shared under the CC-BY license. Please contact the creator Tao Wen at Syracuse University (twen08@syr.edu) for any questions you have about this resource. If you identify any errors in the files, please contact the creator.