CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Much of the world’s data are stored, managed, and distributed by data centers. Data centers require a tremendous amount of energy to operate, accounting for around 1.8% of electricity use in the United States. Large amounts of water are also required to operate data centers, both directly for liquid cooling and indirectly to produce electricity. For the first time, we calculate spatially detailed carbon and water footprints of data centers operating within the United States, which is home to around one-quarter of all data center servers globally. Our bottom-up approach reveals that one-fifth of data center servers' direct water footprint comes from moderately to highly water-stressed watersheds, while nearly half of servers are fully or partially powered by power plants located within water-stressed regions. Approximately 0.5% of total US greenhouse gas emissions are attributed to data centers. We investigate tradeoffs and synergies between data centers' water and energy utilization by strategically locating data centers in areas of the country that minimize one or more environmental footprints. Our study quantifies the environmental implications behind our data creation and storage and shows a path to decrease the environmental footprint of our increasing digital footprint.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset is the repository for the following paper submitted to Data in Brief:
Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).
The Data in Brief article contains the supplement information and is the related data paper to:
Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).
Description/abstract
The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which has strained neighbouring countries such as Jordan through the influx of Syrian refugees and has increased the population's vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.
Folder structure
The main folder after download contains all data; the following subfolders are stored as zipped files:
“code” stores the nine code chunks used to read, extract, process, analyse, and visualize the data (see “Code structure” below).
“MODIS_merged” contains the 16-day, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area (n=510), covering January 2001 to December 2022 and including January and February 2023.
“mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).
“yield_productivity” contains .csv files of yield information for all countries listed above.
“population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).
“GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.
“built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders: “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built-up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.
Code structure
1_MODIS_NDVI_hdf_file_extraction.R
This is the first code chunk and refers to the extraction of MODIS data from the .hdf file format. The following packages must be installed, and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download the MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9 October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three spatially separate time series and merge them later. Note that the time series are temporally consistent.
2_MERGE_MODIS_tiles.R
In this code, we load and merge the three different stacks to produce a large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, of which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
3_CROP_MODIS_merged_tiles.R
Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced single cropped NDVI time series data from MODIS.
The repository provides the already clipped and merged NDVI datasets.
4_TREND_analysis_NDVI.R
Now, we want to perform trend analysis on the derived data. The data we load are tricky, as they have a 16-day return period across each year for a period of 22 years. Growing season sums cover MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and flag all values that are significant at the 0.05 confidence level. Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) with a smoothing value of 0.3.
To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
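The repository code for this step is written in R, but the underlying computation is generic. As a rough Python illustration only (not the repository code; the values below are made up), the z-score normalization and a per-pixel trend slope with its p-value could be computed as:

    import numpy as np
    from scipy import stats

    # Hypothetical annual growing-season NDVI sums for a single pixel, 2001-2022
    years = np.arange(2001, 2023)
    ndvi_sums = np.random.default_rng(0).normal(loc=5.0, scale=0.4, size=years.size)

    # z-scores: deviation of each year from the long-term mean, in standard deviations
    z_scores = (ndvi_sums - ndvi_sums.mean()) / ndvi_sums.std()

    # Linear trend (slope per year) and its p-value; keep only trends with p < 0.05
    slope, intercept, r_value, p_value, std_err = stats.linregress(years, ndvi_sums)
    significant = p_value < 0.05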
5_BUILT_UP_change_raster.R
Let us look at the landcover changes now. We are working with the terra package and get raster data from: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3 March 2023, 100 m resolution, global coverage). One can download the temporal coverage that is aimed for and reclassify it using the code after cropping to the individual study area. Here, I summed up the different rasters to characterize the built-up change in continuous values between 1975 and 2022.
6_POPULATION_numbers_plot.R
For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.
7_YIELD_plot.R
In this section, we are using the country productivity data from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single-country yield datasets is plotted with ggplot and combined using the patchwork package in R.
8_GLDAS_read_extract_trend
The last code chunk provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9 October 2023). The raw data come in .nc file format, and the various variables can be extracted from the spatraster collection using the [“^a variable name”] command. Each time you run the code, this variable name must be adjusted to the variable of interest (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 9 October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R), or run print(nc) from the code, or use names() on the spatraster collection.
Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. For variables such as rainfall, which are measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for the growing seasons, e.g., March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including January/February of the consecutive year).
From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and values at the 95% confidence level are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and a different spatial extent across the globe, thanks to the availability of the GLDAS variables.
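The seasonal aggregation described above (including the DJF period that spans the turn of the year) is implemented in R in the repository; a hedged Python/pandas sketch of the same idea, using a made-up monthly series, would be:

    import numpy as np
    import pandas as pd

    # Hypothetical monthly GLDAS-style series averaged over the study area
    idx = pd.date_range("1975-01-01", "2022-12-01", freq="MS")
    monthly = pd.Series(np.random.default_rng(1).gamma(2.0, 10.0, idx.size), index=idx)

    # 3-month seasonal sums; "QS-DEC" starts the seasonal year in December, so the
    # DJF season automatically includes January and February of the following year
    seasonal_sums = monthly.resample("QS-DEC").sum()

    # Annual aggregation: divide the sum by 12 only for variables reported as means,
    # not for variables such as rainfall that are accumulated as annual sums
    annual_sum = monthly.resample("YS").sum()
    annual_mean = annual_sum / 12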
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The 802.11 standard includes several management features and corresponding frame types. One of them is the Probe Request (PR), which is sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame body of a PR consists of variable-length fields, called Information Elements (IEs), which represent the capabilities of a mobile device, such as supported data rates.
This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.
It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.
Related dataset
The same authors also produced the Labeled dataset of IEEE 802.11 probe requests, which uses the same data layout and recording equipment.
Measurement setup
The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.
The following information about each received PR is collected:
- MAC address
- Supported data rates
- Extended supported rates
- HT capabilities
- Extended capabilities
- Data under extended tag and vendor specific tag
- Interworking
- VHT capabilities
- RSSI
- SSID
- Timestamp when PR was received
The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.
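As an illustration only (not the original collection script), a minimal Pyshark capture of probe requests could look like the sketch below; the monitor-mode interface name is an assumption, and the exact RSSI field depends on the driver and radiotap support.

    import pyshark

    # Capture only probe requests (802.11 management frames, subtype 4) on a
    # monitor-mode interface; "wlan0mon" is a placeholder interface name.
    capture = pyshark.LiveCapture(interface="wlan0mon",
                                  display_filter="wlan.fc.type_subtype == 4")

    for packet in capture.sniff_continuously():
        mac = packet.wlan.ta                  # transmitter (source) MAC address
        timestamp = packet.sniff_time         # time the PR was received
        rssi = packet.wlan_radio.signal_dbm   # RSSI, if the monitor header provides it
        print(timestamp, mac, rssi)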
Data preprocessing
The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:
PR_IE_data = {
    'DATA_RTS': {'SUPP': DATA_supp, 'EXT': DATA_ext},
    'HT_CAP': DATA_htcap,
    'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
    'VHT_CAP': DATA_vhtcap,
    'INTERWORKING': DATA_inter,
    'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext, ...},
    'VENDOR_SPEC': {
        VENDOR_1: {'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1, ...},
        VENDOR_2: {'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2, ...},
        ...
    }
}
Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
Missing IE fields in the captured PR are not included in PR_IE_DATA.
When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:
{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },
where PR_data is structured as follows:
{ 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.
This data structure makes it possible to store only the 'TIME' and 'RSSI' values for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of a newly detected PR are compared with the already stored data for the same MAC in the current scan time interval. If identical PR IE data from the same MAC address are already stored, only the values for the keys 'TIME' and 'RSSI' are appended. If identical PR IE data from the same MAC address have not yet been received, the PR_data structure of the new PR for that MAC address is appended to the 'PROBE_REQs' key. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png.
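The aggregation logic described above can be sketched in Python roughly as follows; the field names follow the JSON structures shown earlier, but this is an illustration rather than the original preprocessing code.

    # One dictionary per scan interval: MAC address -> stored structure
    scan_interval = {}

    def add_probe_request(mac, ssid, time, rssi, pr_ie_data):
        """Store a detected PR, appending to an existing entry where possible."""
        entry = scan_interval.setdefault(
            mac, {'MAC': mac, 'SSIDs': [], 'PROBE_REQs': []})

        if ssid and ssid not in entry['SSIDs']:
            entry['SSIDs'].append(ssid)

        # If a PR with identical IE data was already seen for this MAC in the
        # current scan interval, only append its TIME and RSSI values.
        for pr_data in entry['PROBE_REQs']:
            if pr_data['DATA'] == pr_ie_data:
                pr_data['TIME'].append(time)
                pr_data['RSSI'].append(rssi)
                return

        # Otherwise store a new PR_data structure for this MAC address.
        entry['PROBE_REQs'].append(
            {'TIME': [time], 'RSSI': [rssi], 'DATA': pr_ie_data})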
At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
Folder structure
For ease of processing, the dataset is divided into 7 folders, each covering a 24-hour period. Each folder contains four files, one per gateway device, each containing the samples recorded by that device.
The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected from 23rd of September 2022 00:00 local time until 24th of September 2022 00:00 local time.
Files represent their location via the following mapping:
- 1.json -> location 1
- 2.json -> location 2
- 3.json -> location 3
- 4.json -> location 4
Environments description
The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo. The gateway devices (RPis with WiFi dongles) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correct installation and for the data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.
Four Raspberry Pis were used:
- location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)
- location 2 -> southernmost window in the building on Via Etnea near Piazza del Duomo
- location 3 -> northernmost window in the building on Via Etnea near Piazza Università
- location 4 -> first window to the right of the entrance of the University of Catania
Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access). Under ideal circumstances, the locations of the devices and their coverage areas would cover both squares and the part of Via Etnea between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.
Known dataset shortcomings
Due to technical and physical limitations, the dataset contains some identified deficiencies.
PRs are collected and transmitted in 10-second chunks. Due to the limited capabilities of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.
Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can stop responding. For this reason, up to 20 seconds of data may not be recorded in each 20-minute period.
The devices had a scheduled reboot at 4:00 each day, which shows up as missing data of up to a few minutes.
Location 1 - Piazza del Duomo - Chierici
The gateway device (RPi) is located on the second-floor balcony and is hardwired to the Ethernet port. This device appears to have functioned stably throughout the data collection period. Its location was constant and undisturbed, and the dataset appears to have complete coverage.
Location 2 - Via Etnea - Piazza del Duomo
The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill. However, the movement of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days in the record contain no PRs from this location.
Location 3 - Via Etnea - Piazza Università
Similar to location 2, the device was placed on the windowsill and moved around by the people working in the building, e.g., it was placed on the windowsill during the day and moved inside, behind a thick wall, when no people were present. This device appears to have been collecting data throughout the whole dataset period.
Location 4 - Piazza Università
This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device lost power several times during the deployment. The internet connection was also interrupted sporadically.
Recognitions
The data was collected within the scope of the Resiloc project with the help of the City of Catania and the project partners.
The marker × environment interaction (M×E) decomposes the marker effects into main effects and environment-specific interaction effects. The M×E interaction may be modeled through a linear kernel (Genomic Best Linear Unbiased Predictor, GBLUP) or with non-linear Gaussian kernels. In this paper we proposed to use two non-linear Gaussian kernels: the Reproducing Kernel Hilbert Space with Kernel Averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (GKb). The three methods (GBLUP, RKHS KA, and GKb) were used to model single environments and were extended to account for the M×E interaction (GBLUP-ME, RKHS KA-ME, and GKb-ME) in wheat and maize data sets. Prediction accuracy was assessed by a cross-validation scheme that predicts the performance of lines in environments where the lines were not observed. For the single-environment analyses of the wheat and maize data sets, GKb and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data set, the RKHS KA-ME and GKb-ME models did not show any advantage over the single-environment model for pairs of environments with zero or negative correlations, but showed up to 68% superiority for pairs of environments with positive correlation. For the wheat data, the M×E interaction models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-ME. Prediction accuracies of GKb-ME and RKHS KA-ME were up to 12% higher than those of GBLUP-ME. The superiority of the Gaussian kernel models coupled with the M×E model over the linear kernel is due to more flexible kernels that can account for more complex, small marker main effects and marker-specific interaction effects.
By UCI [source]
This dataset provides an intimate look into student performance and engagement. It grants researchers access to numerous salient metrics of academic performance which illuminate a broad spectrum of student behaviors: how students interact with online learning material; quantitative indicators reflecting their academic outcomes; as well as demographic data such as age group, gender, prior education level among others.
The main objective of this dataset is to provide analysts and educators alike with empirical insights underpinning individualized learning experiences, specifically in identifying cases where students may be 'at risk'. Given that preventive early interventions have been shown to significantly reduce the chances of course or program withdrawal among struggling students, having accurate predictive measures such as these can greatly steer pedagogical strategies towards being more success-oriented.
One unique feature of this dataset is its intricate detail. Not only does it provide overarching summaries on a per-student basis for each presented course, but it also furnishes data related to assessments (scores and submission dates) along with information on individuals' interactions within VLEs (virtual learning environments), spanning different types such as forums and content pages. Such comprehensive collation across multiple contextual layers helps paint an encompassing portrayal of the student experience that can guide better instructional design.
Due credit must be given through citation when utilizing this database for research purposes. Specifically, please reference Kuzilek et al. (2015), "OU Analyse: Analysing At-Risk Students at The Open University", published in Learning Analytics Review, since the analysis methodologies are grounded in that seminal work.
It is also important to note that protection of student privacy is paramount within this dataset's terms and conditions. Stringent anonymization techniques have been implemented across sensitive variables: while detailed, profiles cannot be traced back to the original respondents.
How To Use This Dataset:
Understanding Your Objectives: Ideal objectives for using this dataset could be to identify at-risk students before they drop out of a class or program, improving course design by analyzing how assignments contribute to final grades, or simply examining relationships between different variables and student performance.
Set up your Analytical Environment: Before starting any analysis make sure you have an analytical environment set up where you can load the CSV files included in this dataset. You can use Python notebooks (Jupyter), R Studio or Tableau based software in case you want visual representation as well.
Explore Data Individually: There are seven separate datasets available: Assessments, Courses, Student Assessment, Student Info, VLE (Virtual Learning Environment), Student Registration, and Student VLE. Load these CSVs separately into your environment and do an initial exploration of each one: find out what kind of data they contain (numerical/categorical), whether they have missing values, etc.
Merge Datasets: As the core idea is to track a student’s journey through multiple courses over time, combining these datasets will provide insights from wider perspectives. One way is to merge them using common key columns such as 'code_module', 'code_presentation', and 'id_student', but make sure the merge suits the question you are trying to answer (a minimal example is sketched below).
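A minimal pandas sketch of such a merge is shown below (as referenced above); the CSV file names and the columns other than the three key columns are assumptions based on the usual layout of this dataset and may need adjusting.

    import pandas as pd

    # File names are assumptions; adjust them to the CSVs shipped with this dataset.
    student_info = pd.read_csv("studentInfo.csv")
    student_assessment = pd.read_csv("studentAssessment.csv")
    assessments = pd.read_csv("assessments.csv")

    # Attach module/presentation information to each assessment result, then attach
    # student demographics using the common key columns mentioned above.
    scores = student_assessment.merge(assessments, on="id_assessment", how="left")
    scores = scores.merge(
        student_info,
        on=["code_module", "code_presentation", "id_student"],
        how="left")

    # Example metric: mean assessment score per module and final result.
    print(scores.groupby(["code_module", "final_result"])["score"].mean())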
Identify Key Metrics: Your key metrics will depend on your objectives but might include overall grade averages per course, assessment type, student, region, gender or age group; the number of clicks in the virtual learning environment; student registration status; etc.
Run Your Analysis: Now you can run queries to analyze the data relevant to your objectives. Try questions like: What factors most strongly predict whether a student will fail an assessment? How does course difficulty or the number of allotments per week change students' scores?
Visualization: Visualizing your data can be crucial for understanding patterns and relationships between variables. Use graphs like bar plots, heatmaps, and histograms to represent different aspects of your analyses.
Actionable Insights: The final step is interpreting these results in ways that are meaningf...
In-situ chemical oxidation (ICO) is a remediation technology that involves the addition of chemicals to the substrate that degrade contaminants through oxidation processes. This series of field experiments conducted at the Old Casey Powerhouse/Workshop investigates the potential for the use of ICO technology in Antarctica on petroleum hydrocarbon contaminated sediments.
Surface application was made using 12.5% sodium hypochlorite, 6.25% sodium hypochlorite, 30% hydrogen peroxide and Fenton's reagent (sodium hypochlorite with an iron catalyst) on five separate areas of petroleum hydrocarbon contaminated sediments. Sampling was conducted before and after chemical application from the top soil section (0 - 5 cm) and at depth (10 - 15 cm).
The data are stored in an Excel file.
This work was completed as part of ASAC project 1163 (ASAC_1163).
The spreadsheet is divided up as follows:
The first 51 sheets are the raw GC-FID data for the 99/00 field season, labelled by sample name. These sheets use the same format as the radiometric GC-FID spreadsheet in the metadata record entitled 'Mineralisation results using 14C octadecane at a range of temperatures'. Sample name format consists of a location or experiment indicator (CW=Casey Workshop, BR= Small-scale field trial), the year the sample was collected (00=2000), the sample type (S=Soil) and a sequence number.
SUMMARY and PRINTABLE VERSION are the same data in different formats, PRINTABLE VERSION is printer friendly. This summary data includes the hydrocarbon concentrations corrected for dry weight of soil and biodegradation and weathering indices.
GRAPHS are graphs.
FIELD MEASUREMENTS shows the results of the measurements taken in the field and includes PID (ppm), soil temperature (°C), air temperature (°C), pH and MC (moisture content, %).
NOTES shows the chemicals added to each trial, and a short summary of the samples.
The next 21 sheets show the raw GC-FID data for the 00/01 field season, labelled according to the previously explained method. PRINTABLE (0001) is a summary of the raw GC-FID data.
The next 3 sheets show the raw GC-FID data for the 01/02 field season, labelled according to the previously explained method. PRINTABLE (0102) is a summary of the raw GC-FID data.
MPN-NOTES shows lab book references and set up summary for the Most Probable Number (MPN) analysis.
MPN-DETAILS shows the set up details, calculations and results for each MPN analysis.
MPN-RESULTS shows the raw MPN data.
MPN-Calculations show the results from the MPN Calculator.
The fields in the dataset are: Retention Time, Area, % Area, Height of peak, Amount, Int Type, Units, Peak Type, Codes.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
NOTES:
1. Please use this link to leave the data view to see the full description: https://data.ct.gov/Environment-and-Natural-Resources/Hazardous-Waste-Manifest-Data-CT-1984-2008/h6d8-qiar
2. Please use ALL CAPS when searching text with the "Filter" function, e.g., LITCHFIELD. This is not needed for the "Find in this Dataset" search in the upper right corner, where, for example, "Litchfield" can be used.
Dataset Description: We know there are errors in the data although we strive to minimize them. Examples include:
• Manifests completed incorrectly by the generator or the transporter - data was entered based on the incorrect information. We can only enter the information we receive.
• Data entry errors – we now have QA/QC procedures in place to prevent or catch and fix a lot of these.
• Historically there are multiple records of the same generator. Each variation in spelling in name or address generated a separate handler record. We have worked to minimize these but many remain. The good news is that as long as they all have the same EPA ID they will all show up in your search results.
• Handlers provide erroneous data to obtain an EPA ID - data entry was based on erroneous information. Examples include incorrect or bogus addresses and names. There are also a lot of MISSPELLED NAMES AND ADDRESSES!
• Missing manifests – Not every required manifest gets submitted to DEEP. Also, of the more than 100,000 paper manifests we receive each year, some were incorrectly handled and never entered.
• Missing data – we know that the records for approximately 25 boxes of manifests, mostly prior to 1985, were lost from the database in the 1980s.
• Translation errors – the data has been migrated to newer data platforms numerous times, and each time there have been errors and data losses.
• Wastes incorrectly entered – mostly due to complex names that were difficult to spell, or typos in quantities or units of measure.
Since Summer 2019, scanned images of manifest hardcopies may be viewed at the DEEP Document Online Search Portal: https://filings.deep.ct.gov/DEEPDocumentSearchPortal/
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset is an invaluable source of information for exploring the psychological and linguistic features of mental health support discussions conducted on Reddit in 2019. The data consists of text from posts extracted from a variety of subreddits, as well as over 256 features that may provide insight into the psychological and linguistic characteristics within these conversations.
These indicators measure such things as Automated Readability Index, Coleman-Liau Index, Flesch Reading Ease, Gunning Fog Index and Lix scores; Wiener Sachtextformel calculations; and TF-IDF analyses related to key topics like abuse, alcohol use, anxiety, depression symptoms, family matters and more. Furthermore, values are also provided for metrics like words and syllables per sentence, total characters present in each post, total phrases or sentences contained per submission, and the numbers of long, monosyllabic and polysyllabic words used throughout each contribution.
Sentiment analysis is another useful measurement made available within this dataset: values can be plotted for negativity, neutrality and positivity across all posts discussing ideas related to economic stressors or isolation experiences, alongside scores related to specific issues like substance use frequency or gun control debates. Additionally, this dataset offers metrics concerning punctuation tendencies encountered in these types of conversations, often associated with syntax carried by personal pronouns in the first person (I), second person (you) and third person (him/her/they). Furthermore, scores are provided for achievement language use, adverb presence detected throughout post histories, and related measures, helping pave the way for detailed discourse analyses of affective processes, anxieties mentioned within discussions of religious topics, and even sadness levels expressed in exchanges between people seeking mutual relationship advice.
In addition to providing a wealth of measures produced from texts associated with all kinds of mental health conversations found online, this dataset could prove extremely important for further research designed to better profile certain populations emotionally impacted by their individual digital footprints.
Using this dataset, you will be able to analyze various psychological and linguistic features in mental health support online discussions, in order to identify high-risk behaviors. The data consists of text from posts, as well as over 256 different indicators of psychological and linguistic features.
To get started, you will need to set up your own local environment with the necessary packages for running the dataset. You can find this information on the Kaggle page for this dataset. Once you have all that set up, you'll be able to dive into exploring the available data!
The first step is to look at each column header and gain an understanding of what each feature measures. This dataset contains features such as the Automated Readability Index (ARI), Coleman-Liau Index (CLI), Flesch Reading Ease (FREEDOM), Gunning Fog Index (GFI), Lix and Wiener Sachtextformel scores, and sentiment scores such as sentiment negative (SENT_NEG) and sentiment compound (SENT_COMPOUND). Textual features are also provided, including TF-IDF analyses of words related to topics such as abuse, alcohol, anxiety, depression, family, fear, medication, problems, stress and suicide, so that they can be used according to the purpose of your project.
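If you want to recompute or sanity-check features of this kind on raw text, a rough sketch using the textstat and vaderSentiment packages is shown below; the dataset's own feature definitions and column names may differ from these approximations.

    import textstat
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    post = "I have been feeling anxious for weeks and I cannot sleep, any advice?"

    readability = {
        "ARI": textstat.automated_readability_index(post),
        "CLI": textstat.coleman_liau_index(post),
        "FRE": textstat.flesch_reading_ease(post),
        "GFI": textstat.gunning_fog(post),
        "LIX": textstat.lix(post),
    }

    # polarity_scores returns 'neg', 'neu', 'pos' and 'compound' values, comparable
    # in spirit to the sentiment columns described above.
    sentiment = SentimentIntensityAnalyzer().polarity_scores(post)
    print(readability, sentiment)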
Using these features, collected from mental health support discussions on Reddit between 2019 and 2020 on various topics related to mental health (such as abuse, substance use, economic issues and social isolation), can help to identify dangerous high-risk behaviours among people discussing their problems online, and hence to gain a deeper understanding of online at-risk states by studying patterns and trends beyond the text content itself. On one side this provides a unique toolkit for identifying certain high-risk behaviours; on the other side it provides opportunities for criminal justice authorities aiming to detect when someone is discussing illegal activities online, such as drug dealing or weapons exchange.
By Shad Reynolds [source]
This dataset offers a detailed glimpse into the behavior of a barn owl, Sparkles, over the course of 3 months. Analyzing these data provides important insights into how barn owls interact with their environment and how they exhibit their behaviors in controlled settings. The data consist of 59 observations collected over 87 days between August 2020 and October 2020 and cover various aspects including vocalizations, mobility, food intake, preening activities, etc. Each observation contains information on different aspects of Sparkles' behavior, such as head tilts per minute (HPTPM), location changes per minute (LCPM) and secondary feather movement (SFM), giving us an idea of her overall activity levels and interactions with her environment. These insights let us observe trends in Sparkles' behaviour that can help other birds in similar situations or further our understanding of what other animals might be going through as well.
This dataset contains detailed observations of the behavior of Sparkle, an adult barn owl (Tyto alba) housed at the facilities of Washington State University’s Center for Neurotechnology. The data was collected as part of a research study to examine how owls may use their long-term memory to plan ahead when making hunting decisions, and ultimately to reveal clues about how adaptive decision making works.
The dataset has 20 columns with different readings from the environment taken over 3 days: date, time (in 24 hour format), activity level (based on Sparkle's movement), air temperature, luminance or brightness in Lux, moisture levels in percentage (%) points from 0% relative humidity to 100%, and 14 ethograms consisting of wing flapping/flipping behaviors expressed by Sparkle.
How to use this dataset
The data contained inside this dataset offers insights into not only owl behavior but potentially sheds light on decision making across vertebrates as a whole. To begin using this data follow these steps:
1. First familiarize yourself with the columns contained within the dataset by studying each individual column name and the measurement type associated with it, i.e. Date (yy mm dd), Time (military), Temperature, etc.
2. Visualize basic relationships between columns using simple charts, e.g. line charts depicting time against temperature.
3. Activity level vs. ethogram comparisons can be visually displayed using scatterplots, allowing correlations between both sets of measurement types to be studied more carefully. These chart illustrations can even be broken down further, for example into 1-day intervals or around other significant events like when food is given, for easier comprehension.
4. Look at relationships between date/time intervals and environmental factors like air temperature or moisture levels, or between different activity variables, e.g. exploring any cause-effect relationships that may exist between specific environmental parameters and owl behavior, such as whether increased temperatures affect feeding behavior.
5. Utilize hypothesis testing. With hypothesis testing one can come up with potential ideas that might explain why certain variables show unusual patterns or behaviors, compile datasets that have a couple of unique characteristics, and apply various statistical tests in order to determine which situations occur more often than others, thus identifying actual events controlling a situation rather than random occurrences. Following these steps will enable you to understand woodland creature behaviour better while providing useful functions such as prediction capability related directly back to ecosystem well-being.
- Analyzing changes in owl behavior over time by comparing the data points of multiple owls in the same area.
- Using the data to study environmental factors that may influence owl behavior, such as climate, vegetation, and any other landscaping elements present near the nests.
- Creating automated detection systems for observing owl activity by using machine learning algorithms to interpret complex behavioral patterns from this dataset
If you use this dataset in your research, please credit the original authors. Data Source
**Unknown License - Please check the dataset d...
This layer is a time series of the annual ESA CCI (Climate Change Initiative) land cover maps of the world. ESA has produced land cover maps for the years since 1992. These are available at the European Space Agency Climate Change Initiative website.
Time Extent: 1992-2019
Cell Size: 300 meter
Source Type: Thematic
Pixel Type: 8 Bit Unsigned
Data Projection: GCS WGS84
Mosaic Projection: Web Mercator Auxiliary Sphere
Extent: Global
Source: ESA Climate Change Initiative
Update Cycle: Annual
What can you do with this layer?
This layer may be added to ArcGIS Online maps and applications and shown in a time series to watch a "time lapse" view of land cover change since 1992 for any part of the world. The same behavior exists when the layer is added to ArcGIS Pro.
In addition to displaying all layers in a series, this layer may be queried so that only one year is displayed in a map. This layer can be used in analysis. For example, the layer may be added to ArcGIS Pro with a query set to display just one year. Then, an area count of land cover types may be produced for a feature dataset using the zonal statistics tool. Statistics may be compared with the statistics from other years to show a trend.
To sum up area by land cover using this service, or any other analysis, be sure to use an equal area projection, such as Albers or Equal Earth.
Different Classifications Available to Map
Five processing templates are included in this layer. The processing templates may be used to display a smaller set of land cover classes.
Cartographic Renderer (Default Template): Displays all ESA CCI land cover classes.
Forested Lands Template: Shows only forested lands (classes 50-90).
Urban Lands Template: Shows only urban areas (class 190).
Converted Lands Template: Shows only urban lands and lands converted to agriculture (classes 10-40 and 190).
Simplified Renderer: Displays the map in ten simple classes which match the ten simplified classes used in 2050 Land Cover projections from Clark University.
Any of these variables can be displayed or analyzed by selecting their processing template. In ArcGIS Online, select the Image Display Options on the layer, then pull down the list of variables from the Renderer options, and click Apply and Close. In ArcGIS Pro, go into the Layer Properties, select Processing Templates from the left hand menu, and select the variable to display from the Processing Template pull down menu.
Using Time
By default, the map will display as a time series animation, one year per frame. A time slider will appear when you add this layer to your map. To see the most current data, move the time slider until you see the most current year.
In addition to displaying the past quarter century of land cover maps as an animation, this time series can also display just one year of data by use of a definition query. For a step by step example using ArcGIS Pro on how to display just one year of this layer, as well as to compare one year to another, see the blog called Calculating Impervious Surface Change.
Hierarchical Classification
Land cover types are defined using the land cover classification system (LCCS) developed by the United Nations FAO. It is designed to be as compatible as possible with other products, namely GLCC2000, GlobCover 2005 and 2009. This is a hierarchical classification system. For example, class 60 means "closed to open" canopy broadleaved deciduous tree cover. But in some places a more specific type of broadleaved deciduous tree cover may be available. In that case, a more specific code 61 or 62 may be used, which specifies "open" (61) or "closed" (62) cover.
Citation
ESA. Land Cover CCI Product User Guide Version 2. Tech. Rep. (2017). Available at: maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf
More technical documentation on the source datasets is available here: https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-land-cover?tab=doc
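Outside ArcGIS, the same area-count idea can be sketched with rasterio and numpy on one year of the layer exported to an equal-area projection; the file name and workflow below are illustrative assumptions, not part of the service itself.

    import numpy as np
    import rasterio

    # One exported year of the land cover layer, reprojected to an equal-area CRS
    # (e.g. Albers) so that pixel counts translate into areas; file name is made up.
    with rasterio.open("esa_cci_landcover_2019_albers.tif") as src:
        landcover = src.read(1)
        pixel_area_km2 = abs(src.transform.a * src.transform.e) / 1e6

    classes, counts = np.unique(landcover, return_counts=True)
    for value, count in zip(classes, counts):
        print(f"class {value}: {count * pixel_area_km2:.1f} km2")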
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The National Coastal Erosion Risk Map shows projected areas at risk from erosion. The erosion risk zones are created by splitting the coastline into ‘frontages’. These frontages are defined as lengths of coast with consistent characteristics based on location, the cliff behaviour characteristics and the defence characteristics.
It is intended as an up-to-date and reliable benchmark dataset showing erosion risk extents for:
Two periods: Medium Term (up to 2055) and Long Term (up to 2105)
Two management scenarios: With Shoreline Management Plans delivered and No Future Intervention
Three climate scenarios: Present Day climate (2020), Higher Central allowance and Upper End allowance. The two allowances use sea level rise data from UKCP18 RCP8.5 70th and 95th percentiles respectively.
Defence type and SMP policies for each of the two periods described above are included. All distances are cumulative over time and given in metres.
Ground instability zones show areas of geologically complex cliffs where land has previously experienced ground movement. This zone uses the rear scarp position as the landward extent. A buffer zone identified as having the potential risk of future movement in the next 100 years is also included.
INFORMATION WARNINGS:
The data and associated information are intended for guidance only - it cannot provide details for individual properties.
The data shows areas of land likely to be at erosion risk but does not show the precise future position of the shoreline.
Erosion may happen faster or slower than what we show, and risk may change over time.
The information is provided as best estimates based upon historic data (termed 'present day') and the higher central and upper end sea level rise climate change allowances, representing UKCP18 RCP8.5 sea level rise projections. Unlike the previous NCERM, data ranges based on percentiles are not provided.
The NCERM information considers the predominant risk at the coast, although flooding and erosion processes are often linked, and data on erosion of foreshore features are, in general, not included.
Some parts of the coast have complex geology causing ground instability. Unlike the previous NCERM, data on these zones of ground instability are provided. More detailed information on these areas may be available from local authorities.
This dataset succeeds National Coastal Erosion Risk Mapping (NCERM) - National (2018 - 2021).
Attribution statement: © Environment Agency copyright and/or database right
Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions including travel restrictions. Data reported by countries, however, are heterogeneous, and metrics to evaluate their quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.
Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.
Results: Our final analysis included 222 countries and regions....
Data collection: COVID-19 data were downloaded from WHO. Using a public repository, we added the countries' full names to the WHO data set, using the two-letter abbreviations for each country to merge both data sets. The provided COVID-19 data cover January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.
Data processing: We processed the data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml).
Any text editor, including Microsoft Excel and its free alternatives, can open the uploaded CSV file. Any web browser and some code editors (like the freely available Visual Studio Code) can show the uploaded Jupyter Notebook if the required Python environment is set up correctly.
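As an illustration of the merge step described above (file and column names are assumptions, not necessarily those used in the notebook), the country names can be attached to the WHO export with pandas:

    import pandas as pd

    # File and column names are placeholders; the WHO export identifies countries
    # by two-letter codes, which are matched against a public code/name table.
    who = pd.read_csv("WHO-COVID-19-global-data.csv")
    names = pd.read_csv("country_codes.csv")      # e.g. columns: code, full_name

    merged = who.merge(names, left_on="Country_code", right_on="code", how="left")

    # A simple reporting indicator: the share of days on which a country reported
    # case numbers at all (see the paper for the exact metric definitions).
    reported = merged.assign(reported=merged["New_cases"].notna())
    share_of_days_reported = reported.groupby("full_name")["reported"].mean()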
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset corresponding to the journal article "Mitigating the effect of errors in source parameters on seismic (waveform) inversion" by Blom, Hardalupas and Rawlinson, accepted for publication in Geophysical Journal International. In this paper, we demonstrate the effect of errors in source parameters on seismic tomography, with a particular focus on (full) waveform tomography. We study the effect both on forward modelling (i.e. comparing waveforms and measurements resulting from a perturbed vs. unperturbed source) and on seismic inversion (i.e. using a source which contains an (erroneous) perturbation to invert for Earth structure). These data were obtained using Salvus, a state-of-the-art (though proprietary) 3-D solver that can be used for wave propagation simulations (Afanasiev et al., GJI 2018).
This dataset contains:
The entire Salvus project. This project was prepared using Salvus version 0.11.x and 0.12.2 and should be fully compatible with the latter.
A number of Jupyter notebooks used to create all the figures, set up the project and do the data processing.
A number of Python scripts that are used in above notebooks.
two conda environment .yml files: one with the complete environment as used to produce this dataset, and one with the environment as supplied by Mondaic (the Salvus developers), on top of which I installed basemap and cartopy.
An overview of the inversion configurations used for each inversion experiment and the names of the corresponding figures: inversion_runs_overview.ods / .csv.
Datasets corresponding to the different figures.
One dataset for Figure 1, showing the effect of a source perturbation in a real-world setting, as previously used by Blom et al., Solid Earth 2020
One dataset for Figure 2, showing how different methodologies and assumptions can lead to significantly different source parameters, notably including systematic shifts. This dataset was kindly supplied by Tim Craig (Craig, 2019).
A number of datasets (stored as pickled Pandas dataframes) derived from the Salvus project (see the loading example after this list). We have computed:
travel-time arrival predictions from every source to all stations (df_stations...pkl)
misfits for different metrics for both P-wave centered and S-wave centered windows for all components on all stations, comparing every time waveforms from a reference source against waveforms from a perturbed source (df_misfits_cc.28s.pkl)
addition of synthetic waveforms for different (perturbed) moment tensors. All waveforms are stored in HDF5 (.h5) files of the ASDF (Adaptable Seismic Data Format) type
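As referenced above, the pickled dataframes can be inspected directly with pandas, for example (the path is relative to wherever the dataset is unpacked):

    import pandas as pd

    # Load one of the misfit dataframes described above; the other df_*.pkl files
    # in the project follow the same pattern.
    df_misfits = pd.read_pickle("df_misfits_cc.28s.pkl")
    print(df_misfits.head())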
How to use this dataset:
To set up the conda environment:
make sure you have anaconda/miniconda
make sure you have access to Salvus functionality. This is not absolutely necessary, but most of the functionality within this dataset relies on Salvus. You can do the analyses and create the figures without it, but you'll have to hack around in the scripts to build workarounds.
Set up Salvus / create a conda environment. This is best done following the instructions on the Mondaic website. Check the changelog for breaking changes; in that case, download an older Salvus version.
Additionally in your conda env, install basemap and cartopy:
conda-env create -n salvus_0_12 -f environment.yml
conda install -c conda-forge basemap
conda install -c conda-forge cartopy
Install LASIF (https://github.com/dirkphilip/LASIF_2.0) and test. The project uses some lasif functionality.
To recreate the figures: This is extremely straightforward. Every figure has a corresponding Jupyter notebook; it suffices to run the notebook in its entirety.
Figure 1: separate notebook, Fig1_event_98.py
Figure 2: separate notebook, Fig2_TimCraig_Andes_analysis.py
Figures 3-7: Figures_perturbation_study.py
Figures 8-10: Figures_toy_inversions.py
To recreate the dataframes in DATA: This can be done using the example notebooks Create_perturbed_thrust_data_by_MT_addition.py and Misfits_moment_tensor_components.M66_M12.py. The same can easily be extended to the position shift and other perturbations you might want to investigate.
To recreate the complete Salvus project: This can be done using:
the notebook Prepare_project_Phil_28s_absb_M66.py (setting up project and running simulations)
the notebooks Moment_tensor_perturbations.py and Moment_tensor_perturbation_for_NS_thrust.py
For the inversions: Use the notebook Inversion_SS_dip.M66.28s.py as an example. See the overview table inversion_runs_overview.ods (or .csv) for the naming conventions.
References:
Michael Afanasiev, Christian Boehm, Martin van Driel, Lion Krischer, Max Rietmann, Dave A May, Matthew G Knepley, Andreas Fichtner, Modular and flexible spectral-element waveform modelling in two and three dimensions, Geophysical Journal International, Volume 216, Issue 3, March 2019, Pages 1675–1692, https://doi.org/10.1093/gji/ggy469
Nienke Blom, Alexey Gokhberg, and Andreas Fichtner, Seismic waveform tomography of the central and eastern Mediterranean upper mantle, Solid Earth, Volume 11, Issue 2, 2020, Pages 669–690, 2020, https://doi.org/10.5194/se-11-669-2020
Tim J. Craig, Accurate depth determination for moderate-magnitude earthquakes using global teleseismic data. Journal of Geophysical Research: Solid Earth, 124, 2019, Pages 1759– 1780. https://doi.org/10.1029/2018JB016902
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Measurement data for an indoor positioning study were collected on a 4 m × 5 m test bed set up at Kadir Has University, Istanbul, Turkey. Four wireless access points, located at the corners of the test bed, were used during data collection. Markers were placed every 50 cm and measurements were taken on the grid shown in the TEST BED.jpg file. RSSI measurements were recorded via the NetSurveyor program running on a Lenovo Ideapad FLEX 4 laptop, which has an Intel Dual Band Wireless-AC 8260 Wi-Fi adaptor. Raw measurement data are stored in XML format and processed in MATLAB for simulations. The data can be used especially for indoor position estimation studies.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository contains the dataset for a study of the computational reproducibility of Jupyter notebooks from biomedical publications. Our focus lies in evaluating the extent of reproducibility of Jupyter notebooks derived from GitHub repositories linked to publications indexed in the biomedical literature repository PubMed Central. The dataset includes the metadata information of the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in these GitHub repositories.
Data Collection and Analysis
We used the code for the reproducibility of Jupyter notebooks from the study by Pimentel et al., 2019 and adapted code from ReproduceMeGit. We provide code for collecting the publication metadata from PubMed Central using NCBI Entrez utilities via Biopython.
Our approach involves searching PMC using the esearch function for Jupyter notebooks using the query: "(ipynb OR jupyter OR ipython) AND github". We meticulously retrieve data in XML format, capturing essential details about journals and articles. By systematically scanning the entire article, encompassing the abstract, body, data availability statement, and supplementary materials, we extract GitHub links. Additionally, we mine repositories for key information such as dependency declarations found in files like requirements.txt, setup.py, and Pipfile. Leveraging the GitHub API, we enrich our data by incorporating repository creation dates, update histories, pushes, and programming languages.
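A minimal sketch of this search step with Biopython is shown below; the e-mail address and retmax value are placeholders, and the subsequent XML parsing is omitted.

    from Bio import Entrez

    Entrez.email = "you@example.org"   # placeholder; NCBI requires a contact e-mail

    # Search PubMed Central for articles mentioning Jupyter notebooks and GitHub.
    query = "(ipynb OR jupyter OR ipython) AND github"
    handle = Entrez.esearch(db="pmc", term=query, retmax=100)
    record = Entrez.read(handle)
    handle.close()

    pmc_ids = record["IdList"]

    # Fetch the full records in XML for later extraction of GitHub links and metadata.
    fetch = Entrez.efetch(db="pmc", id=",".join(pmc_ids), retmode="xml")
    xml_data = fetch.read()
    fetch.close()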
All the extracted information is stored in a SQLite database. After collecting and creating the database tables, we ran a pipeline to collect the Jupyter notebooks contained in the GitHub repositories based on the code from Pimentel et al., 2019.
Our reproducibility pipeline was started on 27 March 2023.
Repository Structure
Our repository is organized into two main folders:
Accessing Data and Resources:
System Requirements:
Running the pipeline:
Running the analysis:
References:
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This is an example dataset recorded using version 1.0 of the open-source-hardware OpenAXES IMU. Please see the GitHub repository for more information on the hardware and firmware. Please find the most up-to-date version of this document in the repository.
This dataset was recorded using four OpenAXES IMUs mounted on the segments of a robot arm (UR5 by Universal Robots). The robot arm was programmed to perform a calibration movement, then trace a 2D circle or triangle in the air with its tool center point (TCP), and return to its starting position, at four different speeds from 100 mm/s to 250 mm/s. This results in a total of 8 different scenarios (2 shapes times 4 speeds). The ground truth joint angle and TCP position values were obtained from the robot controller. The calibration movement at the beginning of the measurement allows for calculating the exact orientation of the sensors on the robot arm.
The IMUs were configured to send the raw data from the three gyroscope axes and the six accelerometer axes to a PC via BLE with 16 bit resolution per axis and 100 Hz sample rate. Since no data packets were lost during this process, this dataset allows comparing and tuning different sensor fusion algorithms on the recorded raw data while using the ground truth robot data as a reference.
In order to visualize the results, the quaternion sequences from the IMUs were applied to the individual segments of a 3D model of the robot arm. The end of this kinematic chain represents the TCP of the virtual model, which should ideally move along the same trajectory as the ground truth, barring the accuracy of the IMUs. Since the raw sensor data of these measurements is available, the calibration coefficients can also be applied ex-post.
Since there are 6 joints but only 4 IMUs, some redundancy must be exploited. The redundancy comes from the fact that each IMU has 3 rotational degrees of freedom, but each joint has only one:
q0 and q1 are both derived from the orientation of the "humerus" IMU.
q2 is the difference† between the orientation of the "humerus" and "radius" IMUs.
q3 is the difference between the orientation of the "radius" and "carpus" IMUs.
q4 is the difference between the orientation of the "carpus" and "digitus" IMUs.
q5 does not influence the position of the TCP, only its orientation, so it is ignored in the evaluation.
† The difference is computed as R1 * inv(R0) for two quaternions (or rotations) R0 and R1 (see the sketch below). The actual code works a bit differently, but this describes the general principle.
The IMU measurements are stored in measure_raw-2022-09-15/, one folder per scenario. In those folders, there is one CSV file per IMU. The ground truth robot data are stored in measure_raw-2022-09-15/robot/, one CSV and MAT file per scenario. Media: videos are stored in git lfs. The file openaxes-example-robot-dataset.ipynb is provided to play around with the data in the dataset and demonstrate how the files are read and interpreted.
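As a rough illustration of the relative-orientation computation marked with † above, the following sketch uses SciPy's Rotation class; the quaternion values are invented, and the final angle extraction is simplified to the rotation magnitude rather than a projection onto the actual joint axis.

```python
from scipy.spatial.transform import Rotation as R

# Placeholder orientations in scalar-last (x, y, z, w) order.
r_humerus = R.from_quat([0.0, 0.0, 0.0, 1.0])                # "humerus" IMU orientation
r_radius = R.from_quat([0.0, 0.3826834, 0.0, 0.9238795])     # "radius" IMU orientation (45° about y)

# Relative rotation between the two segments: R1 * inv(R0).
r_rel = r_radius * r_humerus.inv()

# Simplified stand-in for q2: the magnitude of the relative rotation, in radians.
q2_estimate = r_rel.magnitude()
print(q2_estimate)
```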
To use the notebook, set up a Python 3 virtual environment and install the necessary packages in it with pip install -r requirements.txt.
In order to view the graphs contained in the ipynb file, you will most likely have to trust the notebook beforehand, using the following command:
jupyter trust openaxes-example-robot-dataset.ipynb
Beware: This notebook is not a comprehensive evaluation and any results and plots shown in the file are not necessarily scientifically sound evidence of anything.
The notebook will store intermediate files in the measure_raw-2022-09-15 directory, like the quaternion files calculated by the different filters, or the files containing the reconstructed TCP positions.
All intermediate files should be ignored by the file measure_raw-2022-09-15/.gitignore.
The generated intermediate files are also provided in the file measure_raw-2022-09-15.tar.bz2, in case you want to inspect the generated files without running the notebook.
A number of tools are used in the evaluation notebook. Below is a short overview, but not a complete specification. If you need to understand the input and output formats for each tool, please read the code.
calculate-quaternions.py is used in the evaluation notebook to compute different attitude estimation filters like Madgwick or VQF on the raw accelerometer and gyroscope measurements at 100 Hz.
madgwick-filter contains a small C program that applies the original Madgwick filter to a CSV file containing raw measurements and prints the results. It is used by calculate-quaternions.py.
calculate-robot-quaternions.py calculates a CSV file of quaternions equivalent to the IMU quaternions from a CSV file containing the joint angles of the robot.
dsense_vis mentioned in the notebook is used to calculate the 3D model of the robot arm from quaternions and determine the mounting orientations of the IMUs on the robot arm.
This program will be released at a future date.
In the meantime, the output files of dsense_vis are provided in the file measure_raw-2022-09-15.tar.bz2, which contains the complete content of the measure_raw-2022-09-15 directory after executing the whole notebook.
Just unpack this archive and merge its contents with the measure_raw-2022-09-15 directory.
This allows you to explore the reconstructed TCP files for the filters implemented at the time of publication.
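For readers who want a feel for the attitude-estimation step performed by calculate-quaternions.py, the following hedged sketch runs a Madgwick filter from the third-party ahrs package over placeholder data. It is not the dataset's own implementation, and the sample values are invented.

```python
import numpy as np
from ahrs.filters import Madgwick  # third-party package (pip install ahrs); assumed available

# Placeholder sensor streams at 100 Hz: gyroscope in rad/s, accelerometer in m/s^2.
gyr = np.random.normal(0.0, 0.01, (1000, 3))
acc = np.tile([0.0, 0.0, 9.81], (1000, 1))

madgwick = Madgwick(frequency=100.0)
q = np.array([1.0, 0.0, 0.0, 0.0])            # initial orientation quaternion (w, x, y, z)
quaternions = np.zeros((len(gyr), 4))
for i in range(len(gyr)):
    # Update the orientation estimate with one gyroscope/accelerometer sample.
    q = madgwick.updateIMU(q, gyr=gyr[i], acc=acc[i])
    quaternions[i] = q
```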
https://vocab.nerc.ac.uk/collection/L08/current/UN/
The Everyone's Gliding Observatories (EGO) initiative is a gathering of several teams of oceanographers, interested in developing the use of gliders for ocean observations. EGO started in Europe with members from France, Germany, Italy, Norway, Spain, and the United Kingdom. The partners of EGO have been funded by both European and national agencies to operate gliders for various purposes and at different sites. Coordinated actions are being set up for these sites in order to demonstrate the capabilities of a fleet of gliders for sampling the ocean, with a given scientific and/or operational objective.
Welcome to the Kansas Environmental Public Health Tracking (EPHT) Data Explorer, a web app provided by the Kansas Department of Health and Environment (KDHE). This web app visualizes data about hazards in the environment that may be related to health issues for Kansans.
Environmental Public Health Tracking
Environmental causes of chronic diseases are hard to identify. Measuring amounts of hazardous substances in our environment in a standard way, tracing the spread of these over time and area, seeing how they show up in human tissues, and understanding how they may cause illness is critical. The National Environmental Public Health Tracking Network is a tool that can help connect these efforts. For more information on Kansas Public Health Tracking, see the EPHT home page.
While there are many ways to define environmental health, for the purposes of this website it means how the environment might affect a person's health and how people might affect the health of the environment. The environment is our air, our water, our food, and our surroundings. Tracking describes how we collect data, interpret it, and report it. We are acquiring data about hazards in the environment, whether a person was exposed to one of them, and health problems that may be related to these exposures. Different types of data are used to learn how the environment affects people's health. The Tracking Network provides information about the following types of data:
• Health effect data: data about health conditions and diseases, such as asthma and birth defects.
• Environmental hazard data: data about chemicals or other substances, such as carbon monoxide and air pollution, in the environment.
• Exposure data: data about the amount of a chemical in a person's body, such as lead in blood.
• Other data: data that help us learn about relationships between exposures and health effects, for example information about age, sex, race, and behavior or lifestyle choices that may help us understand why a person has a particular health problem.
From Data to Action
The KDHE Tracking program has many functions beyond the collection, analysis, and utilization of data. We are also involved in different types of surveillance, conduct non-infectious disease cluster investigations, and interpret data for use by internal and external partners as well as the public. Information gathered related to environmental health is used to inform others, enable public health actions, and guide environmental health interventions. The goal of tracking is to provide valuable information that can be used to plan, apply, and evaluate actions to prevent and control environmentally related diseases or hazardous exposures. In other words, we take data and help turn it into action.
Attributes of public health related complaints reported by the public and investigated by the Louisville Metro Department of Public Health and Wellness. Personal/identifying data have been removed. The EstID column can be joined to the EstablishmentID column in the Establishments table to show attributes of the establishment when a regulated establishment is involved.
Data Dictionary:
RequestComplaintID - system ID
Rcode - code for type of complaint or request
RCodeDescription - text of type of complaint or request
EHS - the license number of the technician investigating
section - complaint section
TakenBy - who took the complaint or service request
Method - how the complaint or service request was taken in
Duplicate - is this a duplicate request? (system notation)
EstID - associated permitted facility, if applicable
oss_id - associated onsite sewage file, if applicable
RequestDate - date of request or complaint
ResolvedDate - date request or complaint was resolved
NextInspDate - scheduled next inspection date
IsCityorCounty - is this a city or county request?
Status - is the request open or resolved?
RequestType - request type
IsFollowUp - is this a follow-up entry to an already open request?
OwnerType - owner or agent, if applicable
PersonOrPremiseZip - ZIP code of subject person
ComplaintantZip - ZIP code of person making the complaint
OwnerZip - ZIP code of the owner of the premises, if applicable
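A minimal sketch of the EstID/EstablishmentID join described above, assuming hypothetical CSV exports of the two tables (the file names are not part of the published dataset):

```python
import pandas as pd

# Placeholder exports of the complaints table and the Establishments table.
complaints = pd.read_csv("PublicHealth_Complaints.csv")
establishments = pd.read_csv("Establishments.csv")

# Attach establishment attributes to complaints that involve a regulated establishment.
merged = complaints.merge(
    establishments,
    left_on="EstID",
    right_on="EstablishmentID",
    how="left",
)
print(merged[["RequestComplaintID", "RCodeDescription", "EstablishmentID"]].head())
```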
The Louisville Metro Department of Public Health and Wellness protects and promotes the health, environment, and well-being of the people of Louisville, providing health-related programs and health office locations community-wide.
Contact:
Gerald Kaforski
gerald.kaforski@louisvilleky.gov
PLEASE NOTE: This record has been retired. Potential Sites of Hydropower Opportunity will no longer be updated. Many factors have changed since this data was created in 2010, such as waterbody status and weir status. There have been no updates to the data since it was created in 2010. The data was created when hydropower schemes were few, with the intention of highlighting potential sites for new schemes to stakeholders. A hydropower scheme is now located at many of these sites. This data is now inaccurate. To find out more about a potential site for hydropower, please contact the local Environment Agency office.
INFORMATION WARNING
These datasets are not up to date and are not being updated. There is not a high level of confidence in their current accuracy. These data are intended to provide a general national and regional overview of the potential hydropower opportunities available, their locations, and their relative environmental sensitivity to exploitation. At site level, there will be some error inherent in the results, as the map uses a national GIS dataset that is based on various sources. One-third of the sites, where older ‘Synthetic Aperture Radar’ (SAR) data were used for the height estimate, include an error of up to one metre. The remaining two-thirds use ‘Light Detection and Ranging’ (LIDAR), which is accurate to 25 cm. This means that the data for an individual site may be inaccurate, but at the national and regional level the error will be averaged out to an extent. There is not a high level of confidence in the power generation calculation; the power category takes account of this uncertainty. These data are indicative only and are not intended to replace any part of an individual site assessment, which is necessary for a full scheme appraisal.
These data show the location of hydropower opportunities in England and Wales that appear to have a lower risk of environmental sensitivity and a higher potential for power generation associated with exploiting them. A filter has been applied to a total of 25,935 ‘barriers’, identifying a total of 4,195 where environmental sensitivity appears to be low and potential power generation high. The term ‘barriers’ is used to describe sites with sufficient drop to provide a hydropower opportunity. They are mostly weirs, but could also be other man-made structures or natural features, such as waterfalls. The filters applied are both:
- Within one of 2,708 heavily modified water bodies. These are water bodies which have been identified as being at significant risk of failing to achieve good ecological status under the Water Framework Directive because of modifications to their hydromorphological characteristics resulting from past engineering works, including impounding works.
- Medium to high power potential, which includes opportunities of greater than 10 kW.
Be aware that this filtering is based only on these statistics and does not indicate that a hydropower opportunity is necessarily feasible at any given location. Given the scale of the project and the data used, the results are not intended to replace any part of an individual site assessment. Instead, the dataset gives national and regional level overviews of the potential opportunities available, their locations, and their relative environmental sensitivity to exploitation. The unfiltered dataset, 'Potential Sites of Hydropower Opportunity', is covered by AfA175.
Attribution statement: © Environment Agency copyright and/or database right 2015. All rights reserved.