Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: user stories or use cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is at least 65 and below 80, and L is otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
P. the researchers' judgement of how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than a class, or (ii) using a generic term
(e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent a legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between student and expert model.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
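As a minimal illustration of these two definitions, a Python sketch computing both metrics from the per-subject counts in columns K-O (function names are ours, not part of the dataset):

def correctness(al, wr, so, om):
    # AL / (AL + OM + SO + WR), as defined for Sheet 4
    return al / (al + om + so + wr)

def completeness(al, wr, om):
    # (AL + WR) / (AL + WR + OM), as defined for Sheet 4
    return (al + wr) / (al + wr + om)

correctness(10, 2, 1, 3)   # -> 0.625
completeness(10, 2, 3)     # -> 0.8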
For sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and moderating variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html (a reference sketch of the calculation follows the sheet list below). The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.
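For reference, a Python sketch of the Hedges' g calculation reported at the bottom of sheets 5-8 (two independent groups with the usual small-sample correction; the online tool cited above may differ in details):

import math
import statistics

def hedges_g(group1, group2):
    n1, n2 = len(group1), len(group2)
    # Pooled standard deviation across the two groups
    s_pooled = math.sqrt(((n1 - 1) * statistics.stdev(group1) ** 2 +
                          (n2 - 1) * statistics.stdev(group2) ** 2) / (n1 + n2 - 2))
    d = (statistics.mean(group1) - statistics.mean(group2)) / s_pooled  # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias correction
    return d * correction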
OSU_SnowCourse Summary: Manual snow course observations were collected over WY 2012-2014 from four paired forest-open sites chosen to span a broad elevation range. Study sites were located in the upper McKenzie (McK) River watershed, approximately 100 km east of Corvallis, Oregon, on the western slope of the Cascade Range, and in the Middle Fork Willamette (MFW) watershed, located to the south of the McKenzie. The sites were designated based on elevation, with a range of 1110-1480 m. Distributed snow depth and snow water equivalent (SWE) observations were collected via monthly manual snow courses from 1 November through 1 April and bi-weekly thereafter. Snow courses spanned 500 m of forested terrain and 500 m of adjacent open terrain. Snow depth observations were collected approximately every 10 m, and SWE was measured every 100 m along the snow courses with a federal snow sampler. These data are raw observations and have not been quality controlled in any way. Distance along the transect was estimated in the field.

OSU_SnowDepth Summary: 10-minute snow depth observations collected at OSU met stations in the upper McKenzie River Watershed and the Middle Fork Willamette Watershed during Water Years 2012-2014. Each meteorological tower was deployed to represent either a forested or an open area at a particular site, and generally the locations were paired, with a meteorological station deployed in the forest and in the open area at a single site. These data were collected in conjunction with manual snow course observations, and the meteorological stations were located in the approximate center of each forest or open snow course transect. These data have undergone basic quality control. See manufacturer specifications for individual instruments to determine sensor accuracy. This file was compiled from individual raw data files (named "RawData.txt" within each site and year directory) provided by OSU, along with metadata of site attributes. We converted the Excel-based timestamp (seconds since origin) to a date, changed the NaN flags for missing data to NA, and added site attributes such as site name and cover. The raw snow depth values are negative (i.e., flipped, with some correction to use the height of the sensor as zero), so positive values in the raw data correspond to physically impossible negative depths; we therefore replaced positive values with NA. The sign of the remaining data was then switched to make the depths positive. Next, the smooth.m (MATLAB) function was used to roughly smooth the data, with a moving window of 50 points. Finally, outliers were removed: all values higher than the smoothed values +10 were replaced with NA, and in some cases further single-point outliers were removed.

OSU_Met Summary: Raw, 10-minute meteorological observations collected at OSU met stations in the upper McKenzie River Watershed and the Middle Fork Willamette Watershed during Water Years 2012-2014. Each meteorological tower was deployed to represent either a forested or an open area at a particular site, and generally the locations were paired, with a meteorological station deployed in the forest and in the open area at a single site. These data were collected in conjunction with manual snow course observations, and the meteorological stations were located in the approximate center of each forest or open snow course transect. These stations were deployed to collect numerous meteorological variables, of which snow depth and wind speed are included here.
These data are raw datalogger output and have not been quality controlled in any way. See manufacturer specifications for individual instruments to determine sensor accuracy. This file was compiled from individual raw data files (named "RawData.txt" within each site and year directory) provided by OSU, along with metadata of site attributes. We converted the Excel-based timestamp (seconds since origin) to a date, changed the NaN and 7999 flags for missing data to NA, and added site attributes such as site name and cover.

OSU_Location Summary: Location metadata for manual snow course observations and meteorological sensors. These data are compiled from GPS data for which the horizontal accuracy is unknown, and from processed hemispherical photographs. They have not been quality controlled in any way.
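A rough Python analogue of the snow-depth cleaning steps described above (a sketch only; the original processing used MATLAB's smooth.m, approximated here by a centred rolling mean, and the function name is ours):

import numpy as np
import pandas as pd

def clean_snow_depth(raw: pd.Series) -> pd.Series:
    # Positive raw values correspond to impossible negative depths: set to NA
    depth = raw.where(raw <= 0, np.nan)
    # Flip the sign so snow depths are positive
    depth = -depth
    # Roughly smooth with a 50-point moving window (smooth.m analogue)
    smoothed = depth.rolling(window=50, center=True, min_periods=1).mean()
    # Replace outliers above the smoothed values +10 with NA
    return depth.where(depth <= smoothed + 10, np.nan)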
A summary of the raw data. Visit https://dataone.org/datasets/sha256%3Ad2b14d6a9da46e707296080c0c4a17242ca7b713e14be24a256c85693535a891 for complete metadata about this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a structured workflow for Lattice Light-Sheet Microscopy image processing, including raw data acquisition (.czi), summarised data (extracted from the .zarr compressed file), metadata extraction, and image enhancement techniques such as deskewing and deconvolution, implemented in a script (main.py). The dataset is intended for researchers working with high-resolution microscopy data.
Raw Data: Original microscopy images in CZI format
Metadata: Embedded metadata extracted by the Zeiss software is available directly after processing the .czi file, while external metadata is synthetically generated (https://github.com/onionsp/Synthetic-WGS-Dataset-Generator/).
Processing Scripts: Python scripts (as found in main.py) for deskewing, deconvolution, and data summarization.
Summarized Data: Processed image outputs in .zarr/.tiff format, reducing storage overhead while maintaining key insights.
Data Transfer Agreement: Documentation regarding data sharing policies and agreements.
Deskewing: Corrects image distortions caused during acquisition.
Deconvolution: Enhances image clarity and sharpness.
Downsampling: Reduces resolution for efficient processing and sharing.
Conversion: CZI to Zarr or TIFF format for optimized storage and computational use (see the sketch below).
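As a rough sketch of the conversion step (not the dataset's main.py; assumes the third-party czifile, tifffile, and zarr packages, and an illustrative file name):

import czifile
import tifffile
import zarr

# Read the raw CZI stack into a NumPy array
stack = czifile.imread("sample.czi")  # illustrative file name

# Save as chunked, compressed Zarr for efficient storage and access
zarr.save("sample.zarr", stack)

# Alternatively, write a TIFF copy
tifffile.imwrite("sample.tif", stack)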
The dataset, including raw and processed files, is hosted on Zenodo.
Users are encouraged to download downsampled versions for testing before using full-resolution data.
Processing scripts enable reproducibility and customization for different research applications.
Data transfer policies are outlined in the included Data Transfer Agreement.
https://github.com/DBK333/Omero-DataPortal/tree/main/OmeroImageSamples
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template are presented and made freely available, allowing the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match individual needs. A variety of example applications of the syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool to move towards more transparency in data visualization.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SDS-PAGE data of amphiphilic proteins for protein capsule construction
UIEF_wind Summary: Within the Flat Creek Unit of the University of Idaho Experimental Forest (UIEF) near Moscow, ID, 30-minute snow depth and meteorological data were collected at seven locations across the Lawler Landing site (elevation 880 m) from February to May of WY 2008. A 70 m north-south oriented transect of 5 snow depth sensors was deployed to record sub-daily snow depth, with co-located meteorological instruments. The sensors traversed a 40 m long elliptical forest gap and the adjacent forest in both directions. The locations were the same as those used previously to quantify how shortwave and longwave radiation vary across a forest gap [Lawler and Link, 2011]. Two additional snow depth sensors and meteorological stations were deployed at “interior forest reference” and “open reference” sites, situated 80 m southeast and 1200 m west, respectively, from the main transect. Whereas the forest reference site was similar to the surrounding forest, the open reference site was much more exposed than the forest gap. These data are generally raw datalogger output and have not been quality controlled in any way unless specifically designated in the variable name. See manufacturer specifications for individual instruments to determine sensor accuracy. This file was compiled from individual raw data files provided by UI, along with approximate coordinates of the sensor locations. Collaborators at the University of Washington (Jessica Lundquist) converted the timestamps given in fractional Julian days to dates and added site attributes such as Location ID and cover.

UIEF_snowdepth Summary: Observed snow depth from acoustic sensors. Measurements were taken within the Lawler Landing gap, as part of the University of Idaho Experimental Forest. Sensor data were collected half-hourly during February through May 2008 at 7 different points. See the location metadata and data citation for a description of locations. These data include raw values and values that were smoothed by Diana Carson; see the data citation for details.

UIEF_Location Summary: Within the Flat Creek Unit of the University of Idaho Experimental Forest (UIEF) near Moscow, ID, 30-minute snow depth and meteorological data were collected at seven locations across the Lawler Landing site (elevation 880 m) from February to May of WY 2008. These location metadata are associated with each unique location identification, which ties to the time series data. See Figure 1 of the data citation for a schematic map of locations. These coordinates are estimated from Google Earth based on Dr. Timothy Link's memory of where the sensors were located. Other attributes of each location were recorded as field notes as part of the study design.
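A minimal sketch of that fractional-Julian-day conversion (assuming day 1.0 is 00:00 on 1 January of the water-year calendar year; the actual origin used may differ):

from datetime import datetime, timedelta

def julian_to_datetime(frac_jday, year=2008):
    # Day 1.0 = 00:00 on 1 January of the given year (assumed origin)
    return datetime(year, 1, 1) + timedelta(days=frac_jday - 1)

julian_to_datetime(45.5)  # -> 2008-02-14 12:00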
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Content:
* Nuclear protein fractions: NSP, NMP and NTP R1-R3 (peptide counts)
* Summary nuclear proteins geLC
* Cellular protein fractions: CSP, CMP R1-R3 (peptide counts)
* Summary cellular proteins geLC
* Summary nuclear protein fractions NSP R1-R3 (iBAQs)
* Summary cellular protein fractions CSP R1-R3 (iBAQs)
* 15 columns annotation
* Final summary of protein data
* Prediction results
* Annotation (biological processes and categories)
Tab color: raw data, green; summarized data, orange; final data, red; prediction results, blue
Data Description
Managed turfgrass is a common component of urban landscapes that is expanding under current land use trends. Previous studies have reported high rates of soil carbon sequestration in turfgrass, but no systematic review has summarized these rates nor evaluated how they change as turfgrass ages. We conducted a meta-analysis of soil carbon sequestration rates from 63 studies. Those data, as well as the code used to analyze them and create figures, are shared here.

Dataset Development
We conducted a systematic review from Nov 2020 to Jan 2021 using Google Scholar, Web of Science, and the Michigan Turfgrass Information File Database. The search terms targeted were "soil carbon", "carbon sequestration", "carbon storage", or "carbon stock", with "turf", "turfgrass", "lawn", "urban ecosystem", "residential", "Fescue", "Zoysia", "Poa", "Cynodon", "Bouteloua", "Lolium", or "Agrostis". We included only peer-reviewed studies written in English that measured SOC change over one year or longer, and where grass was managed as turf (mowed or clipped regularly). We included studies that sampled to any soil depth, and included several methodologies: small-plot research conducted over a few years (22 datasets from 4 articles), chronosequences of golf courses or residential lawns (39 datasets from 16 articles), and one study that was a variation on a chronosequence method and compiled long-term soil test data provided by golf courses of various ages (3 datasets from Qian & Follett, 2002). In total, 63 datasets from 21 articles met the search criteria. We excluded 1) duplicate reports of the same data, 2) small plot studies that did not report baseline SOC stocks, and 3) pure modeling studies. We included five papers that only measured changes in SOC concentrations, but not areal stocks (i.e., SOC in Mg ha-1). For these papers, we converted from concentrations to stocks using several approaches. For two papers (Law & Patton, 2017; Y. Qian & Follett, 2002) we used estimated bulk densities provided by the authors. For the chronosequences reported in Selhorst & Lal (2011), we used the average bulk density reported by the author. For the 13 chronosequences reported in Selhorst & Lal (2013), we estimated bulk density from the average relationship between percent C and bulk density reported by Selhorst (2011). For Wang et al. (2014), we used bulk density values from official soil survey descriptions.

Data provenance
In most cases we contacted authors of the studies to obtain the original data. If authors did not reply after two inquiries, or no longer had access to the data, we captured data from published figures using WebPlotDigitizer (Rohatgi, 2021). For three manuscripts the data were already available, or partially available, in public data repositories. Data provenance information is provided in the document "Dataset summaries and citations.docx".

Recommended Uses
We recommend the following to data users:
- Consult and cite the original manuscripts for each dataset, which often provide additional information about turfgrass management, experimental methods, and environmental context. Original citations are provided in the document "Dataset summaries and citations.docx".
- For datasets that were previously published in public repositories, consult and cite the original datasets, which may provide additional data on turfgrass management practices, soil nitrogen, and natural reference sites. Links to repositories are in the document "Dataset summaries and citations.docx".
- Consider contacting the dataset authors to notify them of your plans to use the data, and to offer co-authorship as appropriate.
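For reference, the standard concentration-to-stock conversion underlying the approaches above multiplies concentration, bulk density, and sampling depth; a sketch with illustrative values:

def soc_stock_mg_per_ha(soc_percent, bulk_density_g_cm3, depth_cm):
    # SOC stock (Mg ha^-1) = SOC (%) x bulk density (g cm^-3) x depth (cm);
    # 1 g cm^-3 over 1 cm of depth is 100 Mg of soil per hectare.
    return soc_percent * bulk_density_g_cm3 * depth_cm

soc_stock_mg_per_ha(1.5, 1.2, 20)  # -> 36.0 Mg ha^-1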
This dataset is the source dataset and contains raw data values. It will replace the current data download (https://safetydata.fra.dot.gov/OfficeofSafety/publicsite/on_the_fly_download.aspx) when the safetydata.fra.dot.gov site is decommissioned in 2024. To download the data in a user-friendly, human-readable format, please reference https://data.transportation.gov/Railroads/Injury-Illness-Summary-Operational-Data/m8i6-zdsy.
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is available only as summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and must be processed and combined to provide information about the nation as a whole. Update frequency: Historic (none)
United States Census Bureau
-- Ten most populous zip codes in the 2010 census
-- (rows with gender = '' hold the totals across genders)
SELECT
  zipcode,
  population
FROM
  `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
WHERE
  gender = ''
ORDER BY
  population DESC
LIMIT
  10
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/us-census-data
Supplementary and additional datasets (raw and preprocessed) in relation to future work of the paper "Towards Detecting Inauthentic Coordination in Twitter Likes Data" by Laura Jahn and Rasmus K. Rendsvig. The supplementary data contain liking and retweeting user data and tweet IDs, supplemented with, e.g., Botometer botscores and later lookups regarding account existence. A README facilitates navigation.

## Repository Structure

- [1] Data from Danish Twitter on National Election
- [2] Data from German Twitter
- [3] Supplementary data to paper "Towards Detecting Inauthentic Coordination in Twitter Likes Data"

## Folder content

- [1]
  - Raw Data: raw data of liking and retweeting users (you might come across #fv22 in file naming: the hashtag #fv22 is an election hashtag about the Danish National Election)
  - Preprocessed Data: binary like-user and retweet-user matrices (see the sketch after this list)
  - Botscores: Botometer v4 and lite scores for all likers and retweeters, also conveniently summarized in feature-frame tables
  - Clusters: bins of perfectly correlated users
  - Later User and Tweets Lookups: later (January, February 2023) lookup of previously collected users and the tweets they liked/retweeted
  - Likers Retweeters Pagination: later (January, February 2023) lookup of likers and retweeters using the new pagination parameter
- [2]
  - Raw Data: raw data of liking and retweeting users (you might come across #bundestag in file naming: the hashtag #bundestag is a German political hashtag)
  - Preprocessed Data: binary like-user and retweet-user matrices
- [3]
  - Additional dataset dkpol July
    - Raw Data: raw data of liking and retweeting users
    - Preprocessed Data: binary like-user and retweet-user matrices
  - Supplementary data to the data used in the paper "Towards Detecting Inauthentic Coordination in Twitter Likes Data"
    - Botscores: Botometer v4 and lite scores for all likers and retweeters, also conveniently summarized in feature-frame tables
    - Later User and Tweets Lookups: later (January, February 2023) lookup of previously collected users and the tweets they liked/retweeted
    - Likers Retweeters Pagination: later (January, February 2023) lookup of likers and retweeters using the new pagination parameter
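A sketch of how binary like-user matrices and clusters of perfectly correlated users can be built (column and variable names are ours, not the repository's):

import pandas as pd

# One row per like event -- hypothetical column names
likes = pd.DataFrame({
    "tweet_id": [1, 1, 2, 2, 3, 3],
    "user_id":  ["a", "b", "a", "b", "a", "c"],
})

# Binary like-user matrix: rows = tweets, columns = users
matrix = pd.crosstab(likes["tweet_id"], likes["user_id"]).clip(upper=1)

# Perfectly correlated users are those with identical like columns
clusters = matrix.T.groupby(list(matrix.T.columns)).groups
print(clusters)  # maps each like pattern to the users sharing it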
The following submission includes raw and processed data from the in-water deployment of NREL's Hydraulic and Electric Reverse Osmosis Wave Energy Converter (HERO WEC), in the form of parquet files, TDMS files, CSV files, bag files, and MATLAB workspaces. This dataset was collected in March 2024 at the Jennette's Pier test site in North Carolina. This submission includes the following:

Data description document (HERO WEC FY24 Hydraulic Deployment Data Descriptions.doc) - This document includes detailed descriptions of the type of data and how it was processed and/or calculated.

Processed MATLAB workspace - The processed data is provided in the form of a single MATLAB workspace containing data from the full deployment. This workspace contains data from all sensors down-sampled to 10 Hz along with all array Value Added Products (VAPs).

MATLAB visualization scripts - The MATLAB workspaces can be visualized using the file "HERO_WEC_2024_Hydraulic_Config_Data_Viewer.m/mlx". The user simply needs to download the processed MATLAB workspaces, specify the desired start and end times, and run this file. Both the .m and .mlx file formats have been provided, depending on the user's preference.

Summary Data - The fully processed data was used to create a summary data set with averages and important calculations performed on 30-minute intervals to align with the intervals of wave resource data reported from nearby CDIP ocean observing buoys located 20 km east of Jennette's Pier and 40 km northeast of Jennette's Pier. The wave resource data provided in this data set is to be used for reference only, due to the difference in water depth and proximity to shore between the Jennette's Pier test site and the locations of the ocean observing buoys. This data is provided in the Summary Data zip folder, which includes this data set in the form of a MATLAB workspace, parquet file, and Excel spreadsheet.

Processed Parquet File - The processed data is provided in the form of a single parquet file containing data from all HERO WEC sensors collected during the full deployment. Data in these files has been down-sampled to 10 Hz and all array VAPs are included.

Interim Filtered Data - Raw data from each sensor group partitioned into 30-minute parquet files. These files are outputs from an intermediate stage of data processing and contain the raw data with no Quality Control (QC) or calculations performed, in a format that is easier to use than the raw data.

Raw Data - Raw, unprocessed data from this deployment can be found in the Raw Data zip folder. This data is provided in the form of TDMS, CSV, and bag files in the original format output by the MODAQ system.

Python Data Processing Script - This links to an NREL public GitHub repository containing the Python script used to go from raw data to fully processed parquet files. Additional documentation on how to use this script is included in the GitHub repository.

This data set has been developed by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy Water Power Technologies Office.
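A quick-start sketch for loading the processed parquet file in Python (assumes pandas with a parquet engine such as pyarrow; the file name is illustrative):

import pandas as pd

# Load the full-deployment processed data (10 Hz, all sensors plus VAPs)
df = pd.read_parquet("HERO_WEC_2024_processed.parquet")  # illustrative name

# Inspect the available sensor channels and value-added products
print(df.columns)
print(df.describe())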
https://ottawa.ca/en/city-hall/get-know-your-city/open-data#open-data-licence-version-2-0
Provides a summary of water quality results for raw, treated, and distribution groundwater for the City of Ottawa’s Munster Communal Well.
Accuracy: There are no known errors with this data report.
Update Frequency: Annually
Contact: Gwyn Norman
We provide data on ecological community responses to wildfire, collected three years post-fire, across three burn conditions (unburned, moderate severity, and high severity) in the Eldorado National Forest, California. The data were collected with 19 sampling methods deployed across 27 sites (nine in each burn condition) used to estimate richness, body size, abundance, and biomass density for 849 species (including 107 primary producers, 634 invertebrates, and 94 vertebrates). The sampling methods are detailed in a companion data paper. To maximize transparency and ease of use we have made our data available in four formats: raw, tidy, temporary, and summary. Raw data is as close as possible to the form in which it was collected. As such, raw data is not ready for analysis. We have provided tidy versions of each raw data set that are ready for analysis. Temporary data files are included for transparency but are used to create summary data files, and are not intended to be informative as stand-alone files.
https://creativecommons.org/publicdomain/zero/1.0/
The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
Source: https://en.wikipedia.org/wiki/United_States_Census
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is available only as summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and must be processed and combined to provide information about the nation as a whole.
The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using zip code tabular areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa
https://cloud.google.com/bigquery/public-data/us-census
Dataset Source: United States Census Bureau
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by Steve Richey from Unsplash.
What are the ten most populous zip codes in the US in the 2010 census?
What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?
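A sketch answering the second question with the BigQuery Python client (assumes default credentials and the companion population_by_zip_2000 table; as in the sample query above, gender = '' selects the total rows):

from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT
  a.zipcode,
  b.population - a.population AS population_change
FROM
  `bigquery-public-data.census_bureau_usa.population_by_zip_2000` AS a
JOIN
  `bigquery-public-data.census_bureau_usa.population_by_zip_2010` AS b
ON
  a.zipcode = b.zipcode
WHERE
  a.gender = '' AND b.gender = ''
ORDER BY
  ABS(b.population - a.population) DESC
LIMIT
  10
"""
print(client.query(sql).to_dataframe())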
Census population map: https://cloud.google.com/bigquery/images/census-population-map.png
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Retail Trade Index operation (ICIm) provides a short-term indicator of the evolution of trade and employment in the sector, based on the total turnover and the personnel employed in a selection of commercial establishments in the Basque Country, with representation in all three provinces.
NA. This dataset is not publicly accessible because: The data used in this manuscript were obtained under Data Use Agreements with the NCS Vanguard Data and Sample Archive and Access System and the NICHD Data and Specimen Hub (DASH). Because of the requirements of the DUA, we are unable to provide raw data; thus, only the summary data included in the manuscript are provided. It can be accessed through the following means: The manuscript contains tables of the summary statistics. For the original data, users must have an approved DUA with NICHD DASH. Format: Word file of tables with summary statistics for maternal blood Pb, urine Pb, Pb surface wipe loading, and Pb vacuum bag dust. This dataset is associated with the following publication: Stanek, L., N. Grokhowsky, B. George, and K. Thomas. Assessing lead exposure in U.S. pregnant women using biological and residential measurements. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, (905): 167135, (2023).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Intro
Dataset from the publication "Lithium-ion battery degradation: comprehensive cycle ageing data and analysis for commercial 21700 cells", DOI: https://doi.org/10.1016/j.jpowsour.2024.234185
Full details of the study can be found in the publication, including thorough descriptions of the experimental methods and structure. A basic description of the experimental procedure and data structure is included here for ease of use.
Commercial 21700 cylindrical cells (LG M50T, LG GBM50T2170) were cycle aged under 3 different temperatures [10, 25, 40] °C and 4 different SoC ranges [0-30, 70-85, 85-100, 0-100]%, as well as a further [0-100]% SoC range experiment which utilised a drive-cycle discharge instead of constant-current. The same C-rates (0.3C / 1C for charge / discharge) were used in all tests; multiple cells were tested under each condition. These are listed in the table below.
Experiment | SOC Window | Cycles per ageing set | Current | Number of Cells (10°C / 25°C / 40°C)
1 | 0-30% | 257 | 0.3C / 1D | 3 / 3 / 3
2 | 70-85% | 515 | 0.3C / 1D | 2 / 2 / 2
3 | 85-100% | 515 | 0.3C / 1D | 3 / 3 / 3
4 | 0-100% (drive-cycle) | 78 | 0.3C / noisy D | 3 / 2 / 3
5 | 0-100% | 78 | 0.3C / 1D | 3 / 2 / 3
Cells were base-cooled at set temperatures using bespoke test rigs (see our linked publications for details; the supporting information file contains detailed descriptions and photographs). Cells were subject to break-in cycles prior to beginning of life (BoL) performance tests using the ‘Reference Performance Test’ (RPT) procedures. They were then alternately subject to ageing sets and RPTs until the end of testing. Full details of each of these procedures are described in the linked publication.
The data contained in this repository is then described in the Data section below. This includes a description of the folder structure and naming conventions, file formats, and data analysis methods used for the ‘Processed Data’ which has been calculated from the raw data.
An 'experimental_metadata' .xlsx file is included to aid parsing of the data. A Jupyter notebook has also been included to demonstrate how to access some of the data.
Data
Data are organised according to their parent ‘Experiment’, as defined above, with a folder for each. Within each Experiment folder, there are 3 subfolders: ‘Summary Data’, ‘Processed Timeseries Data’, and ‘Raw Data’.
Summary Data
This folder contains data which has been extracted by processing the raw data in the ‘Degradation Cycling’ and ‘Performance Checks’ folders. In most cases, the data you are looking for will be stored here.
It contains:
Performance Summary
A summary file for each cell which details key ageing metrics such as number of ageing cycles, charge throughput, cell capacity, resistance, and degradation mode analysis results. Each row of data corresponds to a different SoH.
Degradation Mode Analysis (DMA) was also performed on the C/10 discharge data at each RPT. This analysis uses an optimisation function to determine the capacities and offset of the positive and negative electrodes by calculating a full cell voltage vs capacity curve using half-cell data and comparing against the experimentally measured voltage vs capacity data from the C/10 discharge. See our ACS publication for more details.
Data includes:
· Ageing Set: numbered 0 (BoL) to x, where x is the number of ageing sets the cell has been subject to.
· Ageing Cycles: number of ageing cycles the cell has been subject to. Note that this is not equivalent to full cycles.
· Ageing Set Start Date/ End date: The date that each ageing set began/ ended.
· Days of degradation: Number of days between the date of the first ageing set beginning and the current ageing set ending.
· Age set average temperature: average recorded surface temperature of the cell during cycle ageing. Temperature was recorded approximately halfway up the length of the cell (i.e. between positive and negative caps).
· Charge throughput: total accumulated charge recorded during all cycles during ageing (i.e. sum of charge and discharge). This is the cumulative total since BoL (not including RPTs, and not including break-in cycles).
· Energy throughput: as with "charge throughput", but for energy.
· C/10 Capacity: the capacity recorded during the C/10 discharge test of each RPT.
· C/2 Capacity: the capacity recorded during the C/2 discharge test of each even-numbered RPT.
· 0.1s Resistance: The resistance calculated from the 25-pulse GITT test of each even-numbered RPT. This value is taken from the 12th pulse of the procedure (which corresponds to ~52% SoC at BoL). The resistance is calculated by dividing the voltage drop by the current at a timescale of 0.1 seconds after the current pulse is applied (the fastest timescale possible under the 10 Hz recording condition); see the sketch after this list.
· Fitting parameters: output from the DMA optimisation function; 5 parameters which detail the upper/lower SoCs of each electrode, and the capacity fraction of graphite in the negative electrode.
· Capacity and offset data: calculated based on the fitting parameters above alongside the measured C/10 discharge capacity.
· DM data: Quantities of LLI, LAM-PE, LAM-NE, LAM-NE-Gr, and LAM-NE-Si calculated from the change in capacities/offset of each electrode since BoL.
· RMSE data: the root mean squared error of the optimisation function calculated from the residual between the measured and simulated voltage vs capacity profiles.
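A minimal sketch of that 0.1 s resistance calculation (variable names are ours):

def pulse_resistance_ohm(v_before_pulse, v_at_0p1s, pulse_current_a):
    # R = voltage drop / current, 0.1 s after the pulse is applied
    return (v_before_pulse - v_at_0p1s) / pulse_current_a

pulse_resistance_ohm(3.70, 3.65, 1.5)  # -> ~0.033 Ohm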
Ageing Sets Summary
Data from the ageing cycles, summarised on an average per cycle and an average per ageing set basis. Metrics include mean/ max/ min temperatures, voltages etc.
Processed Timeseries data
Timeseries data (voltage, current, temperature, etc.) from each subtest (pOCV, GITT, etc.) of the RPTs, all grouped by subtest-type and by cell ID.
Contains the same data as in the ‘Performance Checks’ subfolder of the 'Raw Data' folder, but has been processed to slice into relevant subtests from the RPT procedure and includes only limited variables (time, voltage, current, charge, temperature). These are all saved as .csv files. In general this data will be easier to access than the raw data, but perhaps not as rich.
Raw Data
These are the raw data from the performance checks and from the degradation cycles themselves. The data from here has already been processed by me to get values of ‘energy throughput’, ‘charge throughput’, ‘average ageing temperature’, etc., which are all saved in the ‘Summary Data’ folder as described in the relevant section above.
The data in the ‘Degradation Cycling’ folder are organised by ageing set (where an ageing set is a defined number of ageing cycles, as described in the paper). In theory, each cell should have one datafile in each ageing set subfolder. However, due to experimental issues, tests can sometimes be interrupted midway through, requiring the test to be subsequently resumed. In this case, there may be multiple datafiles for each cell in a given ageing set; during analysis, these should be concatenated according to the descriptor in the filename (e.g., ‘cycling7’ + ‘cycling7 (part 2)').
Similarly, the unprocessed raw data from the performance checks (i.e. RPTs) is stored in the 'Performance Checks' folder, and structured in the same way.
The raw data are saved in the .mpr format produced by the Biologic battery cycler. This is a binary format which is storage-efficient but can be more difficult to process for analysis purposes. We have therefore also exported the data into .txt files (called .mpt) for the performance checks (RPTs), which make analysis easier. However, the exported .mpt files could not be included for the degradation cycling files due to their larger size. If you require access to these degradation cycle data, the .mpr binary file can be parsed using the Galvani package in Python, or you can use Biologic’s (proprietary) BT-Lab software to export the data into .txt files.
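For example, a minimal sketch using Galvani and pandas (the file name is illustrative):

from galvani import BioLogic
import pandas as pd

# Parse the binary Biologic .mpr file and convert to a DataFrame
mpr = BioLogic.MPRfile("cycling7.mpr")  # illustrative file name
df = pd.DataFrame(mpr.data)
print(df.columns)  # recorded channels, e.g. time, voltage, current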
File Naming Convention
The raw datafiles are named with a standard format. This is:
NDK - LG M50 deg - exp 1 - rig 1 - 10degC - cell A - RPT1_01_MB_CB1
{NDK - LG M50 deg} - {exp 1} - {rig 1} - {10degC} - {cell A} - {RPT1}_{01}_{MB}_{CB1}
{Standard prefix} - {experiment number} - {ID of test rig} - {control temperature} - {Cell ID} - {RPT number or ageing cycle number}_{step number for the characterisation procedure (see above)}_{experimental technique name (will always be "MB")}_{battery cycler channel ID used (always the same for a particular cell/experiment)}
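A sketch of parsing this convention with a regular expression (group names are ours):

import re

PATTERN = re.compile(
    r"(?P<prefix>.+?) - exp (?P<experiment>\d+) - rig (?P<rig>\d+) - "
    r"(?P<temperature>\d+)degC - cell (?P<cell>\w+) - "
    r"(?P<test>.+?)_(?P<step>\d+)_MB_(?P<channel>\w+)"
)

m = PATTERN.match("NDK - LG M50 deg - exp 1 - rig 1 - 10degC - cell A - RPT1_01_MB_CB1")
print(m.groupdict())
# {'prefix': 'NDK - LG M50 deg', 'experiment': '1', 'rig': '1',
#  'temperature': '10', 'cell': 'A', 'test': 'RPT1', 'step': '01', 'channel': 'CB1'}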
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This research aims to develop a principle-based framework for audit analytics (AA) implementation, which addresses the challenges of AA implementation and acknowledges its socio-technical complexities and the interdependencies among challenges. This research relies on mixed methods to capture the phenomena from the research's participants through various approaches, i.e., MICMAC-ISM, case study, and interviews with practitioners, with literature exploration as the starting point. The raw data collected consists of multimedia data (audio and video recordings of interviews and focus group discussions), which is then transformed into text files (transcripts), complemented with softcopies of documents from the case study object.
The published data in this dataset consist of the summarized or analyzed data, as the raw data (including transcripts) are not allowed to be published according to the decision by the Human Research Ethics Committee pertinent to this research (Approval #1979, 14 February 2022). This dataset's published data are text files representing the summarized/analyzed raw data, serving as online appendices to the thesis.