17 datasets found
  1. Cell type labels for all clustering and normalization combinations compared for CODEX multiplexed imaging

    • search.dataone.org
    • data.niaid.nih.gov
    • +2more
    Updated Apr 28, 2025
    Cite
    John Hickey (2025). Cell type labels for all clustering and normalization combinations compared for CODEX multiplexed imaging [Dataset]. http://doi.org/10.5061/dryad.dfn2z352c
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    John Hickey
    Time period covered
    Jan 1, 2021
    Description

    We performed CODEX (co-detection by indexing) multiplexed imaging on four sections of the human colon (ascending, transverse, descending, and sigmoid) using a panel of 47 oligonucleotide-barcoded antibodies. Subsequently, images underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of the best focal plane) and single-cell segmentation. The output of this process was a dataframe of nearly 130,000 cells with fluorescence values quantified for each marker. We used this dataframe as input to five normalization approaches, comparing z, double-log(z), min/max, and arcsinh normalizations against the original unmodified dataset. We used these normalized dataframes as inputs for four unsupervised clustering algorithms: k-means, Leiden, X-shift Euclidean, and X-shift angular.

    From the clustering outputs, we then labeled the clusters that resulted for cells observed in the data producing 20 u...
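
    As a rough illustration of the normalizations this dataset compares, the sketch below applies z, double-log(z), min/max, and arcsinh transforms column-wise to a marker-intensity matrix. It is a minimal sketch on random stand-in data, not the authors' code: the double-log formulation and the arcsinh cofactor of 5 are assumptions.

    import numpy as np

    def normalize(x, method, cofactor=5.0):
        # column-wise normalization of a cells x markers intensity matrix
        if method == "z":
            return (x - x.mean(axis=0)) / x.std(axis=0)
        if method == "double_log_z":
            z = (x - x.mean(axis=0)) / x.std(axis=0)
            # one plausible double-log of shifted z-scores; the exact formulation is an assumption
            return np.log1p(np.log1p(z - z.min(axis=0)))
        if method == "min_max":
            return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
        if method == "arcsinh":
            return np.arcsinh(x / cofactor)  # cofactor of 5 is an illustrative choice
        raise ValueError(f"unknown method: {method}")

    # the real dataframe holds ~130,000 cells x 47 markers; random stand-in values here
    intensities = np.random.rand(1000, 47) * 1e4
    for m in ("z", "double_log_z", "min_max", "arcsinh"):
        print(m, normalize(intensities, m).shape)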

  2. Dataset used in DeepIndices - Dataset - B2FIND

    • b2find.eudat.eu
    Updated May 26, 2021
    + more versions
    Cite
    (2021). Dataset used in DeepIndices - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/27c84537-f9f8-5b44-8c7f-aa495891a884
    Dataset updated
    May 26, 2021
    Description

    This dataset includes multi-spectral acquisitions of vegetation for the design and evaluation of new DeepIndices over uncalibrated data. The images were acquired with the Airphen (Hyphen, Avignon, France) six-band multi-spectral camera configured with the 450/570/675/710/730/850 nm bands at 10 nm FWHM. The data were collected at the INRAe site in Montoldre (Allier, France, at 46°20'30.3"N 3°26'03.6"E) within the framework of the “RoSE challenge” funded by the French National Research Agency (ANR), and in Dijon (Burgundy, France, at 47°18'32.5"N 5°04'01.8"E) at the AgroSup Dijon site. Images of bean and corn, containing various natural weeds (yarrows, amaranth, geranium, plantago, etc.) and sown ones (mustards, goosefoots, mayweed, and ryegrass), with very distinct illumination conditions (shadow, morning, evening, full sun, cloudy, rain, ...), were acquired in top-down view at 1.8 meters above the ground. Due to the nature of the camera, spectral band registration is required; it is performed with a registration method based on previous work (with sub-pixel registration accuracy). The alignment is refined in two steps, with (i) a rough estimation of the affine correction and (ii) a perspective correction for refinement and accuracy through the detection and matching of key points. The results show that the GFTT algorithm is the best key-point detector when the 570 nm band is used as the spectral reference for registration. After registration, all spectral images are cropped to 1200×800 px and concatenated channel-wise. Spectral bands inherently have high noise associated with the CCD sensor, which is a potential problem during normalization. To overcome this effect, 1% of the minimum and maximum signal is suppressed by computing the quantiles, the signal is clipped to the resulting range, and each band is rescaled to the interval [0,1] using min-max normalization.
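
    The clipping-and-rescaling step described above can be sketched as follows. This is a minimal illustration assuming 1% quantile bounds per band, as in the description; the array shapes and variable names are stand-ins.

    import numpy as np

    def clip_and_rescale(band, q=0.01):
        # suppress the noisiest 1% at each end, then min-max rescale to [0, 1]
        lo, hi = np.quantile(band, [q, 1.0 - q])
        band = np.clip(band, lo, hi)
        return (band - lo) / (hi - lo)

    # stand-in for six registered bands cropped to 1200 x 800 px, stacked channel-wise
    bands = np.random.rand(6, 800, 1200) * 4095.0
    stacked = np.stack([clip_and_rescale(b) for b in bands], axis=-1)
    print(stacked.shape, float(stacked.min()), float(stacked.max()))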

  3. DAS

    • data.mendeley.com
    Updated Apr 25, 2025
    Cite
    ang xingcheng (2025). DAS [Dataset]. http://doi.org/10.17632/fp77d4223z.1
    Dataset updated
    Apr 25, 2025
    Authors
    ang xingcheng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset, procured via Distributed Acoustic Sensing (DAS) technology, comprises field-collected data from power cables buried at depths ranging from 1.2 to 2.0 meters. Its primary objective is to furnish multi-class time-series data to facilitate research in the identification of intrusion events affecting power cables. The dataset encompasses five distinct intrusion event categories: hammering (labeled as 0), background noise (labeled as 1), vehicle vibration (labeled as 2), rainfall (labeled as 3), and fan interference (labeled as 4). Data for each event type were acquired using a SHHZSB-100 DAS system, employing a sampling rate of 5 kHz. The resultant data consists of single-channel time-series signals, each comprising 2500 consecutive sampling points, and is stored in CSV format. Each row of data contains 2500 columns representing floating-point signal values, normalized to the [0,1] interval, with the final column denoting the integer event label. Data preprocessing involved the application of a sliding window segmentation technique (window length: 500 ms, overlap ratio: 40%) and Min-Max normalization to ensure both the continuity and scale consistency of the event segments. The dataset was stratified by event category and partitioned into training (80%), validation (10%), and testing (10%) sets. The testing set additionally incorporates composite noise, inclusive of power grid harmonics and random impulses, to validate the generalizability of the models. This dataset is particularly suited for the development of intrusion detection algorithms based on multi-modal feature fusion (e.g., GASF/RP image transformation combined with BiLSTM time-series modeling) or attention mechanisms. It is especially recommended for validating the performance of end-to-end classifiers, such as GRT-Transformer, within complex buried environments. Researchers may extract time-frequency domain features (mean, variance, wavelet packet energy) or directly input the raw signals for model training. Data acquisition is subject to the CC BY-NC 4.0 license and requires application to the corresponding author to safeguard sensitive information regarding the location of power facilities.
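
    A minimal sketch of the windowing and scaling described above: a 500 ms window at 5 kHz gives the 2500-sample segments, and the 40% overlap follows the description. Whether min-max scaling is applied per segment or globally is not stated; per segment is assumed here, and the function and variable names are made up.

    import numpy as np

    FS = 5000                      # sampling rate, Hz
    WIN = int(0.5 * FS)            # 500 ms window -> 2500 samples
    STEP = int(WIN * (1 - 0.4))    # 40% overlap between consecutive windows

    def segment_and_scale(signal):
        # slide a 2500-sample window and min-max scale each segment to [0, 1]
        segments = []
        for start in range(0, len(signal) - WIN + 1, STEP):
            seg = signal[start:start + WIN]
            lo, hi = seg.min(), seg.max()
            segments.append((seg - lo) / (hi - lo) if hi > lo else np.zeros_like(seg))
        return np.array(segments)

    raw = np.random.randn(FS * 10)     # 10 s stand-in for one DAS channel
    windows = segment_and_scale(raw)
    print(windows.shape)               # (number_of_segments, 2500)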

  4. Data from: OKG-ConvGRU: A domain knowledge-guided remote sensing prediction framework for ocean elements

    • figshare.com
    7z
    Updated Apr 18, 2025
    Cite
    Renhao Xiao (2025). OKG-ConvGRU: A domain knowledge-guided remote sensing prediction framework for ocean elements [Dataset]. http://doi.org/10.6084/m9.figshare.28814792.v5
    Dataset updated
    Apr 18, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Renhao Xiao
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    1. The data folder stores the long time-series remote sensing image data used in the experiment, which has been preprocessed. The study area is the eastern China Sea. Chlorophyll-a concentration (Chl-a) was selected as the target element for model prediction; its influencing factors include sea surface temperature (SST), particulate inorganic carbon (PIC), particulate organic carbon (POC), photosynthetically active radiation (PAR), and normalized fluorescence line brightness (NFLH) (Zhai et al. 2021). According to existing studies, phytoplankton growth is affected by multiple interactions of physical, chemical, and biological factors (Zhang et al. 2023; Meng et al. 2022). Among these factors, SST shows a significant correlation with Chl-a concentration (Chen, Cai, et al. 2024), while interactions among POC, PIC, and Chl-a reflect the productivity and carbon cycling processes in marine ecosystems (Dong et al. 2025; Karmakar et al. 2024). In addition, PAR is strongly positively correlated with Chl-a (McGinty et al. 2016; Wang et al. 2020). The experimental data were obtained from satellite remote sensing images provided by NASA, spanning approximately 22 years from August 2002 to May 2024 with a monthly temporal resolution. The data were derived from the MODIS L3 Ocean Color product, available through an open-access website (https://oceancolor.gsfc.nasa.gov/l3/), with a spatial resolution of 4 km.
    Data pre-processing: we performed several preprocessing operations on the original satellite images to improve data quality and better adapt the data to the subsequent spatio-temporal prediction. To address missing values in the original images, the data interpolating empirical orthogonal function (DINEOF) method (Wang, Gao, and Liu 2019; Beckers, Barth, and Alvera-Azcárate 2006) was used to reconstruct the missing image data; it effectively restores missing values while retaining the spatio-temporal variation characteristics of the data through spatio-temporal covariance matrix decomposition and iterative interpolation. Subsequently, high-precision land vector data corresponding to the selected projection were employed to mask land anomalies in the ocean water color data, eliminating geographic interference. To unify the dimensions of the multi-source data, the parameters were normalized to the [0,1] interval by min-max normalization (Prasetyowati et al. 2022). Finally, the images were uniformly cropped to 320×568 pixels to fit the model inputs. The dataset division strictly followed the principle of temporal continuity: the 262 months of data from August 2002 to May 2024 (2002.08-2024.05) were divided into three subsets, with the training set (2002.08-2018.05, 190 months) used for model parameter learning, the validation set (2018.06-2021.05, 36 months) used for hyperparameter optimization, and the test set (2021.06-2024.05, 36 months) used to evaluate the model's generalization ability.
    2. The OKG folder stores the source code of our constructed remote sensing spatio-temporal knowledge graph (OKG) of ocean elements as well as the semantic representation process, including knowledge graph visualization, storage to Neo4j, and training and evaluation of the embedding models (TransE, TransH).
    3. The cross_convgru folder contains the source code of the developed model.
    4. Experimental environment: the experiments were conducted on a workstation equipped with an Intel Core i7-14650HX processor running Windows 11. The model is implemented in the PyTorch framework and uses an NVIDIA RTX 4070 graphics card (32GB video memory) for training acceleration, with CUDA version 12.5. Code development and debugging were conducted in the PyCharm integrated development environment.
    5. The excel folder stores all the tabular data used in the thesis, containing the indicator values obtained from the various experiments.
    6. The pictures folder stores all the pictures presented in the manuscript, including module flowcharts, visualized knowledge graphs, model predictions, etc.
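
    A minimal sketch of the per-variable min-max normalization and the temporally contiguous split described above. This is a hedged illustration on random stand-in data, not the project's code; a reduced grid size is used so the example stays light, whereas the real images are 320×568 pixels.

    import numpy as np

    # stand-in for 262 monthly grids of 6 variables (Chl-a, SST, PIC, POC, PAR, NFLH)
    data = np.random.rand(262, 6, 32, 57).astype(np.float32)

    # min-max normalize each variable to [0, 1] over the whole record
    mins = data.min(axis=(0, 2, 3), keepdims=True)
    maxs = data.max(axis=(0, 2, 3), keepdims=True)
    data = (data - mins) / (maxs - mins)

    # temporally contiguous split: 190 training, 36 validation, 36 test months
    train, val, test = data[:190], data[190:226], data[226:]
    print(train.shape, val.shape, test.shape)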

  5. Flood regulation data - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Feb 8, 2025
    + more versions
    Cite
    (2025). Flood regulation data - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/944ab21e-02e0-54fd-90c3-79d178d63743
    Dataset updated
    Feb 8, 2025
    Description

    Data from Stürck, J., Poortinga, A., Verburg, P. H., 2014. Mapping ecosystem services: The supply and demand of flood regulation services in Europe. Ecol. Ind. 38, 198-211. Flood regulation supply is an ecosystem service which, based on biophysical characteristics of the landscape, supports the mitigation of river floods. The effect of several biophysical factors on the indicator, i.e. land use, precipitation characteristics, type of the river catchment, water holding capacity of the soil, and location, was tested in experiments conducted with the hydrological model STREAM (Aerts, 1999). In each of the selected test catchments, it was tested how the variables affect the runoff volume at the catchment outlet after a rainfall event. The outcomes of the experiment were translated into a dimensionless index (0-1) and applied to Europe based on spatially explicit maps of the variables explored in the experiments, using a look-up table. For the demand indicator, downstream potential flood damages were quantified using the Damage Scanner model (DSM, Bubeck et al., 2011), which uses GDP, land cover, and inundation levels as a basis for estimating direct flood damages. Flood damage was aggregated for each river catchment and related to the available upstream area of the entire river basin. Thus, the higher the potential downstream damages, and the smaller the upstream area available to provide flood regulation, the higher the demand for high flood regulation supply in the respective river catchment. Demands, expressed as a dimensionless indicator (0-1), are normalized across Europe using a min-max normalization.

  6. EMG magnitude normalization

    • data.mendeley.com
    Updated Apr 22, 2020
    Cite
    alireza aminaee (2020). EMG magnitude normalization [Dataset]. http://doi.org/10.17632/8kfytmbxbc.1
    Dataset updated
    Apr 22, 2020
    Authors
    alireza aminaee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EMG data were normalized using a max-min strategy. For comparison across all subjects, ∫IEMG values were normalized using the following formula, which maps all ∫IEMG values into the range -1 to +1: ∫IEMG_N = ∫IEMG_i / ∫IEMG_max
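
    A one-line sketch of that normalization on illustrative values; dividing by the maximum magnitude keeps the result within -1 to +1, as described.

    import numpy as np

    iemg = np.array([12.5, -8.0, 30.2, 17.9])    # illustrative integrated EMG values
    iemg_norm = iemg / np.abs(iemg).max()         # divide by the maximum magnitude -> values in [-1, +1]
    print(iemg_norm)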

  7. Five types of energy consumption in Korea

    • data.mendeley.com
    Updated Feb 21, 2022
    Cite
    Hyuk-Yoon Kwon (2022). Five types of energy consumption in Korea [Dataset]. http://doi.org/10.17632/m68xz4w4t9.1
    Dataset updated
    Feb 21, 2022
    Authors
    Hyuk-Yoon Kwon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Korea
    Description

    5-ECK consists of five types of energy sources: 1) electricity, 2) gas, 3) water, 4) hot water, and 5) heating. However, because gas was not collected for the full period, we used the remaining four data types. This dataset includes a total of 47,117 records observed from Jan. 2020 to Mar. 2021 at a 15-minute interval. We consistently resampled the energy consumption from 15-minute to hourly intervals. Missing values are filled by linear interpolation, and min-max scaling is used for normalization.
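
    A hedged pandas sketch of the interpolation, resampling, and scaling steps described above. Column names and values are illustrative, and hourly values are obtained by summing the 15-minute readings here, since the aggregation rule is not stated.

    import numpy as np
    import pandas as pd

    # stand-in for 15-minute readings of the four retained energy types
    idx = pd.date_range("2020-01-01", "2021-03-31 23:45", freq="15min")
    df = pd.DataFrame(np.random.rand(len(idx), 4), index=idx,
                      columns=["electricity", "water", "hot_water", "heating"])
    df.iloc[100:110] = np.nan                       # simulate missing readings

    df = df.interpolate(method="linear")            # fill gaps by linear interpolation
    hourly = df.resample("60min").sum()             # 15-minute readings -> hourly values (sum assumed)
    scaled = (hourly - hourly.min()) / (hourly.max() - hourly.min())   # min-max scaling per column
    print(scaled.head())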

  8. Binary classification using a confusion matrix.

    • plos.figshare.com
    xls
    Updated Dec 6, 2024
    Cite
    Chantha Wongoutong (2024). Binary classification using a confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0310839.t002
    Dataset updated
    Dec 6, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Chantha Wongoutong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Despite the popularity of k-means clustering, feature scaling before applying it can be an essential yet often neglected step. In this study, feature scaling beforehand via five methods (Z-score standardization, Min-Max normalization, Percentile transformation, Maximum absolute scaling, and RobustScaler) was compared with using the raw (i.e., non-scaled) data when analyzing datasets whose features have different or the same units via k-means clustering. The results of an experimental study show that, for features with different units, scaling them before k-means clustering provided better accuracy, precision, recall, and F-score values than using the raw data. Meanwhile, when features in the dataset had the same unit, scaling them beforehand provided results similar to using the raw data. Thus, scaling the features beforehand is a very important step for datasets with different units, as it improves the clustering results and accuracy. Of the five feature-scaling methods applied to the dataset with different units, Z-score standardization and Percentile transformation provided similar performance that was superior to the other methods and to the raw data. Maximum absolute scaling performed slightly better than the other scaling methods and the raw data when the dataset contained features with the same unit, but the improvement was not significant.
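
    A hedged sketch of this kind of comparison using scikit-learn scalers and k-means on synthetic data. The dataset, the number of clusters, and the scoring below are placeholders rather than the study's actual setup, and the percentile transformation is approximated here with QuantileTransformer.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import adjusted_rand_score
    from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler, QuantileTransformer,
                                       RobustScaler, StandardScaler)

    X, y = make_blobs(n_samples=500, centers=3, n_features=4, random_state=0)
    X[:, 0] *= 1000.0    # give one feature a much larger unit than the others

    scalers = {
        "raw": None,
        "z-score": StandardScaler(),
        "min-max": MinMaxScaler(),
        "percentile": QuantileTransformer(n_quantiles=100, output_distribution="uniform", random_state=0),
        "max-abs": MaxAbsScaler(),
        "robust": RobustScaler(),
    }
    for name, scaler in scalers.items():
        Xs = X if scaler is None else scaler.fit_transform(X)
        labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs)
        print(f"{name:11s} ARI = {adjusted_rand_score(y, labels):.3f}")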

  9. epidemic, factor misallocation and efficiency of digital enterprises: heterogeneity study of labor and capital

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Sep 6, 2024
    Cite
    Shujuan Wu; Minmin Li; Jianhua Xiao; Jianhua Tang (2024). epidemic, factor misallocation and efficiency of digital enterprises: heterogeneity study of labor and capital [Dataset]. http://doi.org/10.5061/dryad.ttdz08m4h
    Dataset updated
    Sep 6, 2024
    Dataset provided by
    Wuyi University
    Authors
    Shujuan Wu; Minmin Li; Jianhua Xiao; Jianhua Tang
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    In order to improve the development quality of the digital economy, factor misallocation must be reduced effectively in the post-epidemic era. This study examined the mechanism through which labor and capital misallocation heterogeneously shape the epidemic's effect on the efficiency of digital enterprises. The epidemic outbreak was treated as a quasi-natural experiment to set up a PSM-DID model and an intermediary effect model. Empirical research was carried out using data on 1752 digital and non-digital high-tech enterprises from the fourth quarter of 2017 to the first quarter of 2022. The results show that: (1) The epidemic improved digital enterprises' efficiency, mainly through its positive effect on technological efficiency and scale efficiency. (2) The epidemic worsened the misallocation of labor but improved capital allocation. (3) Through factor misallocation, the epidemic had a positive impact on the efficiency of digital enterprises, while contributing to worsening technical progress. Through capital misallocation, the epidemic had a positive effect on the TFP of digital enterprises, although there is no significant evidence of its source; through labor misallocation, it had a positive effect on the TFP of digital enterprises because of its positive effect on technological efficiency and pure efficiency. Methods: The panel data of 1034 listed DENs and 718 listed non-digital high-tech enterprises from the fourth quarter of 2017 to the first quarter of 2022 were used, drawn from the Guotai'an database and the Shenzhen and Shanghai Stock Exchanges. Samples with incomplete data and ST, *ST, suspended, and delisted enterprises were excluded. The sample interval satisfies the PSM-DID method's requirements. The min-max normalization method was used to standardize the data. Stata 16.0 was used for calculation.

  10. Mobility Market Indicators and Macroeconomic Indicators for the MaaS Status Index (MSI) in Austria (2017-2028) - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Nov 12, 2024
    + more versions
    Cite
    (2024). Mobility Market Indicators and Macroeconomic Indicators for the MaaS Status Index (MSI) in Austria (2017-2028) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/28bd8b3c-c48d-5fdb-818b-c837b16eea13
    Dataset updated
    Nov 12, 2024
    Area covered
    Austria
    Description

    Dataset description: The datasets include mobility market indicators and macroeconomic indicators for Austria, which were used to calculate the Mobility as a Service (MaaS) Status Index (MSI). The MSI evaluates the readiness and potential for implementing MaaS in Austria. The datasets cover two distinct periods: 2017-2022 (T1) and 2023-2028 (T2). The indicators include annual revenues, vehicle costs, number of users, market shares, GDP per capita, urbanization rates, and investments in transportation infrastructure, among others.
    Context and methodology: Each indicator is represented by the average annual growth rate, a mean value, and a normalized mean value (min-max normalization) for periods T1 and T2. The data were sourced from Statista (2024).
    Technical details: The dataset contains two Microsoft Excel files (one for mobility market indicators, one for macroeconomic indicators). Other than Microsoft Excel, no additional software is needed to investigate the data.

  11. An Extensive Dataset for the Heart Disease Classification System

    • data.mendeley.com
    Updated Feb 15, 2022
    + more versions
    Cite
    Sozan S. Maghdid (2022). An Extensive Dataset for the Heart Disease Classification System [Dataset]. http://doi.org/10.17632/65gxgy2nmg.1
    Dataset updated
    Feb 15, 2022
    Authors
    Sozan S. Maghdid
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Finding a good data source is the first step toward creating a database. Cardiovascular diseases (CVDs) are the major cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel problems. According to the World Health Organization, 17.9 million people die from CVDs each year. Heart attacks and strokes account for more than four out of every five CVD deaths, with one-third of these deaths occurring before the age of 70. A comprehensive database of factors that contribute to a heart attack has been constructed; the main purpose here is to collect characteristics of heart attacks, or the factors that contribute to them. A form was created in Microsoft Excel to accomplish this. Figure 1 depicts the form, which has nine fields: eight input fields and one output field. Age, gender, heart rate, systolic BP, diastolic BP, blood sugar, CK-MB, and Test-Troponin represent the input fields, while the output field indicates the presence of a heart attack, divided into two categories (negative and positive): negative refers to the absence of a heart attack, while positive refers to its presence. Table 1 shows the detailed information and the minimum and maximum attribute values for the 1319 cases in the whole database. To confirm the validity of the data, we examined the patient files in the hospital archive and compared them with the data stored in the laboratory system; we also interviewed the patients and specialized doctors. Table 2 is a sample from the 1320 cases, showing 44 cases and the factors that lead to a heart attack in the whole database. After collecting the data, we checked whether it contained null (invalid) values or errors introduced during data collection. A value is null if it is unknown; null values require special treatment because they indicate that the target is not a valid data element, and arithmetic operations on a numeric column containing null values yield null. An example of null-value processing is shown in Figure 2. The data used in this investigation were scaled between 0 and 1 to guarantee that all inputs and outputs received equal attention and to eliminate their dimensionality. Prior to the use of AI models, data normalization has two major advantages: first, attributes in larger numeric ranges do not overshadow attributes in smaller numeric ranges; second, numerical problems during processing are avoided. After completing the normalization process, we split the dataset into training and test sets, using 1060 cases for training and 259 for testing. Modeling was then implemented using the input and output variables.
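
    A minimal sketch of the 0-1 scaling and the 1060/259 train/test split described above. The data are random stand-ins and the split is done by simple slicing, since the authors' exact splitting procedure is not specified.

    import numpy as np
    import pandas as pd

    cols = ["age", "gender", "heart_rate", "systolic_bp", "diastolic_bp",
            "blood_sugar", "ck_mb", "troponin"]
    df = pd.DataFrame(np.random.rand(1319, len(cols)) * 100, columns=cols)   # random stand-in data
    df["label"] = np.random.randint(0, 2, size=len(df))                      # 0 = negative, 1 = positive

    # scale every input between 0 and 1
    X = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())
    y = df["label"]

    # 1060 training cases and 259 test cases, as reported in the description
    X_train, X_test = X.iloc[:1060], X.iloc[1060:]
    y_train, y_test = y.iloc[:1060], y.iloc[1060:]
    print(len(X_train), len(X_test))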

  12. Data for A Systemic Framework for Assessing the Risk of Decarbonization to Regional Manufacturing Activities in the European Union

    • zenodo.org
    txt
    Updated Sep 18, 2025
    Cite
    Soheil Shayegh; Soheil Shayegh; Giorgia Coppola; Giorgia Coppola (2025). Data for A Systemic Framework for Assessing the Risk of Decarbonization to Regional Manufacturing Activities in the European Union [Dataset]. http://doi.org/10.5281/zenodo.17152310
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Soheil Shayegh; Soheil Shayegh; Giorgia Coppola; Giorgia Coppola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 18, 2025
    Area covered
    European Union
    Description

    README — Code and data
    Project: LOCALISED

    Work Package 7, Task 7.1

    Paper: A Systemic Framework for Assessing the Risk of Decarbonization to Regional Manufacturing Activities in the European Union

    What this repo does
    -------------------
    Builds the Transition‑Risk Index (TRI) for EU manufacturing at NUTS‑2 × NACE Rev.2, and reproduces the article’s Figures 3–6:
    • Exposure (emissions by region/sector)
    • Vulnerability (composite index)
    • Risk = Exposure ⊗ Vulnerability
    Outputs include intermediate tables, the final analysis dataset, and publication figures.

    Folder of interest
    ------------------
    Code and data/
    ├─ Code/ # R scripts (run in order 1A → 5)
    │ └─ Create Initial Data/ # scripts to (re)build Initial data/ from Eurostat API with imputation
    ├─ Initial data/ # Eurostat inputs imputed for missing values
    ├─ Derived data/ # intermediates
    ├─ Final data/ # final analysis-ready tables
    └─ Figures/ # exported figures

    Quick start
    -----------
    1) Open R (or RStudio) and set the working directory to “Code and data/Code”.
    Example: setwd(".../Code and data/Code")
    2) Initial data/ contains the required Eurostat inputs referenced by the scripts.
    To reproduce the inputs in Initial data/, run the scripts in Code/Create Initial Data/.
    These scripts download the required datasets from the respective API and impute missing values; outputs are written to ../Initial data/.
    3) Run scripts sequentially (they use relative paths to ../Raw data, ../Derived data, etc.):
    1A-non-sector-data.R → 1B-sector-data.R → 1C-all-data.R → 2-reshape-data.R → 3-normalize-data-by-n-enterpr.R → 4-risk-aggregation.R → 5A-results-maps.R, 5B-results-radar.R

    What each script does
    ---------------------
    Create Initial Data — Recreate inputs
    • Download source tables from the Eurostat API or the Localised DSP, apply light cleaning, and impute missing values.
    • Write the resulting inputs to Initial data/ for the analysis pipeline.

    1A / 1B / 1C — Build the unified base
    • Read individual Eurostat datasets (some sectoral, some only regional).
    • Harmonize, aggregate, and align them into a single analysis-ready schema.
    • Write aggregated outputs to Derived data/ (and/or Final data/ as needed).

    2 — Reshape and enrich
    • Reshapes the combined data and adds metadata.
    • Output: Derived data/2_All_data_long_READY.xlsx (all raw indicators in tidy long format, with indicator names and values).

    3 — Normalize (enterprises & min–max)
    • Divide selected indicators by number of enterprises.
    • Apply min–max normalization to [0.01, 0.99].
    • Exposure keeps real zeros (zeros remain zero).
    • Write normalized tables to Derived data/ or Final data/.

    4 — Aggregate indices
    • Vulnerability: build dimension scores (Energy, Labour, Finance, Supply Chain, Technology).
    – Within each dimension: equal‑weight mean of directionally aligned, [0.01,0.99]‑scaled indicators.
    – Dimension scores are re‑scaled to [0.01,0.99].
    • Aggregate Vulnerability: equal‑weight mean of the five dimensions.
    • TRI (Risk): combine Exposure (E) and Vulnerability (V) via a weighted geometric rule with α = 0.5 in the baseline.
    – Policy‑intuitive properties: high E & high V → high risk; imbalances penalized (non‑compensatory).
    • Output: Final data/ (main analysis tables).
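
    For orientation, the min-max rescaling to [0.01, 0.99] and the weighted geometric combination with α = 0.5 can be sketched as below. The repository's scripts are in R; this Python sketch only illustrates the arithmetic on made-up values and is not part of the pipeline.

    import numpy as np

    def minmax_01_99(x):
        # min-max rescale to the [0.01, 0.99] range used for the indicators
        x = np.asarray(x, dtype=float)
        return 0.01 + (x - x.min()) / (x.max() - x.min()) * 0.98

    def tri(exposure, vulnerability, alpha=0.5):
        # weighted geometric combination of Exposure and Vulnerability (alpha = 0.5 baseline)
        return exposure ** alpha * vulnerability ** (1.0 - alpha)

    # made-up regional values; note the real pipeline keeps true exposure zeros as zeros
    E = minmax_01_99([0.0, 2.1, 5.4, 9.8])
    V = minmax_01_99([0.3, 0.6, 0.2, 0.9])
    print(tri(E, V))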

    5A / 5B — Visualize results
    • 5A: maps and distribution plots for Exposure, Vulnerability, and Risk → Figures 3 & 4.
    • 5B: comparative/radar profiles for selected countries/regions/subsectors → Figures 5 & 6.
    • Outputs saved to Figures/.

    Data flow (at a glance)
    -----------------------
    Initial data → (1A–1C) Aggregated base → (2) Tidy long file → (3) Normalized indicators → (4) Composite indices → (5) Figures
    Intermediate outputs land in Derived data/ (including 2_All_data_long_READY.xlsx); final tables and figures land in Final data/ and Figures/.

    Assumptions & conventions
    -------------------------
    • Geography: EU NUTS‑2 regions; Sector: NACE Rev.2 manufacturing subsectors.
    • Equal weights by default where no evidence supports alternatives.
    • All indicators directionally aligned so that higher = greater transition difficulty.
    • Relative paths assume working directory = Code/.

    Reproducing the article
    -----------------------
    • Optionally run the codes from the Code/Create Initial Data subfolder
    • Run 1A → 5B without interruption to regenerate:
    – Figure 3: Exposure, Vulnerability, Risk maps (total manufacturing).
    – Figure 4: Vulnerability dimensions (Energy, Labour, Finance, Supply Chain, Technology).
    – Figure 5: Drivers of risk—highest vs. lowest risk regions (example: Germany & Greece).
    – Figure 6: Subsector case (e.g., basic metals) by selected regions.
    • Final tables for the paper live in Final data/. Figures export to Figures/.

    Requirements
    ------------
    • R (version per your environment).
    • Install any missing packages listed at the top of each script (e.g., install.packages("...")).

    Troubleshooting
    ---------------
    • “File not found”: check that the previous script finished and wrote its outputs to the expected folder.
    • Paths: confirm getwd() ends with /Code so relative paths resolve to ../Raw data, ../Derived data, etc.
    • Reruns: optionally clear Derived data/, Final data/, and Figures/ before a clean rebuild.

    Provenance & citation
    ---------------------
    • Inputs: Eurostat and related sources cited in the paper and headers of the scripts.
    • Methods: OECD composite‑indicator guidance; IPCC AR6 risk framing (see paper references).
    • If you use this code, please cite the article:
    A Systemic Framework for Assessing the Risk of Decarbonization to Regional Manufacturing Activities in the European Union.

  13. Definitions of the K-CHDI indicators.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Dun-Sol Go; Young-Eun Kim; Seok-Jun Yoon (2023). Definitions of the K-CHDI indicators. [Dataset]. http://doi.org/10.1371/journal.pone.0240304.t001
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Dun-Sol Go; Young-Eun Kim; Seok-Jun Yoon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Definitions of the K-CHDI indicators.

  14. Comparison between different combinations the mixed-precision weights network.

    • plos.figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Ninnart Fuengfusin; Hakaru Tamukoh (2023). Comparison between different combinations the mixed-precision weights network. [Dataset]. http://doi.org/10.1371/journal.pone.0251329.t003
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ninnart Fuengfusin; Hakaru Tamukoh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ASB before denotes ASB without the min-max normalization and ASB denotes ASB with the min-max normalization.

  15. MFCCs Feature Scaling Images for Multi-class Human Action Analysis: A Benchmark Dataset

    • researchdata.edu.au
    • data.mendeley.com
    Updated 2023
    Cite
    Naveed Akhtar; Syed Mohammed Shamsul Islam; Douglas Chai; Muhammad Bilal Shaikh; Computer Science and Software Engineering (2023). MFCCs Feature Scaling Images for Multi-class Human Action Analysis : A Benchmark Dataset [Dataset]. http://doi.org/10.17632/6D8V9JMVGM.1
    Dataset updated
    2023
    Dataset provided by
    The University of Western Australia
    Mendeley Data
    Authors
    Naveed Akhtar; Syed Mohammed Shamsul Islam; Douglas Chai; Muhammad Bilal Shaikh; Computer Science and Software Engineering
    Description

    This dataset comprises an array of Mel Frequency Cepstral Coefficients (MFCCs) that have undergone feature scaling, representing a variety of human actions. Feature scaling, or data normalization, is a preprocessing technique used to standardize the range of features in the dataset. For MFCCs, this process helps ensure all coefficients contribute equally to the learning process, preventing features with larger scales from overshadowing those with smaller scales.

    In this dataset, the audio signals correspond to diverse human actions such as walking, running, jumping, and dancing. The MFCCs are calculated via a series of signal processing stages, which capture key characteristics of the audio signal in a manner that closely aligns with human auditory perception. The coefficients are then standardized or scaled using methods such as MinMax Scaling or Standardization, thereby normalizing their range. Each normalized MFCC vector corresponds to a segment of the audio signal.
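
    A rough sketch of the scaling step described above, applied to an MFCC-like matrix. The coefficients here are random stand-ins; a library such as librosa would produce the real MFCCs, and the dataset's exact scaling parameters are not specified.

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    mfccs = np.random.randn(200, 13)     # stand-in: 200 frames x 13 coefficients for one segment

    minmax_scaled = MinMaxScaler().fit_transform(mfccs)      # each coefficient rescaled to [0, 1]
    standardized = StandardScaler().fit_transform(mfccs)     # or zero mean, unit variance per coefficient
    print(minmax_scaled.min(axis=0), minmax_scaled.max(axis=0))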

    The dataset is meticulously designed for tasks including human action recognition, classification, segmentation, and detection based on auditory cues. It serves as an essential resource for training and evaluating machine learning models focused on interpreting human actions from audio signals. This dataset proves particularly beneficial for researchers and practitioners in fields such as signal processing, computer vision, and machine learning, who aim to craft algorithms for human action analysis leveraging audio signals.

  16. Dataset for Mid-Price Forecasting of Limit Order Book Data

    • opendatalab.com
    zip
    Cite
    Tampere University of Technology, Dataset for Mid-Price Forecasting of Limit Order Book Data [Dataset]. https://opendatalab.com/OpenDataLab/Dataset_for_Mid-Price_etc
    Dataset provided by
    Tampere University of Technology
    Aarhus University
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here we provide the normalized datasets as .txt files. The datasets are divided into two main categories: datasets that include the auction period and datasets that do not. For each of these two categories we provide three normalization set-ups based on z-score, min-max, and decimal-precision normalization. Since we followed the anchored cross-validation method for 10 days for 5 stocks, the user can find nine (cross-fold) datasets for each normalization set-up for training and testing. Every training and testing dataset contains information for all the stocks. For example, the first fold contains one day of training and one day of testing for all five stocks. The second fold contains a two-day training dataset and a one-day testing dataset; the two days of training data are the training and testing days from the first fold, and so on. The title of each .txt file encodes the following information, in order: training or testing set; with or without the auction period; type of normalization set-up; and fold number (from 1 to 9) under the above cross-validation method.
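
    For reference, the three normalization set-ups named above can be sketched generically as follows; this is an illustration on random stand-in features, not the providers' code.

    import numpy as np

    def z_score(x):
        return (x - x.mean(axis=0)) / x.std(axis=0)

    def min_max(x):
        return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

    def decimal_precision(x):
        # decimal scaling: divide each column by the smallest power of ten that brings |values| below 1
        j = np.ceil(np.log10(np.abs(x).max(axis=0)))
        return x / (10.0 ** j)

    features = np.random.rand(1000, 40) * 500.0    # random stand-in for limit order book features
    for f in (z_score, min_max, decimal_precision):
        print(f.__name__, float(f(features).max()))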

  17. Top-5 combinations from each BO search.

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Ninnart Fuengfusin; Hakaru Tamukoh (2023). Top-5 combinations from each BO search. [Dataset]. http://doi.org/10.1371/journal.pone.0251329.t006
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ninnart Fuengfusin; Hakaru Tamukoh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Iteration denotes the number of BO searches. Note that a, s, and b are not normalized with the min-max normalization, and that Iteration starts at 0.
