The technological advances in mass spectrometry allow us to collect more comprehensive data with higher quality and increasing speed. With the rapidly increasing amount of data generated, the need for streamlining analyses becomes more apparent. Proteomics data is known to be often affected by systemic bias from unknown sources, and failing to adequately normalize the data can lead to erroneous conclusions. To allow researchers to easily evaluate and compare different normalization methods via a user-friendly interface, we have developed “proteiNorm”. The current implementation of proteiNorm accommodates preliminary filters on peptide and sample levels followed by an evaluation of several popular normalization methods and visualization of the missing values. The user then selects an adequate normalization method and one of the several imputation methods used for the subsequent comparison of different differential expression methods and estimation of statistical power. The application of proteiNorm and interpretation of its results are demonstrated on two tandem mass tag multiplex (TMT6plex and TMT10plex) and one label-free spike-in mass spectrometry example data set. The three data sets reveal how the normalization methods perform differently on different experimental designs and the need for evaluation of normalization methods for each mass spectrometry experiment. With proteiNorm, we provide a user-friendly tool to identify an adequate normalization method and to select an appropriate method for differential expression analysis.
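As a rough illustration of the kind of normalization such tools compare, the following minimal Python sketch applies a log2 transform and per-sample median centering to a placeholder intensity matrix; the data and variable names are invented for illustration and this is not proteiNorm's own code.
import numpy as np

# Placeholder peptide intensity matrix: 500 peptides (rows) x 6 samples (columns).
rng = np.random.default_rng(0)
intensities = rng.lognormal(mean=20, sigma=1, size=(500, 6))

# Log2-transform, then subtract each sample's median so all sample medians align at 0.
log2_intensities = np.log2(intensities)
median_normalized = log2_intensities - np.median(log2_intensities, axis=0, keepdims=True)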
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises a professions gazetteer generated with automatically extracted terminology from the Mesinesp2 corpus, a manually annotated corpus in which domain experts have labeled a set of scientific literature, clinical trials, and patent abstracts, as well as clinical case reports.
A silver-standard gazetteer for mention classification and normalization was created by combining the predictions of automatic Named Entity Recognition models with Entity Linking normalization to three controlled vocabularies: SNOMED CT, NCBI Taxonomy, and ESCO. The sources are 265,025 different documents, of which 249,538 come from the MESINESP2 corpora and 15,487 are clinical cases from open-access clinical journals. From these documents, 5,682,000 mentions were extracted, and 4,909,966 (86.42%) were normalized to at least one of the ontologies: SNOMED CT (4,909,966) for diseases, symptoms, drugs, locations, occupations, procedures, and species; ESCO (215,140) for occupations; and NCBI Taxonomy (1,469,256) for species.
The repository contains a .tsv file with the following columns:
filenameid: A unique identifier combining the file name and mention span within the text. This ensures each extracted mention is uniquely traceable. Example: biblio-1000005#239#256 refers to a mention spanning characters 239–256 in the file with the name biblio-1000005.
span: The specific text span (mention) extracted from the document, representing a term or phrase identified in the dataset. Example: centro oncológico.
source: The origin of the document, indicating the corpus from which the mention was extracted. Possible values: mesinesp2, clinical_cases.
filename: The name of the file from which the mention was extracted. Example: biblio-1000005.
mention_class: Categories or semantic tags assigned to the mention, describing its type or context in the text. Example: ['ENFERMEDAD', 'SINTOMA'].
codes_esco: The normalized ontology codes from the European Skills, Competences, Qualifications, and Occupations (ESCO) vocabulary for the identified mention (if applicable). This field may be empty if no ESCO mapping exists. Example: 30629002.
terms_esco: The human-readable terms from the ESCO ontology corresponding to the codes_esco. Example: ['responsable de recursos', 'director de recursos', 'directora de recursos'].
codes_ncbi: The normalized ontology codes from the NCBI Taxonomy vocabulary for species (if applicable). This field may be empty if no NCBI mapping exists.
terms_ncbi: The human-readable terms from the NCBI Taxonomy vocabulary corresponding to the codes_ncbi. Example: ['Lacandoniaceae', 'Pandanaceae R.Br., 1810', 'Pandanaceae', 'Familia'].
codes_sct: The normalized ontology codes from SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) vocabulary for diseases, symptoms, drugs, locations, occupations, procedures, and species (if applicable). Example: 22232009.
terms_sct: The human-readable terms from the SNOMED CT ontology corresponding to the codes_sct. Example: ['adjudicador de regulaciones del seguro nacional'].
sct_sem_tag: The semantic category tag assigned by SNOMED CT to describe the general classification of the mention. Example: environment.
Suggestion: if you load the dataset with Python, it is recommended to parse the columns containing lists as follows (the filename below is a placeholder for the provided .tsv file):
import ast
import pandas as pd

df = pd.read_csv("gazetteer.tsv", sep="\t")  # hypothetical filename; replace with the actual .tsv file
df["mention_class"] = df["mention_class"].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
License
This dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). This means you are free to:
Share: Copy and redistribute the material in any medium or format.
Adapt: Remix, transform, and build upon the material for any purpose, even commercially.
Attribution Requirement: Please credit the dataset creators appropriately, provide a link to the license, and indicate if changes were made.
Contact
If you have any questions or suggestions, please contact us at:
Martin Krallinger ()
Additional resources and corpora
If you are interested, you might want to check out these corpora and resources:
MESINESP-2 (Corpus of manually indexed records with DeCS /MeSH terms comprising scientific literature abstracts, clinical trials, and patent abstracts, different document collection)
MEDDOPROF corpus
Codes Reference List (for MEDDOPROF-NORM)
Annotation Guidelines
Occupations Gazetteer
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
An Arabic handwritten paragraph dataset intended for text normalization and generation using conditional deep generative models. A sample page from the dataset:
[Sample image from the dataset: 43.jpg]
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
Authors: Karen Simonyan, Andrew Zisserman
https://arxiv.org/abs/1409.1556
[Figure: VGG Architecture, https://imgur.com/uLXrKxe.jpg]
A pre-trained model has been previously trained on a dataset and contains the weights and biases that represent the features of that dataset. Learned features are often transferable to different data: for example, a model trained on a large dataset of bird images will contain learned features, such as edges or horizontal lines, that would be transferable to your dataset.
Pre-trained models are beneficial for many reasons. By using a pre-trained model you save time: someone else has already spent the time and compute resources to learn many features, and your model will likely benefit from them.
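As a minimal sketch of how such a pre-trained model might be reused (assuming a TensorFlow/Keras environment; the input shape and the small classifier head are illustrative choices, not part of the original VGG release):
import tensorflow as tf

# Load VGG16 with ImageNet weights, dropping the original classifier head.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained convolutional features

# Attach a small head for a hypothetical binary classification task (e.g., bird vs. not bird).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])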
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
TL;DR: Text Normalization for Social Media Corpus
Dataset Description
This dataset contains examples of Russian-language texts from social networks with distorted spelling (typos, abbreviations, etc.) and their normalized versions in json format. A detailed spelling correction protocol is given in the TBA article. The dataset size is 1930 sentence pairs. In each pair, the sentences are tokenized by words, and the lengths of both sentences in the pair are equal. If a… See the full description on the dataset page: https://huggingface.co/datasets/ruscorpora/normalization.
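A minimal sketch for loading this resource with the Hugging Face datasets library, assuming the dataset ID from the URL above resolves directly and no configuration name is required:
from datasets import load_dataset

ds = load_dataset("ruscorpora/normalization")  # dataset ID taken from the URL above
print(ds)  # inspect the available splits and fields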
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Normalization
# Generate a resting state (rs) timeseries (ts)
# Install / load package to make fake fMRI ts
# install.packages("neuRosim")
library(neuRosim)
# Generate a ts
ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
# 3dDetrend -normalize
# R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
# Do for the full timeseries
ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
# Do this again for a shorter version of the same timeseries
ts.shorter.length <- length(ts.normalised.long)/4
ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2));
# Comparing the summaries shows that the normalized values of the shorter timeseries are larger in magnitude, since the same unit sum-of-squares is spread over fewer timepoints
summary(ts.normalised.long)
summary(ts.normalised.short)
# Plot results for the long and short ts
# Truncate the longer ts for plotting only
ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
# Give the plot a title
title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
# Add zero line
lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
# 3dDetrend -normalize -polort 0 for long timeseries
lines(ts.normalised.long.made.shorter, col='blue');
# 3dDetrend -normalize -polort 0 for short timeseries
lines(ts.normalised.short, col='red');
Standardization/modernization
New afni_proc.py command line
afni_proc.py \
-subj_id "$sub_id_name_1" \
-blocks despike tshift align tlrc volreg mask blur scale regress \
-radial_correlate_blocks tcat volreg \
-copy_anat anatomical_warped/anatSS.1.nii.gz \
-anat_has_skull no \
-anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \
-anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
-anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
-anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \
-anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \
-anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \
-anat_follower_erode fsvent fswm \
-dsets media_?.nii.gz \
-tcat_remove_first_trs 8 \
-tshift_opts_ts -tpattern alt+z2 \
-align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
-tlrc_base "$basedset" \
-tlrc_NL_warp \
-tlrc_NL_warped_dsets \
anatomical_warped/anatQQ.1.nii.gz \
anatomical_warped/anatQQ.1.aff12.1D \
anatomical_warped/anatQQ.1_WARP.nii.gz \
-volreg_align_to MIN_OUTLIER \
-volreg_post_vr_allin yes \
-volreg_pvra_base_index MIN_OUTLIER \
-volreg_align_e2a \
-volreg_tlrc_warp \
-mask_opts_automask -clfrac 0.10 \
-mask_epi_anat yes \
-blur_to_fwhm -blur_size $blur \
-regress_motion_per_run \
-regress_ROI_PC fsvent 3 \
-regress_ROI_PC_per_run fsvent \
-regress_make_corr_vols aeseg fsvent \
-regress_anaticor_fast \
-regress_anaticor_label fswm \
-regress_censor_motion 0.3 \
-regress_censor_outliers 0.1 \
-regress_apply_mot_types demean deriv \
-regress_est_blur_epits \
-regress_est_blur_errts \
-regress_run_clustsim no \
-regress_polort 2 \
-regress_bandpass 0.01 1 \
-html_review_style pythonic
We used similar command lines to generate the 'blurred and not censored' and the 'not blurred and not censored' timeseries files (described more fully below). We will make the code used to generate all derivative files available on our GitHub site (https://github.com/lab-lab/nndb). We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, with the average being ~40 minutes, although this number can be variable (thus leading to the above issue with 3dDetrend's -normalize). A discussion on the AFNI message board with one of our team (starting here: https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256) led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.
Which timeseries file you use is up to you, but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul's own words:
* Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice).
* Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere).
* For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data).
* For censored data:
  * Performing ISC requires the users to unionize the censoring patterns during the correlation calculation.
  * If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might still do for naturalistic tasks), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data.
In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.
Effect on results
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example of normalizing the word ‘aaaaaaannnnnndddd’ using the proposed method and four other normalization methods.
License: CC0 1.0, https://choosealicense.com/licenses/cc0-1.0/
DataCite Normalized Affiliation Distribution
Summary
normalized_distribution.json contains one JSON object per normalized affiliation string. It aggregates the total occurrence count, a ranked list of the raw affiliation strings that collapse into the normalized form, and the provider/client entities that asserted them. This dataset is derived from the August 2025 DataCite creator/contributor export.
Structure
{ "normalized": "example university"… See the full description on the dataset page: https://huggingface.co/datasets/cometadata/2025-08-datacite-normalized-affiliation-string-distribution.
License: CC0 1.0, https://choosealicense.com/licenses/cc0-1.0/
Normalized Affiliation DOI Distribution
Summary
normalized_affiliation_doi_distribution.json lists every normalized affiliation string alongside the DOIs where it appears in the August 2025 DataCite data file. For each normalized token the file stores the occurrence count, the sorted DOI list (unique per token), and provider/client frequency summaries.
Structure
{ "normalized": "example university", "occurrences": 314, "dois": ["10.1234/abc"… See the full description on the dataset page: https://huggingface.co/datasets/cometadata/2025-08-datacite-normalized-affiliation-dois.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Finding a good data source is the first step toward creating a database. Cardiovascular diseases (CVDs) are the major cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel problems. According to the World Health Organization, 17.9 million people die from CVDs each year. Heart attacks and strokes account for more than four out of every five CVD deaths, with one-third of these deaths occurring before the age of 70.

A comprehensive database of factors that contribute to a heart attack has been constructed. The main purpose here is to collect characteristics of heart attacks, or factors that contribute to them. A form was created in Microsoft Excel to accomplish this. Figure 1 depicts the form, which has nine fields: eight input fields and one output field. Age, gender, heart rate, systolic BP, diastolic BP, blood sugar, CK-MB, and Test-Troponin represent the input fields, while the output field pertains to the presence of a heart attack, which is divided into two categories (negative and positive): negative refers to the absence of a heart attack, while positive refers to its presence. Table 1 shows detailed information and the minimum and maximum attribute values for the 1,319 cases in the whole database. To confirm the validity of these data, we looked at the patient files in the hospital archive and compared them with the data stored in the laboratory system. We also interviewed the patients and specialized doctors. Table 2 is a sample from the whole database showing 44 cases and the factors that lead to a heart attack.

After collecting the data, we checked whether they contained null values (invalid values) or errors introduced during data collection. A value is null if it is unknown. Null values necessitate special treatment: the value indicates that the target is not a valid data element, and when trying to retrieve data that is not present you can come across the keyword null during processing. If you try to do arithmetic operations on a numeric column with one or more null values, the outcome will be null. An example of null value processing is shown in Figure 2.

The data used in this investigation were scaled between 0 and 1 to guarantee that all inputs and outputs received equal attention and to eliminate their dimensionality. Prior to the use of AI models, data normalization has two major advantages. The first is to avoid attributes in larger numeric ranges overshadowing attributes in smaller numeric ranges. The second is to avoid numerical problems during the process. After completing the normalization process, we split the data set into two parts, training and test sets: 1,060 cases were used for training and 259 for testing. Modeling was then implemented using the input and output variables.
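A minimal Python sketch of the preprocessing described above, i.e. min-max scaling to the range [0, 1] followed by the train/test split; the file name and the output column name are placeholders, not the dataset's actual identifiers:
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart_attack.csv")   # hypothetical filename
X = df.drop(columns=["result"])        # "result" is a placeholder name for the output field
y = df["result"]

# Min-max scaling: rescale every input attribute to [0, 1].
X_scaled = (X - X.min()) / (X.max() - X.min())

# Roughly 1,060 cases for training and 259 for testing, as described above.
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=259, random_state=42)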
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine learning dataset from 2000-2019, specifically used to train UNET neural networks. It contains training data processed to a CONUS-like domain (10.5 to 59.5 latitude and -159.5 to -60.5 longitude) at half-degree resolution from 11 ensemble members of 6-hourly GEFS data, together with vapor pressure deficit (VPD) labels created on the same domain from ERA5.
Training data are from years: 2000, 2001, 2003-2006, 2009-2012, 2016, 2017, and 2019
Validation data are from years: 2002, 2008, 2014, and 2018
Blind testing data are from years: 2007, 2013, and 2015
The input data are created from week 4 forecasting data produced by the GEFS initialized on the first Wednesdays of the year. Input data included in this dataset are:
Finally, the files are normalized by z-score normalization per pressure height (or surface) and variable; a minimal sketch of this normalization appears after the directory notes below. They are then saved as npy matrices sized [99, 199, 6], in the above order, for NN training purposes. The VPD labels are the corresponding weekly mean VPD per gridcell derived from ERA5 data, stored in npy files sized [99, 199, 1] for NN label purposes, and are intended to represent the "observed" VPD for the corresponding week-three forecast. They have an identical name to the input but are stored in the label directory. The zip files are packaged with subdirectories storing the npy files, with identical data-label names, as follows:
naming example - nn_dataset_YYYY_week_WW_ens_E_f_3.npy
where YYYY = year (2019)
where WW = week (1 through up to 48)
where E = GEFS ensemble number (0-10)
where f_3 means forecast week three (0-4 included in initial GEFS dataset)
Directories are named to divide npy files into:
Lastly, an additional file called "norm_inference_vars" is included and contains the standard deviations and means of the validation and testing input variables.
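A minimal Python sketch of the per-channel z-score normalization described above (the array here is a random placeholder on the 99 x 199 grid, not actual GEFS data):
import numpy as np

# Placeholder input field with 6 channels (variables / pressure levels) on the 99 x 199 grid.
raw = np.random.default_rng(0).normal(size=(99, 199, 6))

# z-score each channel independently, mirroring the per-variable normalization described above.
means = raw.mean(axis=(0, 1), keepdims=True)
stds = raw.std(axis=(0, 1), keepdims=True)
normalized = (raw - means) / stds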
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Subjective image quality databases are a major source of raw data on how the visual system works in naturalistic environments. These databases describe the sensitivity of many observers to a wide range of distortions of different nature and intensity seen on top of a variety of natural images. Data of this kind seems to open a number of possibilities for the vision scientist to check the models in realistic scenarios. However, while these natural databases are great benchmarks for models developed in some other way (e.g., by using the well-controlled artificial stimuli of traditional psychophysics), they should be carefully used when trying to fit vision models. Given the high dimensionality of the image space, it is very likely that some basic phenomena are under-represented in the database. Therefore, a model fitted on these large-scale natural databases will not reproduce these under-represented basic phenomena that could otherwise be easily illustrated with well selected artificial stimuli. In this work we study a specific example of the above statement. A standard cortical model using wavelets and divisive normalization tuned to reproduce subjective opinion on a large image quality dataset fails to reproduce basic cross-masking. Here we outline a solution for this problem by using artificial stimuli and by proposing a modification that makes the model easier to tune. Then, we show that the modified model is still competitive in the large-scale database. Our simulations with these artificial stimuli show that when using steerable wavelets, the conventional unit norm Gaussian kernels in divisive normalization should be multiplied by high-pass filters to reproduce basic trends in masking. Basic visual phenomena may be misrepresented in large natural image datasets but this can be solved with model-interpretable stimuli. This is an additional argument in praise of artifice in line with Rust and Movshon (2005).
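For reference, divisive normalization models of the kind discussed here are commonly written in a form along these lines (a generic formulation with illustrative symbols, not the paper's exact parameterization):

y_i = \frac{\operatorname{sign}(w_i)\,|w_i|^{\gamma}}{\beta_i + \sum_j H_{ij}\,|w_j|^{\gamma}}

where w_i are the wavelet coefficients of the image, H_{ij} is the interaction kernel (the Gaussian kernel mentioned above, possibly combined with high-pass filters), \beta_i is a semi-saturation constant, and \gamma is an excitation exponent.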
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset uses Vinorm and Llama to normalize Vietnamese text. For example:
33/4 -> ba mươi ba tháng tư
43 tỷ USD -> bốn mươi ba tỉ đô la
Covid-19 -> covid mười chín
lần thứ VI -> lần thứ sáu
33% -> ba mươi ba phần trăm
U23 -> u hai mươi ba
iPhone 14 -> iphone mười bốn
năm 2023 -> năm hai không hai mươi ba
License: not specified (https://academictorrents.com/nolicensespecified)
The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field. With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass.
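A minimal sketch of the center-of-mass centering step described above, placing a 20x20 digit into a 28x28 field (illustrative code, not the original normalization program):
import numpy as np
from scipy import ndimage

# Placeholder 20x20 grey-level digit.
digit = np.zeros((20, 20), dtype=float)
digit[5:15, 8:12] = 1.0

# Paste into a 28x28 field, then shift so the center of mass lands at the field's center.
field = np.zeros((28, 28), dtype=float)
field[4:24, 4:24] = digit
cy, cx = ndimage.center_of_mass(field)
centered = ndimage.shift(field, shift=(13.5 - cy, 13.5 - cx), order=1)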
This processed data represents the estimated percentile level of traffic density. The data are from the 2020 Traffic Monitoring Annual Average Daily Traffic Report, CT Department of Transportation. The census block data were converted into census tract data by taking the mean of the census blocks within each tract as the data associated with that tract. From there the percentile and the rank were calculated. A percentile is a score indicating the value below which a given percentage of observations in a group of observations fall. It indicates the relative position of a particular value within a dataset. For example, the 20th percentile is the value below which 20% of the observations may be found. The rank refers to a process of arranging percentiles in descending order, starting from the highest percentile and ending with the lowest percentile. Once the percentiles are ranked, a normalization step is performed to rescale the rank values between 0 and 10. A rank value of 10 represents the highest percentile, while a rank value of 0 corresponds to the lowest percentile in the dataset. The normalized rank provides a relative assessment of the position of each percentile within the distribution, making it simpler to understand the relative magnitude of differences between percentiles. Normalization between 0 and 10 ensures that the rank values are standardized and uniformly distributed within the specified range. This normalization allows for easier interpretation and comparison of the rank values, as they are now on a consistent scale. For detailed methods, go to connecticut-environmental-justice.circa.uconn.edu.
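A minimal Python sketch of the percentile-and-rescaling scheme described above (the density values are hypothetical, and pandas is an implementation choice, not part of the source methodology):
import pandas as pd

# Hypothetical traffic-density values, one per census tract.
density = pd.Series([12.3, 45.1, 7.8, 99.0, 23.4, 51.6])

percentile = density.rank(pct=True) * 100  # percentile position of each tract
rank_0_10 = (percentile - percentile.min()) / (percentile.max() - percentile.min()) * 10
print(rank_0_10.round(2))  # 10 corresponds to the highest percentile, 0 to the lowest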
This indicator represents the tracts ranked by their percentile level of the percentage of the limited English-speaking population over five years of age. The data source is the 2017-2021 American Community Survey, 5-year estimates. The percentile and the rank were calculated. A percentile is a score indicating the value below which a given percentage of observations in a group of observations fall. It indicates the relative position of a particular value within a dataset. For example, the 20th percentile is the value below which 20% of the observations may be found. The rank refers to a process of arranging percentiles in descending order, starting from the highest percentile and ending with the lowest percentile. Once the percentiles are ranked, a normalization step is performed to rescale the rank values between 0 and 10. A rank value of 10 represents the highest percentile, while a rank value of 0 corresponds to the lowest percentile in the dataset. The normalized rank provides a relative assessment of the position of each percentile within the distribution, making it simpler to understand the relative magnitude of differences between percentiles. Normalization between 0 and 10 ensures that the rank values are standardized and uniformly distributed within the specified range. This normalization allows for easier interpretation and comparison of the rank values, as they are now on a consistent scale. For detailed methods, go to connecticut-environmental-justice.circa.uconn.edu.
This indicator represents the tracts ranked by their percentile proximity to sites proposed and listed on the National Priorities List (NPL). National Priorities List (NPL) sites are unregulated, abandoned hazardous waste sites over which the federal government is given jurisdiction for remediation efforts. The percentile and the rank were calculated. A percentile is a score indicating the value below which a given percentage of observations in a group of observations fall. It indicates the relative position of a particular value within a dataset. For example, the 20th percentile is the value below which 20% of the observations may be found. The rank refers to a process of arranging percentiles in descending order, starting from the highest percentile and ending with the lowest percentile. Once the percentiles are ranked, a normalization step is performed to rescale the rank values between 0 and 10. A rank value of 10 represents the highest percentile, while a rank value of 0 corresponds to the lowest percentile in the dataset. The normalized rank provides a relative assessment of the position of each percentile within the distribution, making it simpler to understand the relative magnitude of differences between percentiles. Normalization between 0 and 10 ensures that the rank values are standardized and uniformly distributed within the specified range. This normalization allows for easier interpretation and comparison of the rank values, as they are now on a consistent scale. For detailed methods, go to connecticut-environmental-justice.circa.uconn.edu.
Example data of fusion features and growth indicators after Z-Score normalization.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Within the framework of the HiDA Trilateral Data Science Exchange Program, this internship project unveils preliminary findings on online compression using Baler. It involved the examination of various datasets of different sizes from the High Energy Physics (HEP) domain to evaluate compression performance. All datasets used are subsets of the jet data recorded by the CMS experiment at the LHC in 2012, released as open data under the Creative Commons CC0 waiver (see references). The data are modified (flattened, truncated, formatted, etc.) and packaged in a way that makes it easy for others to reproduce the results. The files provided on this page include comprehensive instructions for replicating the project's results, as well as datasets and outcomes. Below is a brief overview of the project's folder structure, categorized by the dataset utilized (small dataset / example CMS data / larger CMS data), the online/offline compression method, and resource utilization, particularly regarding GPU usage. Presentations summarizing this project's results can be found here: https://zenodo.org/record/8326707.
Project's folders:
Reproduction Instructions: This folder houses all files that offer detailed guidelines for replicating the project's presented results. These files serve as a reference for accessing relevant materials.
GPU with Example CMS Data.zip: This directory contains all files related to offline compression of the approximately 100MB example CMS dataset provided by Baler. GPU resources were employed in the model training process.
GPU with Larger CMS Data (1).zip: In this section, you'll find files associated with the compression of a larger CMS dataset, approximately 1.4GB in size. It includes results of offline compression and a split of the dataset into a 50/50 ratio for training and testing, with results provided for various epochs.
GPU with Larger CMS Data (2): This folder holds the larger dataset, divided into two halves, with the first half's array values in one file and the second half's in another.
Offline/Online on Small Dataset: Here, you'll find files related to both offline and online compression of a small dataset, roughly 100KB in size, extracted from the example CMS dataset provided by Baler.
Modifications of Small Dataset: This section comprises variations of the small dataset, including both normalized and un-normalized datasets.
Materials: This folder includes fundamental papers and summaries to enhance your understanding of the project.
HiDA: Within this directory, you'll find a printed webpage from the HiDA program.