Articles
https://doi.org/10.1038/s41587-022-01302-5
1 Proteomics Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
2 Spatial Proteomics Group, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany.
3 Synthetic and Systems Biology Unit, Biological Research Centre, Eötvös Loránd Research Network, Szeged, Hungary.
4 Single-Cell Technologies Ltd., Szeged, Hungary.
5 Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
6 Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark.
7 Big Data Institute, Li-Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK.
8 Department of Pathology, Zealand University Hospital, Roskilde, Denmark.
9 Institute for Clinical Medicine, University of Copenhagen, Copenhagen, Denmark.
10 Department of Dermatology and Allergy, Herlev and Gentofte Hospital, University of Copenhagen, Hellerup, Denmark.
11 Leo Foundation Skin Immunology Research Center, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
12 Protein Imaging Platform, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
13 Protein Signaling Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
14 Department of Obstetrics and Gynecology/Section of Gynecologic Oncology, University of Chicago, Chicago, IL, USA.
15 Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm, Sweden.
16 Department of Bioengineering, Stanford University, Stanford, CA, USA.
17 Chan Zuckerberg Biohub, San Francisco, CA, USA.
18 Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.
19 These authors contributed equally: Andreas Mund, Fabian Coscia.
✉ e-mail: andreas.mund@cpr.ku.dk; horvath.peter@brc.hu; mmann@biochem.mpg.de
Modern microscopy’s versatility, resolution and multi-modal
nature deliver increasingly detailed images of single-cell
heterogeneity and tissue organization1. Currently, a predefined
subset of proteins is usually targeted, far short of the actual
complexity of the proteome. Taking advantage of substantially
increased sensitivity in technology based on mass spectrometry
(MS), we set out to enable the analysis of proteomes within their
native, subcellular context to explore their contribution to health and
disease. We combined sub-micron-resolution imaging, image analysis
for single-cell phenotyping based on artificial intelligence (AI)
and isolation with an ultra-sensitive proteomics workflow2 (Fig. 1).
Key challenges turned out to be the accurate definition of single-cell
boundaries and cell classes as well as the transfer of the automatically
defined features into proteomic samples, ready for analysis. To
this end, we introduce the software ‘BIAS’ (Biology Image Analysis
Software), which coordinates scanning and laser microdissection
(LMD) microscopes. This seamlessly combines data-rich imaging
of cell cultures or archived biobank tissues (formalin-fixed
and paraffin-embedded (FFPE)) with deep-learning-based cell
segmentation and machine-learning-based identification of cell
types and states. Cellular or subcellular objects of interest are selected
by the AI alone or after instruction before being subjected to automated
LMD and proteomic profiling. Data generated by DVP can be
mined to discover protein signatures providing molecular insights
into proteome variation at the phenotypic level while retaining complete
spatial information.
Results
Image-guided single-cell isolation for cell-type-resolved proteomics.
The microscopy-related aspects of the DVP workflow
build on high-resolution whole-slide imaging, machine learning
(ML) and deep learning (DL) for image analysis.
First, we used scanning microscopy to obtain high-resolution
whole-slide images and developed a software suite for integrative
image analysis termed ‘BIAS’ (Methods). BIAS processes multiple
two-dimensional (2D) and three-dimensional (3D) microscopy
image file formats, supporting major microscope vendors and
data formats. It combines image pre-processing, DL-based image
Deep Visual Proteomics defines single-cell identity and heterogeneity
Andreas Mund1,19 ✉, Fabian Coscia1,2,19, András Kriston3,4, Réka Hollandi3, Ferenc Kovács3,4, Andreas-David Brunner5, Ede Migh3, Lisa Schweizer5, Alberto Santos1,6,7, Michael Bzorek8, Soraya Naimy8, Lise Mette Rahbek-Gjerdrum8,9, Beatrice Dyring-Andersen1,10,11, Jutta Bulkescher12, Claudia Lukas12,13, Mark Adam Eckert14, Ernst Lengyel14, Christian Gnann15, Emma Lundberg15,16,17, Peter Horvath3,4,18 ✉ and Matthias Mann1,5 ✉
Despite the availability of imaging-based and mass-spectrometry-based methods for spatial proteomics, a key challenge remains
connecting images with single-cell-resolution protein abundance measurements. Here, we introduce Deep Visual Proteomics
(DVP), which combines artificial-intelligence-driven image analysis of cellular phenotypes with automated single-cell or
single-nucleus laser microdissection and ultra-high-sensitivity mass spectrometry. DVP links protein abundance to complex
cellular or subcellular phenotypes while preserving spatial context. By individually excising nuclei from cell culture, we classified
distinct cell states with proteomic profiles defined by known and uncharacterized proteins. In an archived primary melanoma
tissue, DVP identified spatially resolved proteome changes as normal melanocytes transition to fully invasive melanoma,
revealing pathways that change in a spatial manner as cancer progresses, such as mRNA splicing dysregulation in metastatic
vertical growth that coincides with reduced interferon signaling and antigen presentation. The ability of DVP to retain precise
spatial proteomic information in the tissue context has implications for the molecular profiling of clinical samples.
Nature Biotechnology | VOL 40 | August 2022 | 1231–1240 | www.nature.com/naturebiotechnology 1231
segmentation, feature extraction and ML-based phenotype classification.
Building on a recent DL-based algorithm for cytoplasm
and nucleus segmentation3, we undertook several optimizations
to implement pre-processing algorithms to maintain high-quality
images across large image datasets. DL methods require large training
datasets, which is a considerable challenge due to the limited
size of high-quality training data4. To address this challenge, we
used nucleAIzer3 and applied project-specific image style transfer to
synthesize artificial microscopy images resembling real images. This
approach is inherently adaptable to different biological scenarios,
such as new cell and tissue types or staining techniques5. We trained
a deep neural network with these synthetic images for specific segmentation
of the cellular compartment of interest (for example,
nucleus or cytoplasm; Fig. 2a). We benchmarked it against two leading
DL approaches (unet4nuclei6 and Cellpose7) and a widely used
adaptive threshold-based and object-splitting-based method8. Our
cell and nucleus segmentation algorithms of cell cultures and tissues
showed the highest accuracy (Fig. 2b, Extended Data Fig. 1a, Table 1
and Supplementary Table 1). Our current benchmarking results
are supported by a previous study3 where we performed an extensive
comparison to additional methods and software (for example,
ilastik9) on a large heterogeneous microscopy image set. For interactive
cellular phenotype discovery, BIAS performs phenotypic feature
extraction, taking into account morphology and neighborhood
features based on supervised and unsupervised ML (Extended Data
Fig. 1b and Methods). Feature-based phenotypic classification is
readily combined with biomarker expression level from antibody
staining for precise cell classification. ML has previously been used
for image analysis and cell selection but not combined with unbiased
proteomics10. Furthermore, we extended BIAS with a Python
interface; thus, data access and manipulation are also possible using
standard Python functions in a generic way, including the integration
of open-source packages and custom algorithms.
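The F1 metric used for the segmentation benchmark can be illustrated with a minimal sketch. It assumes that objects are represented as sets of pixel coordinates and that a predicted object counts as a true positive when its intersection over union (IoU) with an unmatched ground-truth object is at least 0.5. The matching rule, the threshold and the function names are assumptions for illustration and are not taken from BIAS or from the benchmark protocol.

```python
def iou(a, b):
    """Intersection over union of two objects given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_score(predicted, truth, iou_threshold=0.5):
    """Object-level F1 after greedy one-to-one matching (assumed matching rule)."""
    unmatched = list(truth)
    tp = 0
    for pred in predicted:
        # Pick the still-unmatched ground-truth object that overlaps most.
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(predicted) - tp  # predictions without a ground-truth partner
    fn = len(truth) - tp      # ground-truth objects that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


truth = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
predicted = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]
score = f1_score(predicted, truth)  # one match, one false positive, one miss
```

In this toy case one of two predictions matches one of two ground-truth objects, so F1 = 2·1 / (2·1 + 1 + 1) = 0.5.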
To physically extract the cellular features discovered with BIAS,
we developed an interface between scanning and LMD microscopes
(currently Zeiss PALM MicroBeam and Leica LMD6 and
LMD7) (Fig. 2c). BIAS transfers cell contours between the microscopes,
preserving full accuracy. LMD has a theoretical accuracy of
70 nm using a ×150 objective, but, in practice, we reached 200 nm.
After optimization, the LMD7 can autonomously excise 1,250
high-resolution contours per hour, equivalent to 50 to 100 cells per
sample (Methods). To prevent potential laser-induced damage to
cell membranes, we excise contours with an offset (Fig. 2c,d and
Supplementary Videos 1 and 2).
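The cutting offset and path ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the BIAS implementation: contours are lists of (x, y) vertices, the offset is applied by pushing each vertex away from the vertex centroid (reasonable only for roughly convex outlines), and the cutting order is a greedy nearest-neighbour tour. All function names are hypothetical.

```python
import math


def centroid(contour):
    """Mean of the vertices; adequate for roughly convex cell outlines."""
    n = len(contour)
    return (sum(x for x, _ in contour) / n, sum(y for _, y in contour) / n)


def offset_contour(contour, offset):
    """Push every vertex away from the centroid by `offset` (coordinate units)."""
    cx, cy = centroid(contour)
    shifted = []
    for x, y in contour:
        dist = math.hypot(x - cx, y - cy)
        scale = (dist + offset) / dist if dist else 1.0
        shifted.append((cx + (x - cx) * scale, cy + (y - cy) * scale))
    return shifted


def cutting_order(contours):
    """Greedy nearest-neighbour ordering by centroid, starting from the first contour."""
    remaining = list(range(len(contours)))
    order = [remaining.pop(0)] if remaining else []
    while remaining:
        last = centroid(contours[order[-1]])
        nxt = min(remaining, key=lambda i: math.dist(last, centroid(contours[i])))
        remaining.remove(nxt)
        order.append(nxt)
    return order


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
enlarged = offset_contour(square, math.sqrt(2))  # corners move out to (-1, -1) ... (3, 3)

def shift(contour, dx):
    return [(x + dx, y) for x, y in contour]

order = cutting_order([square, shift(square, 10.0), shift(square, 1.0)])  # visits 0, 2, 1
```

A greedy tour is not optimal in general; it is used here only because it is short and shows why ordering contours by proximity reduces stage travel between cuts.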
Current LMD methods preserve the spatial context but are
mostly limited to human-eye-observable phenotypes and require
manual selection of cells, often resulting in admixing of different
cell types, which constrains throughput and de novo discovery11.
[Figure 1 graphic. Workflow panels: archived patient tissue samples; high-resolution microscopy yielding high-parametric images with subcellular resolution; image segmentation using deep learning; machine learning algorithms trained to predict cellular phenotypes (tSNE 1 versus tSNE 2); intelligent image-based single-cell isolation by automated laser microdissection; ultra-high-sensitivity proteomics; bioinformatic data analysis; resource for researchers and clinicians.]
Fig. 1 | DVP concept and workflow. DVP combines high-resolution imaging, AI-guided image analysis for single-cell classification and isolation with
an ultra-sensitive proteomics workflow2. DVP links data-rich imaging of cell culture or archived patient biobank tissues with deep-learning-based cell
segmentation and machine-learning-based identification of cell types and states. (Un)supervised AI-classified cellular or subcellular objects of interest
undergo automated LMD and MS-based proteomic profiling. Subsequent bioinformatics data analysis enables data mining to discover protein signatures,
providing molecular insights into proteome variation in health and disease states at the level of single cells. tSNE, t-distributed stochastic neighbor embedding.
[Figure 2 graphic. Recoverable labels: segmentation training scheme (original annotation, artificial masks, image style transfer learning, artificially augmented training data, R-CNN cell segmentation model, nuclei detection, cell body detection); segmentation examples for melanoma cells, U2OS nuclei and salivary gland; F1 scores for M1, M2, M3 and Our; scanning microscopes and laser microdissection microscopes linked by the Biological Image Analysis Software (preparation, isolation, collection; offset, touching, path optimization; cutting, final pulse); FOXJ1 and EpCAM staining with class 1 and class 2; PCA of 5,085 protein groups with Dim1 (35.3%) and Dim2 (14.4%) for FOXJ1 pos. and FOXJ1 neg. cells; heat map of protein level (z-score) for SNTN, CFAP53, CFAP52, SPAG6, CAPS, FOXJ1, KRT7 and MSLN, asterisks marking imputed values; volcano plot with relative protein level (log2) on the x axis and t-test P value (−log10) on the y axis.]
Fig. 2 | BIAS for integrative image analysis and automated LMD single-cell isolation. a, AI-driven nucleus and cytoplasm segmentation of
normal-appearing and cancer cells and tissue using BIAS. b, We benchmarked the accuracy of its segmentation approach using the F1 metric and
compared results to three additional methods: M1 is unet4nuclei6, M2 is CellProfiler8 and M3 is Cellpose7, while OUR refers to nucleAIzer3. Bars show
mean F1 scores with s.e.m.; n = 10 independent images for melanoma tissue and (U2OS) cells, and n = 20 for salivary gland tissue. Visual representation
of the segmentation results: green areas correspond to true positive, blue to false positive and red to false negative. c, BIAS serves as the interface
between the scanning and an LMD microscope, allowing high-accuracy transfers of cell contours between the microscopes. Illustration of cutting
offset with respect to the object of interest and optimal path finding. d, Practical illustration of the functions in the upper panel. e, Immunofluorescence
staining of the human fallopian tube epithelium with FOXJ1 and EpCAM antibodies, detecting ciliated and epithelial cells, respectively. Left panel: Ciliated
(FOXJ1-positive) and secretory (FOXJ1-negative) cells. Right panel: Cell classification based on FOXJ1 intensity. Class 1 (FOXJ1-positive) and class 2
(FOXJ1-negative); magnification factor = ×387. f, PCA of FOXJ1-positive and FOXJ1-negative cell proteomes. g, Heat map of known protein markers
for secretory and ciliated cells. Protein levels are z-scored. Asterisks represent imputed data. The marker list was derived from the Human Protein
Atlas20 project and based on literature mining. h, Volcano plot of the pairwise proteomic comparison between FOXJ1-positive and FOXJ1-negative cells.
Cell-type-specific marker proteins are highlighted in green and turquoise, and black represents potential novel marker proteins. Significantly enriched
cell-type-specific proteins are displayed above the black lines (two-sided t-test, FDR < 0.05, s0 = 0.1, n = 4 biological replicates).
To explore the sensitivity, specificity and robustness of our DVP
workflow, we obtained normal human fallopian tube tissue and
separated ciliated from secretory cells (the two major cell types
of the fallopian tube epithelium12) using the cell-lineage-specific
transcription factor FOXJ1, a master regulator of cilia function, and
measured their proteomes (Fig. 2e–h, Extended Data Fig. 1c–f and
Supplementary Table 2). We detected FOXJ1 solely in FOXJ1-stained
(ciliated) cells (Fig. 2e,g), along with more than 5,000 other
quantified proteins with excellent correlations of biological replicates
(Extended Data Fig. 1d,e). Bioinformatic analysis of differences
in protein abundance mirrored the biologic features of the
distinct cell types (Fig. 2f–h and Extended Data Fig. 1c–f). This
was driven by known protein markers of ciliated cells and expanded
to proteins not yet functionally associated with these cell types. We
used the fallopian tube epithelium as an example to highlight the
importance of the combination of antibody-based tissue staining
and unbiased, quantitative proteomics. Such in vivo cell type comparisons
will allow the discovery of cell type and cell state markers
and provide unbiased information to understand disease states
at the global proteome level. Of note, high-grade serous ovarian
cancer originates in the fallopian tube epithelium, and our method
can now be applied to study the early onset of the disease without
admixing unrelated cell types13.
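The pairwise comparison in Fig. 2h applies an FDR cutoff of 0.05. The exact procedure behind that cutoff (including the s0 parameter) is not spelled out in this section, so the sketch below swaps in the Benjamini–Hochberg adjustment, which the paper names for the enrichment analysis in Fig. 3h. It is a minimal illustration of how a list of per-protein P values is converted to FDR-adjusted values; the function name and the example P values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.005]        # hypothetical per-protein P values
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]  # proteins passing FDR < 0.05
```

With these four inputs the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02, so all four would pass a 0.05 cutoff.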
DVP defines single-cell heterogeneity at the subcellular level. We
applied our workflow to an unperturbed cancer cell line to determine
if DVP can characterize functional heterogeneity between
ostensibly similar cells (fluorescent ubiquitination-based cell cycle
indicator (FUCCI) U2OS cells14). After DL-based segmentation for
nuclei and cell membrane detection, we isolated 80–100 single cells
or 250–300 nuclei per phenotype (Figs. 2c,d and 3a,b). The analysis
of small numbers of cells by MS has been a longstanding goal, held
back by formidable analytical challenges in the transfer, processing
and analysis of minute samples15, which we addressed in turn.
We processed samples using our recently developed workflow for
ultra-low sample input2,16, which omits any sample transfer steps
and ensures de-crosslinking in very low volumes (Methods). We
found that samples could be analyzed directly from 384-well plates without
any additional sample transfer or clean-up. For MS measurements,
we employed a data-independent acquisition method using
parallel accumulation–serial fragmentation (diaPASEF), which adds an ion
mobility dimension and optimizes fragment ion recovery,
on a newly developed mass spectrometer2,17. Replicates of cell and
nucleus proteomes demonstrated high quantitative reproducibility
(Pearson r = 0.96), and proteomes of whole cells differed from those
of nuclei alone, as expected from subcellular proteomics experiments
based on biochemical separation18 (Extended Data Fig. 2a,b).
In the bioinformatic enrichment analysis, terms like plasma membrane,
mitochondrion, nucleosomes and transcription factor complexes
were highly significant (false discovery rate (FDR) < 10^−5)
(Fig. 3c).
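The replicate reproducibility quoted above is a Pearson correlation between the protein intensities of two samples. A minimal sketch follows, assuming log2-transformed intensities and assuming that proteins missing in either replicate are dropped before correlating; both choices and all names are illustrative, and the intensity values are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def replicate_correlation(rep_a, rep_b):
    """Correlate log2 intensities over proteins quantified in both replicates."""
    shared = [p for p in rep_a if rep_a[p] is not None and rep_b.get(p) is not None]
    return pearson_r([rep_a[p] for p in shared], [rep_b[p] for p in shared])


# Hypothetical log2 intensities keyed by protein name; None marks a missing value.
rep_a = {"P1": 20.1, "P2": 24.3, "P3": 18.7, "P4": None, "P5": 22.0}
rep_b = {"P1": 20.4, "P2": 24.0, "P3": 18.9, "P4": 21.5, "P5": 22.3}
r = replicate_correlation(rep_a, rep_b)
```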
To address if morphological differences between nuclei are also
reflected in their proteomes, we used an unsupervised phenotype
finder model to identify groups of morphologically distinct nuclei
Table 1 | Mean F1 scores of the compared segmentation methods on our samples
Sample                  M1                  M2                  M3                  OUR
U2OS cyto               0.0667* ± 0.0075    0.5994 ± 0.0262     0.7205 ± 0.0152     0.7336 ± 0.0218
Melanoma nuc            0.1126 ± 0.0151     0.4386 ± 0.0157     0.1801 ± 0.0504     0.5498 ± 0.0231
Melanoma cyto           0.0058* ± 0.0021    0.0549 ± 0.0083     0.4859 ± 0.0354     0.5536 ± 0.0625
Salivary gland nuc      0.0797 ± 0.0138     0.6488 ± 0.0430     0.0338 ± 0.0145     0.7684 ± 0.0316
Salivary gland cyto     0.0714* ± 0.0151    0.0793 ± 0.0167     0.3174 ± 0.0588     0.5051 ± 0.0586
Melanoma (pink) nuc     0.0682 ± 0.0183     0.2999 ± 0.0599     0.0364 ± 0.0238     0.5079 ± 0.0392
Melanoma (pink) cyto    0.0261* ± 0.0070    0.0865 ± 0.0213     0.2659 ± 0.0429     0.2839 ± 0.0229
Fallopian tube nuc      0.0006 ± 0.0009     0.3121 ± 0.0501     0.3160 ± 0.0631     0.4724 ± 0.0683
Fallopian tube cyto     0.0016* ± 0.0023    0.0671 ± 0.0208     0.4566 ± 0.0530     0.3455 ± 0.0473
The methods are as follows: M1 is unet4nuclei6, M2 is CellProfiler8, M3 is Cellpose7 and OUR refers to nucleAIzer3 (implemented in BIAS). High scores are highlighted in bold. Asterisks (*) mark that M1 is
intended for nucleus segmentation but was applied to segment cytoplasm. s.e.m. is displayed with ± after the mean F1 scores in each cell.
Fig. 3 | DVP defines single-cell heterogeneity at the subcellular level. a, Segmentation of whole cells and nuclei in BIAS of DNA (DAPI)-stained U2OS
cells. Scale bar, 20 μm. b, Automated LMD of whole cells and nuclei into 384-well plates. Images show wells after collection. c, Relative protein levels
(x axis) of major cellular compartments between whole cell (n = 3 biological replicates) and nuclei (n = 3 biological replicates) specific proteomes. y axis
displays point density. d, Left: conceptual workflows of the phenotype finder model of BIAS for ML-based classification of cellular phenotypes. Right:
results of unsupervised ML-based classification of six distinct U2OS nuclei classes based on morphological features and DNA staining intensity. Colors
represent classes. Scale bar, 20 μm. e, Phenotypic features used by ML to define six distinct nuclei classes. Radar plots show z-scored relative levels of
morphological features (nuclear area, perimeter, solidity and form factor) and DNA staining intensity (total DAPI signal). f, Example images of nuclei
from the six classes identified by ML. Blue color shows DNA staining intensity, and red color shows EdU staining intensity to identify cells undergoing
replication. Represented nuclei are enlarged for visualization and do not reflect actual sizes. g, PCA of five interphase classes based on 3,653 protein
groups after data filtering. Replicates of classes (n = 3 biological replicates) are highlighted by ellipses with a 95% confidence interval. h, Enrichment
analysis of proteins regulated among the five nuclei classes. Significant proteins (515 ANOVA significant, FDR < 0.05, s0 = 0.1) were compared to the set
of unchanged proteins based on Gene Ontology Biological Process (GOBP), Reactome pathways as well as cell cycle and cancer annotations derived
from the Human Protein Atlas (HPA)20. A Fisher’s exact test with a Benjamini–Hochberg FDR of 0.05 was used (Supplementary Table 3). i, Unsupervised
hierarchical clustering of all 515 ANOVA significant protein groups (Supplementary Table 4). Cell-cycle-regulated proteins reported by the HPA are shown
in the lower bar. Nuclei classes (n = 3 biological replicates) are shown in the row bar. C1–C4 show clusters upregulated in the different nucleus classes.
j, Network analysis of enriched pathways for protein clusters C1–C4. Pathway enrichment analysis was performed with the ClusterProfiler R package36.
ER, endoplasmic reticulum; PC, principal component.
[Figure 3 graphic. Recoverable labels: whole cell segmentation and nuclei segmentation; collection of whole cells and nuclei into 384-well plates; relative protein level (log2) of cells versus nuclei for cytoskeleton, mitochondrion, plasma membrane, ribosome, ER, nucleoplasm, nucleosome, transcription factor complex and spliceosome; phenotypic single-cell selection (clustering, supervised gating, classification with cross validation; PC1 versus PC2 and feature 1 versus feature 2); radar plots of area, perimeter, form, solidity and DAPI (total) as z-scores for classes 1 to 6; DNA and EdU staining; PCA of 3,653 protein groups with Dim1 (23%) and Dim2 (13.7%) for classes 2 to 6; enrichment factor of enriched terms (Reactome pathway and Human Protein Atlas) for 515 significantly regulated of 3,653 total proteins; heat map of the 515 significant protein groups (FDR < 0.05) as protein level (z-score) with clusters C1 to C4 and the cell-cycle-regulated annotation; pathway networks for clusters C1 to C4.]
based on nuclear area, perimeter, form factor, solidity and DNA
staining intensity (Fig. 3d). ML found three primary nuclei classes
(27–37% each) and also identified three rare ones (2–4% each)
(Extended Data Fig. 2c). The resulting six distinct nuclei classes
had visible differences in size and shape, with class 1 representing
mitotic states and the remaining five classes representing interphase
with varying feature weighting (Fig. 3e,f). We focused on those five
nuclei classes of unknown origin for subsequent analysis. In principal
component analysis (PCA), replicates of the respective proteomes
clustered closely, and the more frequent classes (2, 3 and 5)
grouped together (Fig. 3g). To verify and quantify this observation,
we compared each cell class proteome to a proteome of all ‘mixed’
nuclei in a field of view. This revealed that the rarest cell classes had
the highest numbers of differentially expressed proteins compared
to unclassified ‘bulk’ proteomes (Extended Data Fig. 2d,e). We next
asked if the proteomic differences across the five nuclei classes suggested
any functional differences among the interphase states (Fig.
3d,f). The 515 significantly differentially expressed proteins across
classes were enriched for nuclear and cell-cycle-related proteins (for
example, ‘switching of origins to a post-replicative state’ and ‘condensation
of prophase chromosomes’), suggesting the cell cycle as a
functional driver of separation (Fig. 3h–j, Extended Data Fig. 2f and
Supplementary Tables 3 and 4). Comparing our data to a single-cell
imaging dataset of cell-cycle-regulated proteins19, we found significant
enrichment in our regulated proteins (FDR < 10^−6). Nuclear
area, one of the driving features among the different classes identified,
increased during interphase from G1 to S/G2 cells (Fig. 3e and
Extended Data Fig. 3a–c), further supporting the importance of the
cell cycle in defining the nuclei classes.
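The morphological features that drive the nuclei classes (area, perimeter, form factor and solidity) can be computed from a segmented contour. The sketch below is an illustration under stated assumptions rather than the BIAS feature extractor: contours are simple polygons given as (x, y) vertices, form factor is taken as the common definition 4πA/P², and solidity as the polygon area divided by the area of its convex hull. The exact definitions used by BIAS are not given in this section.

```python
import math


def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    pts = list(poly)
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))) / 2


def polygon_perimeter(poly):
    pts = list(poly)
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1]))


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def shape_features(poly):
    area = polygon_area(poly)
    perim = polygon_perimeter(poly)
    return {
        "area": area,
        "perimeter": perim,
        "form_factor": 4 * math.pi * area / perim ** 2,   # 1.0 for a circle
        "solidity": area / polygon_area(convex_hull(poly)),  # 1.0 if convex
    }


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]  # concave outline
square_features = shape_features(square)
l_features = shape_features(l_shape)
```

For the square, the form factor is π/4 and the solidity is 1; the concave L-shaped outline has area 3 inside a hull of area 3.5, giving a solidity of 6/7. A concave, bean-like outline therefore scores lower on solidity than a round one.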
Our single-cell-type proteomes revealed several uncharacterized
proteins, presenting an opportunity to associate them
with a potential cellular function. We focused on C11orf98, C7orf50,
C1orf112 and C19orf53, which remained after data filtering
(ANOVA P < 0.05) and showed class-specific expression patterns
(Extended Data Fig. 3d). C7orf50 was most highly expressed in the
nucleoli of classes 2, 4 and 3 nuclei, which showed S/G2-specific
characteristics (Fig. 3f and Extended Data Fig. 3d,e), suggesting
that its expression is cell cycle regulated. Indeed, we confirmed
higher levels of C7orf50 in G1/S and S/G2 compared to G1 phase
cells (Extended Data Fig. 3e). As cell-cycle-regulated proteins may
be associated with cancer prognosis19, we investigated C7orf50 in
the human pathology atlas20, where high expression was associated
with favorable outcomes in pancreatic cancer (Extended Data
Fig. 3g; P < 0.001). Bioinformatic analysis revealed interaction,
co-expression and co-localization with the protein LYAR (‘cell
growth-regulating nucleolar protein’), suggesting a functional link
to cell proliferation (Extended Data Fig. 3f,h).
Class 6 showed an intriguing proteomic signature independent of
known cell cycle markers (Fig. 3i,j). These rare, bean-shaped nuclei
showed upregulation of specific cytoskeletal and cell adhesion proteins
(for example, VIM, TUBB, ACTB and ITGB1), suggesting that
these signatures derived from migrating cells undergoing nuclear
deformation, which is indicative of cellular invasion21,22. Note that we classified
nuclei from 2D images, but LMD isolates them in 3D. Thus,
samples also probe morphology-driven protein re-localization
around the nucleus as exemplified by class 6 nuclei. Likewise, excising
the nuclei captures the trafficking of proteins to and from the
cytosol to some degree.
These cell culture experiments establish that DVP correlates cellular
phenotypes, heterogeneity and dynamics with the proteome
level in an unbiased way for common and rare phenotypes.
DVP applied to cancer tissue heterogeneity. Billions of patient
samples are collected routinely during diagnostic workup and
stored in the archives of pathology departments around the world23.
The precise proteomic characterization of single cells in their spatial
and subcellular context from tissue slides could have a tremendous
clinical effect, complementing the emerging field of digital pathology24.
We selected archived paraffin-embedded tissue of a salivary
gland acinic cell carcinoma, a rare and understudied malignancy
of epithelial secretory cells of the salivary gland. We developed an
immunohistochemical (IHC) staining protocol on glass membrane
slides for LMD and stained the tissue for EpCAM to outline the cellular
boundaries for segmentation and feature extraction by BIAS
(Methods). The histologically normal-appearing regions were
mainly composed of acinar, ductal and myoepithelial cells, whereas
the carcinoma component had predominantly uniform tumor cells
with round nuclei and abundant basophilic cytoplasm (Fig. 4a,b).
To identify disease-specific protein signatures, we aimed to
compare the histologically normal-appearing acinar cells with the
malignant cells rather than admixing with varying proportions of
unrelated cells. To this end, we classified acinar and duct cells from
normal parotid gland tissue based on their cell-type-specific morphological
features and isolated single-cell classes for proteomic
analysis (Fig. 4c and Extended Data Fig. 4a). Bioinformatics analysis
of the measured proteome differences revealed significant biological
differences between these neighboring cell types, reflecting
their distinct physiological functions. Acinar cells, which produce
and secrete saliva in secretory granules, showed high expression of
proteins related to vesicle transport and glycosylation along with
known acinar cell markers such as α-amylase (AMY1A), CA6 and
PIP (Extended Data Fig. 4b). In contrast, ductal cells expressed high
levels of mitochondria and metabolism-related proteins required
to meet the high energy demand for saliva secretion25 (Extended
Data Fig. 4c and Supplementary Table 5). For comparison, we
exclusively excised malignant and benign acinar cells from the various
regions within the same tissue section. The proteomes of acinar
cells clustered together regardless of disease state, indicating a
strong cell-of-origin signature (Extended Data Fig. 4d). Analyzing
six normal-appearing replicates and nine neoplastic regions showed
excellent within-group proteome correlation (Pearson r > 0.96).
The lower correlation of normal cells and cancer cells reflected
disease-specific and cell-type-specific proteome changes (Pearson
r = 0.8; Fig. 4d,e and Supplementary Table 6). Acinar cell markers
in the carcinoma were significantly downregulated, consistent
with previous reports25. DVP allowed us to discover upregulation
of interferon response proteins (for example, MX1 and HLA-A;
Supplementary Table 6) and the proto-oncogene SRC, both
Fig. 4 | DVP applied to archived tissue of a rare salivary gland carcinoma. a, IHC staining of an acinic cell carcinoma of the salivary gland using the cell
adhesion protein EpCAM. b, Representative regions from normal-appearing tissue (upper panels I and II) and acinic cell carcinoma (lower panels III
and IV) from a. c, DVP workflow applied to the acinic cell carcinoma tissue. DL-based single cell detection of normal-appearing (green) and neoplastic
(magenta) cells positive for EpCAM. Cell classification based on phenotypic features (form factor, area, solidity, perimeter and EpCAM intensity).
d, Proteome correlations of replicates from normal-appearing (normal, n = 6) or cancer regions (cancer, n = 9). e, Volcano plot of pairwise proteomic
comparison between normal and cancer tissue. t-test significant proteins (two-sided t-test, FDR < 0.05, s0 = 0.1, n = 6 biological replicates for normal and
n = 9 for cancer) are highlighted by black lines. Proteins more highly expressed in normal tissue are highlighted in green on the volcanoʼs left, including
known acinic cell markers (AMY1A, CA6 and PIP). Proteins more highly expressed in the acinic cell carcinoma are on the right in magenta, including the
proto-oncogene SRC and interferon response proteins (MX1 and HLA-A; Supplementary Table 6). f, IHC validation of proteomic results. CNN1, SRC, CK5
and FASN are significantly enriched in normal or cancer tissue. Scale bar, 100 μm.
Nature Biotechnology | VOL 40 | August 2022 | 1231–1240 | www.nature.com/naturebiotechnology1236
ArticlesNATuREBIoTEcHnology
actionable therapeutic targets26
(Fig. 4e). We validated the proteomic
findings using IHC analysis of significantly enriched proteins in
either normal-appearing or cancererous tissue. This resulted in the
selection of CNN1, SRC, CK5 and FASN (Fig. 4f), which confirmed
our proteomic results, demonstrated the absence of contamination
and supported the specificity of our DVP approach.
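The replicate comparison reported for the salivary gland samples (within-group versus between-group Pearson correlation) can be illustrated with a minimal sketch. The intensity values and the helper names `pearson_r` and `mean_pairwise_r` are hypothetical and are not taken from the study; the sketch only shows the kind of calculation that underlies such r values.

```python
import math
from itertools import combinations, product


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_pairwise_r(group_a, group_b=None):
    """Mean Pearson r over all sample pairs within one group,
    or over all cross pairs between two groups."""
    if group_b is None:
        pairs = list(combinations(group_a, 2))
    else:
        pairs = list(product(group_a, group_b))
    return sum(pearson_r(x, y) for x, y in pairs) / len(pairs)


# Hypothetical log2 intensities of five proteins per sample (illustrative only).
normal = [[20.1, 18.3, 25.0, 22.4, 19.8], [20.3, 18.1, 24.8, 22.6, 19.9]]
cancer = [[21.5, 22.0, 21.0, 22.9, 19.0], [21.3, 22.2, 21.2, 23.1, 18.8]]

within_normal = mean_pairwise_r(normal)
within_cancer = mean_pairwise_r(cancer)
between = mean_pairwise_r(normal, cancer)

print(f"within normal:  {within_normal:.3f}")
print(f"within cancer:  {within_cancer:.3f}")
print(f"between groups: {between:.3f}")
```

In this toy example the replicates of one group correlate far better with each other than with samples of the other group, which is the pattern described in the text.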
[Fig. 4 image panels a–f. Recoverable axis labels: d, Pearson r (0.80–1.00) for normal and carcinoma replicates; e, t-test P value (−log10) versus relative protein level (log2), normal-appearing acinar cells versus acinic cell carcinoma.]
[Fig. 5 image panels a–l. Recoverable panel text: a, 1 patient, 5 cell classes, 7 regions, 27 samples, throughput >30,000 contours per day, 50–100 cells per sample; e, Pearson r (0.8–1.0); f,g, PCA with Dim1 (27%) and Dim2 (12%) for all samples and Dim1 (19.9%) and Dim2 (10.8%) for melanoma cells; i, pathway level (z-score) over x and y image coordinates for oxidative phosphorylation (hallmark, 151 proteins), antigen processing and presentation (KEGG, 25 proteins) and interferon gamma response (hallmark, 53 proteins); l, enrichment score versus position in the ranked list of genes for mRNA splicing – major pathway, degradation of the extracellular matrix, interferon alpha/beta signaling and MHC class II antigen presentation.]
Decoding the molecular alterations in melanoma development
and progression is key to identifying therapeutic vulnerabilities in
this highly metastatic disease. With pathogenic mutations in melanoma
largely catalogued27–29, we set out to directly study spatially
resolved proteomes of distinct cellular phenotypes of melanoma
progression (Fig. 5a,b and Extended Data Fig. 5a,b). We co-stained
FFPE-embedded primary tumor material preserved for 17 years
with two markers, SOX10 and CD146, to map melanoma cells. As
overexpression of CD146 is implicated in melanoma progression,
and immunotherapy against CD146 targets metastasis30, we used
CD146 as a disease progression marker in our analysis. ML predicted
five classes with clearly defined spatial distribution: class 1,
melanoma in situ; class 2, predominantly tumor; class 3, cells of the
tumor microenvironment; class 4, enriched in CD146-high regions;
and class 5, enriched in CD146-low regions. We used high-content
imaging to determine the required number of cells to identify statistically
and analytically robust cellular phenotypes for precise cell
type and state isolation within a spatial region. For this reason, we
typically collected around 100 cells per sample (Methods). Including
replicates, we isolated and profiled 27 different samples obtained
from seven unique regions of the same tissue section, including normal
melanocytes, melanoma in situ and primary melanoma from the
radial and vertical growth phases (Fig. 5a–d). We found high quantitative
reproducibility among biological replicates, resulting in disease
state and region-specific proteomes (Fig. 5e–g). Pre-cancerous
(melanoma in situ) and primary melanoma showed differences in
proteins involved in immune cell signaling and cell metabolism and
coincided with reduced melanogenesis (Supplementary Table 7 and
Extended Data Fig. 5d). The advanced stages (radial and vertical
melanoma growth phases) showed well-defined metabolic
activation along with disease progression, a known hallmark
of human cancers31. Expression of proteins involved in oxidative
phosphorylation and mitochondrial function gradually increased
from melanocytes through melanoma in situ to invasive melanoma, indicating
a dependency on mitochondrial respiration in the advanced
tumor stages (Fig. 5h–j, Extended Data Fig. 5c and Supplementary
Tables 7–9). Conversely, proteins involved in antigen presentation
and interferon response were downregulated when compared to
melanoma in situ (Fig. 5h–j and Supplementary Tables 7–9), in line
with immune evasion strategies in melanoma32.
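As an illustration of how pathway-level trends of this kind can be summarized, the following minimal sketch z-scores each protein across the sample groups and then takes the median z-score of all proteins in one pathway per group. The protein names (`P1` to `P3`), the values and the function names are hypothetical, and z-scoring across groups is an assumption made for this example rather than a confirmed detail of the analysis.

```python
from statistics import mean, median, pstdev


def zscore(values):
    """Z-score one protein's levels across sample groups.
    Assumes the protein varies across groups (non-zero standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def pathway_profile(protein_levels, pathway_proteins):
    """Median z-score per sample group over the quantified proteins of one pathway."""
    z_rows = [
        zscore(protein_levels[p]) for p in pathway_proteins if p in protein_levels
    ]
    # zip(*z_rows) iterates over sample groups (columns).
    return [median(column) for column in zip(*z_rows)]


groups = ["melanocytes", "in situ", "radial", "vertical"]

# Hypothetical log2 protein levels per group (illustrative only).
levels = {
    "P1": [24.0, 24.5, 25.2, 25.8],
    "P2": [22.0, 22.2, 23.0, 23.4],
    "P3": [23.1, 23.0, 23.9, 24.3],
}
pathway = ["P1", "P2", "P3"]

profile = pathway_profile(levels, pathway)
for group, value in zip(groups, profile):
    print(f"{group:12s} {value:+.2f}")
```

A pathway whose proteins rise with progression yields a negative median z-score in the early groups and a positive one in the advanced groups.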
Melanoma progression is a stepwise process involving radial and
vertical growth phases. The direct comparison of these spatially
defined regions of the same phenotype (class 4 cells) further highlighted
critical features of cancer metastasis, such as extracellular
matrix (ECM) remodeling (for example, collagen degradation) and
upregulated PDGF signaling33 (Fig. 5k,l, Extended Data Fig. 5e and
Supplementary Table 10). These tumor-driven changes support
growth, increase migration of tumor cells and remodel the ECM
to facilitate metastasis to distant organs via adjacent blood vessels33.
DVP also discovered a significant upregulation of mRNA splicing
in the vertical compared to the radial growth phase. Targeting pro-oncogenic
alternative splicing has recently become a therapeutic strategy in
oncology34, and these tumors often present immunogenic neoantigens35.
The increase in splicing coincided with a significant downregulation
of immune-related signaling (interferon signaling and
antigen presentation) (Fig. 5l and Supplementary Table 10), suggesting
the transition from an immunogenic ‘hot’ to a ‘cold’ tumor
zone in the vertical growth phase within the same tumor section.
Clearly, DVP spatially resolved tumor heterogeneity by localizing
tumor-related mRNA splicing, immune responses and ECM
remodeling pathways in different regions.
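The pathway comparison between growth phases relies on gene set enrichment analysis of proteins ranked by fold change. The following sketch restates the weighted running-sum enrichment score of the GSEA method in plain Python. It is a simplified illustration, not the clusterProfiler implementation: it omits permutation-based significance testing and multiple-testing correction, and the protein names and fold changes are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score.

    ranked:   list of (name, score) tuples sorted by score, highest first
    gene_set: set of names belonging to the pathway
    p:        weight exponent applied to |score| for hits
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    in_set = [name in gene_set for name, _ in ranked]
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == len(ranked):
        raise ValueError("gene set must contain some, but not all, ranked names")
    hit_norm = sum(abs(s) ** p for (_, s), hit in zip(ranked, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all hits have zero score")
    miss_step = 1.0 / (len(ranked) - n_hits)

    running, best = 0.0, 0.0
    for (_, score), hit in zip(ranked, in_set):
        # Step up (weighted) on a hit, step down (uniform) on a miss.
        running += abs(score) ** p / hit_norm if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical log2 fold changes (vertical versus radial), sorted descending.
ranked = [
    ("A", 2.0), ("B", 1.5), ("C", 1.0), ("D", 0.2),
    ("E", -0.3), ("F", -1.0), ("G", -1.8), ("H", -2.2),
]

es_up = enrichment_score(ranked, {"A", "B", "C"})
es_down = enrichment_score(ranked, {"F", "G", "H"})
print(f"set at top of list:    ES = {es_up:+.2f}")
print(f"set at bottom of list: ES = {es_down:+.2f}")
```

A set concentrated at the top of the ranked list gives a positive score (increased in vertical growth in this toy ranking), and a set concentrated at the bottom gives a negative score.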
Discussion
DVP combines imaging technologies with unbiased proteomics to
quantify the number of expressed proteins in a given cell, to map tissue-specific
or cell-type-specific proteomes or to identify targets for future
drugs and diagnostics. We showed how our analyses describe a rich
‘microcosm in a slide’, uncovering key pathways dysregulated in
cancer progression and effectively extending ‘digital pathology’ by a
molecular dimension. DVP is broadly applicable to any biological system
that can be microscopically imaged, from cell culture to pathology.
As a single slide can encompass hundreds of thousands of cells,
DVP can discover and characterize rare cell states and interactions.
In contrast to single-cell transcriptomics, DVP can readily analyze
the ECMʼs subcellular structures and spatial dynamics. With further
improvements in proteomics technology, DVP should also be
suited to study proteoforms and post-translational modifications at
a single-cell-type level.
Online content
Any methods, additional references, Nature Research reporting
summaries, source data, extended data, supplementary information,
acknowledgements, peer review information; details of
author contributions and competing interests; and statements of
data and code availability are available at https://doi.org/10.1038/
s41587-022-01302-5.
Received: 8 March 2022; Accepted: 30 March 2022;
Published online: 19 May 2022
Fig. 5 | DVP applied to archived primary melanoma tissue. a, DVP sample isolation workflow to profile primary melanoma. b, DVP applied to primary
melanoma immunohistochemically stained for the melanocyte marker SOX10 and the melanoma marker CD146. Left panel: stained melanoma tissue on
a PEN glass membrane slide. Right panel: pathology-guided annotation of different tissue regions. Scale bar, 1 mm. c, Pathologist-guided and ML-based
cell classification based on CD146 and SOX10 staining intensity and spatial localization: normal melanocytes, stromal cells, melanoma in situ, CD146-low
melanoma, CD146-high melanoma, radial growth melanoma and vertical growth melanoma. Right lower panel: frequency of classes predicted by
unsupervised ML (k-means clustering). d, Example pictures of the seven identified classes. Magnification factor = ×4,400. e, Correlation matrix (Pearson
r) of all 27 measured proteome samples. f, PCA of proteomes. g, PCA of all melanoma-specific proteomes from in situ to invasive (vertical growth)
melanoma. h, Unsupervised hierarchical clustering based on all 1,910 ANOVA significant (FDR < 0.05) protein groups. Two clusters of upregulated (cluster
A) or downregulated (cluster B) proteins in invasive melanoma are highlighted. i, Tissue heat map mapping the proteomics results onto the imaging data.
Relative pathway levels of selected terms from the two clusters are highlighted in i. Median protein levels were calculated per annotation and plotted for
each isolated cell class against their x and y coordinates, as defined by their segmented cellular contours. j, Box plots of z-scored protein levels for the
differentially regulated pathways visualized in i above. The box plots define the range of the data (whiskers), 25th and 75th percentiles (box) and medians
(solid line). Outliers are plotted as individual dots outside the whiskers. k, Comparing proteomic changes in CD146-high melanoma cells (class 4) of the
vertical growth (region 2) with the radial growth (region 1). Blood vessels in proximity to melanoma cells of the vertical growth are highlighted in red. Scale
bar, 1 mm. l, Gene set enrichment analysis plot of significantly enriched pathways for melanoma cells of the vertical and radial growth phase. Pathway
enrichment analysis was based on the protein fold change between vertical and radial melanoma cells and performed with the clusterProfiler R package36.
Enriched terms with an FDR < 0.05 are shown. MHC, major histocompatibility complex.
References
	1.	 Hériché, J.-K., Alexander, S. & Ellenberg, J. Integrating imaging and omics:
computational methods and challenges. Annu. Rev. Biomed. Data Sci. 2,
175–197 (2019).
	2.	 Brunner, A. et al. Ultra‐high sensitivity mass spectrometry quantifies
single‐cell proteome changes upon perturbation. Mol. Syst. Biol. 18,
e10798 (2022).
	3.	 Hollandi, R. et al. nucleAIzer: a parameter-free deep learning framework
for nucleus segmentation using image style transfer. Cell Syst. 10,
453–458 (2020).
	4.	 Smith, K. & Horvath, P. Active learning strategies for phenotypic profiling of
high-content screens. J. Biomol. Screen. 19, 685–695 (2014).
	5.	 Isola, P., Zhu, J.-Y., Zhou, T. & Efros, A. A. Image-to-image translation with
conditional adversarial networks. Preprint at https://arxiv.org/abs/1611.07004
(2016).
	6.	 Caicedo, J. et al. Nucleus segmentation across imaging experiments: the 2018
Data Science Bowl. Nat. Methods 16, 1247–1253 (2019).
	7.	 Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist
algorithm for cellular segmentation. Nat. Methods 18, 100–106 (2020).
	8.	 Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and
quantifying cell phenotypes. Genome Biol. 7, R100 (2006).
	9.	 Berg, S. et al. ilastik: interactive machine learning for (bio)image analysis.
Nat. Methods 16, 1226–1232 (2019).
	10.	Conrad, C. et al. Micropilot: automation of fluorescence microscopy-based
imaging for systems biology. Nat. Methods 8, 246–249 (2011).
	11.	Zhao, T. et al. Spatial genomics enables multi-modal study of clonal
heterogeneity in tissues. Nature 601, 85–91 (2022).
	12.	Lengyel, E. Ovarian cancer development and metastasis. Am. J. Pathol. 177,
1053–1064 (2010).
	13.	Kurnit, K. C., Fleming, G. F. & Lengyel, E. Updates and new options
in advanced epithelial ovarian cancer treatment. Obstet. Gynecol. 137,
108–121 (2021).
	14.	Sakaue-Sawano, A. et al. Visualizing spatiotemporal dynamics of multicellular
cell-cycle progression. Cell 132, 487–498 (2008).
	15.	Altelaar, A. M. & Heck, A. J. Trends in ultrasensitive proteomics. Curr. Opin.
Chem. Biol. 16, 206–213 (2012).
	16.	Coscia, F. et al. A streamlined mass spectrometry-based proteomics workflow
for large‐scale FFPE tissue analysis. J. Pathol. 251, 100–112 (2020).
	17.	Meier, F. et al. diaPASEF: parallel accumulation–serial fragmentation
combined with data-independent acquisition. Nat. Methods 17,
1229–1236 (2020).
	18.	Lundberg, E. & Borner, G. H. H. Spatial proteomics: a powerful discovery
tool for cell biology. Nat. Rev. Mol. Cell Biol. 20, 285–302 (2019).
	19.	Mahdessian, D. et al. Spatiotemporal dissection of the cell cycle with
single-cell proteogenomics. Nature 590, 649–654 (2021).
	20.	Uhlen, M. et al. Tissue-based map of the human proteome. Science 347,
1260419–1260419 (2015).
	21.	Venturini, V. et al. The nucleus measures shape changes for cellular
proprioception to control dynamic cell behavior. Science 370, eaba2644 (2020).
	22.	Arias-Garcia, M., Rickman, R., Sero, J., Yuan, Y. & Bakal, C. The cell–cell
adhesion protein JAM3 determines nuclear deformability by regulating
microtubule organization. Preprint at https://www.biorxiv.org/content/
10.1101/689737v2.full (2020).
	23.	Kokkat, T. J., Patel, M. S., McGarvey, D., Livolsi, V. A. & Baloch, Z. W.
Archived formalin-fixed paraffin-embedded (FFPE) blocks: a valuable
underexploited resource for extraction of DNA, RNA, and protein. Biopreserv.
Biobank 11, 101–106 (2013).
	24.	Niazi, M. K. K., Parwani, A. V. & Gurcan, M. N. Digital pathology and
artificial intelligence. Lancet Oncol. 20, e253–e261 (2019).
	25.	Zhu, S., Schuerch, C. & Hunt, J. Review and updates of
immunohistochemistry in selected salivary gland and head and neck tumors.
Arch. Pathol. Lab. Med. 139, 55–66 (2015).
	26.	Kim, L. C., Song, L. & Haura, E. B. Src kinases as therapeutic targets for
cancer. Nat. Rev. Clin. Oncol. 6, 587–595 (2009).
	27.	Shain, A. H. et al. The genetic evolution of melanoma from precursor lesions.
N. Engl. J. Med. 373, 1926–1936 (2015).
	28.	Pollock, P. M. et al. High frequency of BRAF mutations in nevi. Nat. Genet.
33, 19–20 (2003).
	29.	Raamsdonk, C. D. V. et al. Frequent somatic mutations of GNAQ in uveal
melanoma and blue naevi. Nature 457, 599–602 (2009).
	30.	Wang, Z. et al. CD146, from a melanoma cell adhesion molecule to a
signaling receptor. Signal Transduct. Target Ther. 5, 148 (2020).
	31.	Kumar, P. R., Moore, J. A., Bowles, K. M., Rushworth, S. A. & Moncrieff, M.
D. Mitochondrial oxidative phosphorylation in cutaneous melanoma. Br. J.
Cancer 124, 115–123 (2021).
	32.	Eddy, K. & Chen, S. Overcoming immune evasion in melanoma. Int. J. Mol.
Sci. 21, 8984 (2020).
	33.	Winkler, J., Abisoye-Ogunniyan, A., Metcalf, K. J. & Werb, Z. Concepts of
extracellular matrix remodelling in tumour progression and metastasis. Nat.
Commun. 11, 5120 (2020).
	34.	Zhang, Y., Qian, J., Gu, C. & Yang, Y. Alternative splicing and cancer: a
systematic review. Signal Transduct. Target Ther. 6, 78 (2021).
	35.	Frankiw, L., Baltimore, D. & Li, G. Alternative mRNA splicing in cancer
immunotherapy. Nat. Rev. Immunol. 19, 675–687 (2019).
	36.	Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package
for comparing biological themes among gene clusters. OMICS 16,
284–287 (2012).
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing, adaptation,
distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The images or other
third party material in this article are included in the article’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in
the article’s Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder. To view a copy of this license, visit http://creativecommons.
org/licenses/by/4.0/.
© The Author(s) 2022
Methods
Patient samples and ethics. We collected archival FFPE tissue samples of salivary
gland acinic cell carcinoma and melanoma from the Department of Pathology,
Zealand University Hospital, in Roskilde, Denmark. Melanoma tissue was from
a 51-year-old male and located at the left upper chest. TNM stage at diagnosis
was T3aN1M0. The histological subtype was superficial spreading melanoma;
the Clark level was 4; and the Breslow thickness was 2.27 mm. Tumor immune
infiltration was categorized as non-brisk. The FFPE sample was 17 years old. The
patient experienced recurrence at different locations 17 months after diagnosis
and died after 71 months. The acinic cell carcinoma was removed from the
right parotid gland of a 29-year-old male. There was no sign of mitosis, necrosis,
de-differentiation or perineural or intravascular growth. The tumor cells were
positive for EpCAM, CK7, DOG1 and SOX10. Mammaglobin was negative.
The sample was 4 years old, and the patient is currently disease-free. The study
was carried out in accordance with institutional guidelines under approval by
the local Medical Ethics Review Committee (SJ-742) and the Data Protection
Agency (REG-066-2019) and in agreement with Danish law (Medical Research
Involving Human Subjects Act). The fallopian tube tissue shown in Fig. 2 is from a
64-year-old female and was macroscopically and histologically normal appearing.
All patients consented before surgery. Patient-derived tissues were obtained
fresh or paraffin-embedded according to an approved institutional review board
protocol (13372B) at the University of Chicago hospital. In accordance with the
Medical Ethics Review Committee approval, all FFPE human patient tissue samples
were exempted from consent, as these studies used existing archived pathological
specimens. Human tissue specimens were assessed by a board-certified pathologist.
Cell lines. The human osteosarcoma cell line U2OS was grown in DMEM (high
glucose, GlutaMAX) containing 10% FBS and penicillin–streptomycin (Thermo
Fisher Scientific).
The U2OS FUCCI cells were kindly provided by Atsushi Miyawaki14. These
cells are endogenously tagged with two fluorescent proteins fused to the cell
cycle regulators CDT1 (mKO2-hCdt1+) and geminin (mAG-hGem+). CDT1
accumulates during the G1 phase, whereas geminin accumulates in the S and G2
phases, allowing cell cycle monitoring. The cells were cultivated at 37 °C in a 5.0%
CO2 humidified environment in McCoy’s 5A (modified) medium GlutaMAX
supplement (Thermo Fisher Scientific, 36600021) supplemented with 10% FBS
(VWR) without antibiotics.
U2OS cells stably expressing a membrane-targeted form of eGFP were
generated by transfection with plasmid Lck-GFP (Addgene, 61099 (ref. 37)) and
culturing in selection medium (DMEM medium containing 10% FBS,
penicillin–streptomycin and 400 μg ml−1 of Geneticin) under conditions of limited dilution to
yield single colonies. A clonal cell line with homogenous and moderate expression
levels of Lck-eGFP at the plasma membrane was established from a single colony.
All cell lines were tested for mycoplasma (MycoAlert, Lonza) and authenticated
by STR profiling (IdentiCell).
IHC staining on membrane slides. Membrane PEN slides 1.0 (Zeiss, 415190-9041-000)
were treated with UV light for 1 hour and coated with APES
(3-aminopropyltriethoxysilane) using VECTABOND reagent (Vector Labs,
SP-1800-7) according to the manufacturer’s protocol. FFPE tissue sections were
cut (2.5 µm), air dried at 37 °C overnight and heated at 60 °C for 20 minutes to
facilitate better tissue adhesion. Next, sections were deparaffinized, rehydrated
and loaded wet on the fully automated instrument Omnis (Dako). Antigen
retrieval was conducted using Target Retrieval Solution pH 9 (Dako, S2367)
diluted 1:10 and heated for 60 minutes at 90 °C. Single stain for EpCAM (Nordic
BioSite, clone BS14, BSH-7402-1, dilution 1:400) and sequential double stain for
SOX10/CD146 (SOX10, Nordic BioSite, clone BS7, BSH-7959-1, dilution 1:200;
CD146, Cell Marque, clone EP54, AC-0052, dilution 1:400) were performed,
and slides were incubated for 30 minutes (32 °C). After washing and blocking of
endogenous peroxidase activity, the reactions were detected and visualized using
EnVision FLEX, High pH kit (Dako, GV800 and GV809/GV821) according to the
manufacturer’s instructions. In the double stain, EnVision DAB (Dako, GV825)
and EnVision Magenta (Dako, GV900) substrate chromogen systems were used
for visualization of CD146 and SOX10, respectively. Finally, slides were rinsed in
water, counterstained with Mayerʼs hematoxylin and air dried without mounting.
IHC staining for validation of DVP studies. FFPE tissue sections were cut
(2.5 µm), placed on coated slides (Agilent/Dako, K8020) and air dried vertically
before heating at 60 °C for 20 minutes to facilitate tissue adhesion. Next, slides were
loaded on the fully automated instrument Omnis. Sections were dewaxed, and
antigen retrieval was conducted using Target Retrieval Solution High pH (Agilent/
Dako, GV804, diluted 1:50) at 97 °C for 24 minutes. Subsequently, the sections
were incubated for 30 minutes at 32 °C with primary antibodies that had been assessed and
approved by a board-certified consultant pathologist: proto-oncogene tyrosine
protein kinase SRC/c-Src (Cell Signaling Technology, clone 36D10, 2109, dilution
1:3,200), fatty acid synthase/FASN (Cell Signaling Technology, clone C20G5, 3180,
dilution 1:100), calponin-1/CNN1 (Cell Marque, clone EP63, AC-0060, dilution
1:300) and cytokeratin 5/CK5 (Leica Biosystems, clone XM26, NCL-L-CK5,
dilution 1:200). After washing and blocking of endogenous
peroxidase activity, the reactions were detected and visualized using EnVision
FLEX, High pH kit (Agilent/Dako, GV800 and GV809/GV821) according to the
manufacturer’s instructions. Finally, slides were rinsed in water, counterstained
with Mayerʼs hematoxylin and cover-slipped.
Immunofluorescence staining. Cells were first incubated with 5-ethynyl-2′-deoxyuridine
(EdU) for 20 minutes and then fixed for 5 minutes at room
temperature with 4% paraformaldehyde (PFA) and washed three times with PBS.
Cells were then permeabilized with PBS/0.2% Triton X-100 for 2 minutes on ice
and washed three times with PBS. Cells were then stained with an EdU labeling kit
(Life Technologies) and counterstained with Hoechst 33342 for 10 minutes. Slides
were mounted with GB mount (GBI Labs, E01-18).
For validation experiments (Extended Data Fig. 3), 96-well glass-bottom
plates (Greiner SensoPlate Plus, Greiner Bio-One) were coated with 12.5 µg ml−1 of
human fibronectin (Sigma-Aldrich) for 1 hour at room temperature.
Immunocytochemistry was carried out following an established protocol38.
Then, 8,000 U2OS cells were seeded in each well and incubated in a 37 °C and
5% CO2 environment for 24 hours. Cells were washed with PBS, fixed with 40 µl
of 4% ice-cold PFA and permeabilized with 40 µl of 0.1% Triton X-100 in PBS for
3×5 minutes. Rabbit polyclonal HPA antibodies targeting the proteins of interest
were diluted in blocking buffer (PBS + 4% FBS) at 2–4 µg ml−1 along with primary
marker antibodies (see below) and incubated overnight at 4 °C. Cells were washed
with PBS for 4×10 minutes and incubated with secondary antibodies (goat
anti-rabbit Alexa Fluor 488 (A11034, Thermo Fisher Scientific), goat anti-mouse
Alexa Fluor 555 (A21424, Thermo Fisher Scientific) and goat anti-chicken Alexa
Fluor 647 (A21449, Thermo Fisher Scientific)) in blocking buffer at 1.25 µg ml−1 for
90 minutes at room temperature. Cells were counterstained in 0.05 µg ml−1 of DAPI
for 15 minutes, washed with PBS for 4×10 minutes and mounted in PBS.
Primary antibodies used were as follows:
For C7orf50 cell cycle validation:
mouse anti-ANLN at 1.25 µg ml−1 (amab90662, Atlas Antibodies)
mouse anti-CCNB1 at 1 µg ml−1 (610220, BD Biosciences)
rabbit anti-C7orf50 at 1 µg ml−1 (HPA052281, Atlas Antibodies)
For human fallopian tube tissue, FFPE tissue sections (2.5 µm) were mounted
and pre-processed as described above. Thereafter, tissue was dewaxed by washing
2×2 minutes in 100% xylene, followed by a series of 100%, 95% and 70% ethanol
for 1 minute, respectively, and 3×1 minute in ddH2O. Antigen retrieval was
performed in a water bath employing EDTA retrieval buffer (1 mM EDTA, 0.05%
Tween 20, pH 8.0) at 95 °C for 1 hour. Subsequent to a cooling phase of 1 hour at
room temperature, blocking was conducted with 10% goat serum in TBST for
1 hour at room temperature. Primary antibodies targeting FOXJ1 (mouse, dilution
1:200, 14-9965-80, Invitrogen) and EpCAM (rabbit, dilution 1:200, 14452, Cell
Signaling Technology) were diluted in 10% goat serum and incubated overnight
at 4 °C in a humidified chamber. Tissue specimens were washed 5× in TBST,
and secondary antibodies for the visualization of FOXJ1 (Alexa Fluor 647 goat
anti-mouse, dilution 1:200, A21235, Invitrogen) and EpCAM (Alexa Fluor 555
goat anti-rabbit, dilution 1:200, A21428, Invitrogen) as well as SYTO 10 for nuclear
visualization (10624243, Invitrogen) were applied for 1 hour at room temperature
in darkness. Samples were washed 5× in TBST, followed by 2× in TBS and
cover-slipped for high-content imaging.
High-resolution microscopy. Images of immunofluorescence-labeled cell
cultures were acquired using an AxioImager Z.2 microscope (Zeiss), equipped
with wide-field optics, a ×20, 0.8 NA dry objective and a quadruple-band filter
set for Hoechst, FITC, Cy3 and Cy5 fluorescent dyes. Wide-field acquisition
was performed using the Colibri 7 LED light source and an AxioCam 702 mono
camera with 5.86 μm per pixel. Z-stacks with 19 z-slices were acquired at 3-μm
increments to capture the optimal focus plane. Images were obtained automatically
with Zeiss ZEN 2.6 (blue edition) at non-saturating conditions (12-bit dynamic
range).
IHC images from salivary gland and melanoma tissue were obtained using
the automated slide scanner Zeiss Axio Scan.Z1 for bright-field microscopy.
Bright-field acquisition was obtained using the VIS LED light source and a CCD
Hitachi HV-F202CLS camera. PEN slides were scanned with a ×20, 0.8 NA
dry objective yielding a resolution of 0.22 μm per pixel. Z-stacks with eight
z-slices were acquired at 2-μm increments to capture the optimal focus plane.
Color images were obtained automatically with Zeiss ZEN 2.6 (blue edition) at
non-saturating conditions (12-bit dynamic range).
Wide-field fluorescence microscopy for validation of cell-cycle-dependent C7orf50
expression. Cells were imaged on a Leica Dmi8 wide-field microscope equipped
with a 0.8 NA, ×40 air objective and a Hamamatsu Flash 4.0 V3 camera using
LAS X software. The segmentation of each cell was performed using CellProfiler
software8 with DAPI for nuclei segmentation. The mean intensity of the target
protein and the cell cycle marker protein was measured in the nucleus. The
cells were grouped into the G1 and G2 phases of the cell cycle by using the
0.2 and 0.8 quantile of ANLN or CCNB1 intensity levels in the nucleus, and
cell-cycle-dependent expression of C7orf50 was validated by comparing differences
in expression levels between G1 and G2 cells.
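The quantile-based grouping described above can be sketched as follows. The sketch assumes that cells at or below the 0.2 quantile of the marker intensity are assigned to G1 and cells at or above the 0.8 quantile to G2, since the markers accumulate towards G2; this direction, the interpolation rule for the quantile, the function names and all intensity values are assumptions made for illustration.

```python
from statistics import mean


def quantile(values, q):
    """Linear-interpolation quantile (q in [0, 1]) of a non-empty sequence."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def group_by_marker(marker, low_q=0.2, high_q=0.8):
    """Label each cell 'G1', 'G2' or None from its nuclear marker intensity."""
    low_cut, high_cut = quantile(marker, low_q), quantile(marker, high_q)
    labels = []
    for intensity in marker:
        if intensity <= low_cut:
            labels.append("G1")
        elif intensity >= high_cut:
            labels.append("G2")
        else:
            labels.append(None)  # intermediate cells are left unassigned
    return labels


# Hypothetical mean nuclear intensities per cell (illustrative only).
marker = [10, 12, 15, 20, 30, 45, 60, 80, 95, 100]           # cell cycle marker
target = [5.0, 5.4, 6.0, 6.1, 6.5, 7.0, 7.2, 8.0, 9.1, 9.5]  # target protein

labels = group_by_marker(marker)
g1_mean = mean(t for t, lab in zip(target, labels) if lab == "G1")
g2_mean = mean(t for t, lab in zip(target, labels) if lab == "G2")
print(labels)
print(f"target mean in G1: {g1_mean:.2f}, in G2: {g2_mean:.2f}")
```

Comparing the target intensity between the two extreme groups, rather than across all cells, avoids cells with ambiguous marker levels.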
LMD. To excise cells or nuclei, we used the Leica LMD7 system, which we adapted
for automated single-cell excision. High cutting precision was achieved using
an HC PL FLUOTAR L ×63/0.70 (tissue) or ×40/0.60 (cell cultures) CORR XT
objective. We used the Leica Laser Microdissection V 8.2.3.7603 software (adapted
for this project) for fully automated excision and collection of contours. For FFPE
tissue proteome analysis, we collected 50–100 cells per sample (estimated as total area collected
× slide thickness / average mammalian cell volume of 2,000 µm³; BNID 100434), in
agreement with estimations in spatial transcriptomics analysis39.
Leica LMD7 cutting accuracy (Leica R&D, patent EP1276586)
For ×150 objective: 10/150 = 0.07 μm
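The cell number estimate used for tissue samples (total area collected × slide thickness / average cell volume) is simple arithmetic and can be written out as a short sketch. The function names are hypothetical; the 2,000 µm³ cell volume and the 2.5 µm section thickness are the values stated in the Methods, and the example areas are illustrative.

```python
def estimated_cell_count(total_area_um2, section_thickness_um, cell_volume_um3=2000.0):
    """Cell equivalents contained in the excised tissue volume."""
    return total_area_um2 * section_thickness_um / cell_volume_um3


def area_needed_um2(n_cells, section_thickness_um, cell_volume_um3=2000.0):
    """Total contour area to collect for a target number of cell equivalents."""
    return n_cells * cell_volume_um3 / section_thickness_um


# 80,000 µm² collected from a 2.5 µm section
print(estimated_cell_count(80_000, 2.5))  # 100.0

# Area required for 50 cell equivalents from a 2.5 µm section
print(area_needed_um2(50, 2.5))           # 40000.0
```

Because the estimate is volume based, a thinner section requires a proportionally larger collected area for the same number of cell equivalents.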
Segmentation methods and accuracy evaluation. nucleAIzer3
models were
integrated into BIAS and customized for these experiments by retraining and
refining the nucleus and cytoplasm segmentation models. First, style transfer5
learning was performed as follows. Given a new experimental scenario such as
our melanoma or salivary gland tissue sections stained immunohistochemically,
the acquisition of which produces such an image type that no annotated training
data exist for, preventing efficient segmentation with even powerful DL methods.
With an initial segmentation or manual contouring by experts (referred to as
annotation), a small mask dataset is acquired (masks represent, for example,
nuclei), which is used to generate new (synthetic) mask images such that the spatial
distribution, density and morphological properties of the generated objects (for
example, nuclei) are similar to those measured on the annotated images. The initial
masks and their corresponding microscopy images are used to train an image style
transfer model that learns how to generate the texture of the microscopy images
on the masks marking objects, using generative adversarial networks (GANs)40:
foreground to mimic, for example, nuclei, and background for surrounding
structures such as tissue. In parallel, artificial masks of either nucleus or cytoplasm
objects were created and input to the image style transfer network, which
generated realistic-looking synthetic microscopy images with the visual appearance
of the original experiment. Hence, with these artificially created training data
(synthetic microscopy images and their corresponding, also synthetic, masks),
the applied segmentation model, Mask R-CNN, is prepared for the new image
type and can accurately segment the target compartments.
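As a rough illustration of the mask-synthesis step only (not the nucleAIzer implementation), the following Python sketch generates an artificial instance mask whose object sizes are resampled from radii measured on annotated masks. The circular object shape, the rejection-sampling placement and the function name synthetic_mask are assumptions made for this example; the style transfer network that textures such masks is not shown.

```python
import math
import random


def synthetic_mask(height, width, radii, n_objects, seed=0, max_tries=10_000):
    """Return (mask, placed) for a synthetic instance mask of non-overlapping disks.

    radii     : object radii in pixels measured on annotated masks (assumed input);
                new radii are resampled from this list to keep the size distribution.
    n_objects : target object count, for example the mean count per annotated image.
    mask      : list of rows; 0 is background, 1..k are object labels.
    """
    rng = random.Random(seed)
    mask = [[0] * width for _ in range(height)]
    placed = []  # (center_y, center_x, radius) of every accepted disk
    tries = 0
    while len(placed) < n_objects and tries < max_tries:
        tries += 1
        r = rng.choice(radii)
        cy = rng.uniform(r, height - r)
        cx = rng.uniform(r, width - r)
        # Reject a candidate that would overlap an already placed disk.
        if any(math.hypot(cy - py, cx - px) < r + pr for py, px, pr in placed):
            continue
        placed.append((cy, cx, r))
        label = len(placed)
        # Paint the disk with its own label (increasing values per object).
        for y in range(max(0, int(cy - r)), min(height, int(cy + r) + 1)):
            for x in range(max(0, int(cx - r)), min(width, int(cx + r) + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= r * r:
                    mask[y][x] = label
    return mask, placed
```

A call such as synthetic_mask(64, 64, [4, 5, 6], 5) yields a label image in the same spirit as the instance masks used for evaluation (one increasing label value per object). Real nuclei and cytoplasm outlines are not circular, so a realistic generator would also have to model shape.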
We benchmarked the accuracy of the segmentation approach on a fluorescent
Lck-U2OS cell line as well as tissue samples of melanoma, salivary gland and
fallopian tube and compared results to three additional methods: two
DL approaches, unet4nuclei (denoted as M1 in Fig. 2a and S1)6 and Cellpose
(M3)7, and a widely used, conventional adaptive threshold-based and object
splitting-based application (M2)8. We note that M1 is not intended for cytoplasm
segmentation (see details in ref. 6 and below). Segmentation accuracy according to
the F1 metric is displayed as bar plots (Fig. 2b, Extended Data Fig. 1a, Table 1 and
Supplementary Table 1), and a color-coded visual representation is
also provided.
unet4nuclei6 is optimized to segment nuclei on cell culture images; Cellpose7
is an approach intended for either nucleus or cytoplasm segmentation on various
microscopy image types; and CellProfiler8 is a conventional threshold-based and
object splitting-based software broadly used in the bioimage analysis community.
unet4nuclei, as its name suggests, is primarily intended for nucleus segmentation
and uses a U-Net-based network after pre-processing of input images and then
post-processes detected objects. Cellpose uses a vector flow representation of
instances, and its neural network (also based on U-Net) predicts and combines
horizontal and vertical flows. unet4nuclei has successfully been applied in
nucleus segmentation of cell cultures, whereas Cellpose is able to generalize
well on various image modalities even outside microscopy and can be used
to segment nuclei and cytoplasm. However, like most segmentation methods,
neither is able to adapt to a new image domain, such as a particular experiment
type (for example, IHC salivary gland tissue), without re-training on newly
created ground truth annotations. In contrast, our segmentation algorithm
(nucleAIzer3) is able to do so via the image style transfer approach described
above. Conventional algorithms cannot adapt either; thus, they need to
be re-parameterized for each experiment. For the evaluation, an expert CellProfiler
user was asked to optimize a pipeline for each sample type to the best possible
segmentation result, and then all images per sample type were segmented with one
pipeline (corresponding to the given sample).
We evaluated our segmentation performance (and comparisons) according to
the F1 score metric calculated at the 0.7-IoU (intersection over union) threshold.
IoU, also known as the Jaccard index, was calculated from the overlapping region of
the predicted (segmented) object with its corresponding ground truth (real) object
(see formulation below). A predicted object was counted as a true positive (TP) if
its IoU with a ground truth object was greater than the threshold t (in our case,
0.7); predicted objects without such a match were counted as false positives (FP)
and unmatched ground truth objects as false negatives (FN), yielding the F1 score at
this threshold (see formulation below). Segmentation evaluation was performed
on 10–20 randomly selected images sampled from visually distinct regions for
each sample type (U2OS cells and melanoma, salivary gland and fallopian tube
tissues) to show robustness, compared to ground truth annotations drawn by
experts using AnnotatorJ41. We included images from all relevant regions of each
sample (for example, duct cells, acini cells, cells without any membrane staining
and lymphocytes in the salivary gland tissue, and similarly for the other samples)
to ensure robustness. Outlines or contours of all visible objects (nucleus
or cytoplasm) were drawn individually and then exported to mask images in the
same format that the segmentation yielded (instance segmentation masks with
increasing gray intensities by objects). The ground truth masks were solely used
in evaluation; the aforementioned image style transfer learning was trained on
automatically fetched masks of the new experiments. Considering the mean F1
scores measured, we conclude that the applied DL-based segmentation method3
available in BIAS produced higher-quality segmentations at both the nucleus and
cytoplasm level than the compared methods (see results in Fig. 2a,b and Extended
Data Fig. 1a).
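$$\text{Jaccard index} = \frac{|x \cap y|}{|x \cup y|} = \frac{|x \cap y|}{|x| + |y| - |x \cap y|}$$

$$\text{precision}(t) = \frac{TP(t)}{TP(t) + FP(t)}$$

$$\text{recall}(t) = \frac{TP(t)}{TP(t) + FN(t)}$$

$$\text{F1 score}(t) = 2 \cdot \frac{\text{precision}(t) \cdot \text{recall}(t)}{\text{precision}(t) + \text{recall}(t)}$$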
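The formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than the evaluation code used in the study: objects are represented as sets of (row, column) pixels, the helper names are hypothetical, and a prediction is matched to the first unmatched ground truth object whose IoU exceeds t (for t ≥ 0.5 at most one such object can exist when objects do not overlap).

```python
def rectangle(r0, c0, h, w):
    """Pixel set of an h x w rectangle with top-left corner (r0, c0)."""
    return {(r, c) for r in range(r0, r0 + h) for c in range(c0, c0 + w)}


def iou(x, y):
    """Jaccard index |x ∩ y| / (|x| + |y| - |x ∩ y|) of two pixel sets."""
    inter = len(x & y)
    union = len(x) + len(y) - inter
    return inter / union if union else 0.0


def f1_at_threshold(predicted, ground_truth, t=0.7):
    """F1 score for lists of pixel sets, counting a match when IoU > t."""
    matched = set()  # indices of ground truth objects already matched
    tp = 0
    for p in predicted:
        for j, g in enumerate(ground_truth):
            if j not in matched and iou(p, g) > t:
                matched.add(j)
                tp += 1
                break
    fp = len(predicted) - tp      # predictions without a match
    fn = len(ground_truth) - tp   # ground truth objects that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with two ground truth squares and two predictions of which one is shifted by a single pixel column (IoU = 90/110) and the other is spurious, TP = FP = FN = 1 and the F1 score at t = 0.7 is 0.5.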
Our evaluation results of nucleus and cell body segmentation on melanoma,
salivary gland and fallopian tube epithelium tissues and U2OS cells are presented
in Table 1.
These results are consistent with our previous study3, which showed superior
performance of nucleAIzer on various microscopy image data modalities
(fluorescent cell culture, hematoxylin and eosin tissue and further experimental
scenarios) compared to multiple segmentation approaches, including, for example,
M2 and ilastik9.
We also note that previous methods, such as CellProfiler or ilastik, can perform
accurate segmentation of cells; moreover, the performance of M2 on tissue nucleus
segmentation is remarkable. On the other hand, robust methods (for example,
DL-based) offer the convenience of not needing to reset most parameters when
working on images from a different sample or type.
Sample preparation for MS. Cell culture (nuclei or whole cells) and tissue samples
were collected by automated LMD into 384-well plates (Eppendorf, 0030129547).
For the collection of different U2OS nuclei classes (Fig. 3 and Extended Data
Figs. 2 and 3), we normalized nuclear size differences (resulting in different total
protein amounts) by the number of collected objects per class. On average, we
collected 267 nuclei per sample. For FFPE tissue samples of salivary gland and
melanoma (2.5-µm-thick sections cut with a microtome), an area of
80,000–160,000 µm² per sample was collected for an estimated number of 100–200 cells
based on the average HeLa cell volume of 2,000 μm³ (BNID 100434).
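The cell number estimate is simple arithmetic: collected area × section thickness / average cell volume. A minimal Python sketch (the function name is ours, chosen for illustration) reproduces the stated range:

```python
def estimate_cell_count(area_um2, thickness_um, cell_volume_um3=2000.0):
    """Estimated number of cells in a dissected tissue area.

    area_um2        : total collected area in µm²
    thickness_um    : section thickness in µm
    cell_volume_um3 : average cell volume in µm³ (2,000 µm³, BNID 100434)
    """
    return area_um2 * thickness_um / cell_volume_um3


# 2.5-µm-thick sections, 80,000 to 160,000 µm² collected per sample
low = estimate_cell_count(80_000, 2.5)    # 100.0
high = estimate_cell_count(160_000, 2.5)  # 200.0
```

Because sectioned cells are only partially contained in a 2.5-µm slice, the result is best read as a number of cell equivalents rather than a count of intact cells.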
Next, 20 µl of ammonium bicarbonate (ABC) was added to each sample well,
and the plate was closed with sealing tape (Corning, CLS6569-100EA). After
vortexing for 10 seconds, plates were centrifuged for 10 minutes at 2,000g and
heated at 95 °C for 30 minutes (cell culture) or 60 minutes (tissue) in a thermal
cycler (Bio-Rad S1000 with 384-well reaction module) at a constant lid temperature
of 110 °C. Then, 5 µl of 5× digestion buffer (60% acetonitrile in 100 mM ABC) was
added, and samples were heated at 75 °C for another 30 minutes. Samples were
briefly cooled down, 1 µl of LysC was added (pre-diluted in ultra-pure water to
4 ng µl⁻¹) and samples were digested for 4 hours at 37 °C in the thermal cycler.
Subsequently, 1.5 µl of trypsin was added (pre-diluted in ultra-pure water to
4 ng µl⁻¹) and incubated
overnight at 37 °C in the thermal cycler. The next day, digestion was stopped by
adding trifluoroacetic acid (TFA, final concentration 1% v/v), and samples were
vacuum dried (approximately 1.5 hours at 60 °C). Then, 4 µl of MS loading buffer
(3% acetonitrile in 0.2% TFA) was added, and the plate was vortexed for 10 seconds
and centrifuged for 5 minutes at 2,000g. Samples were stored at −20 °C until liquid
chromatography–mass spectrometry (LC–MS) analysis.
High-pH reversed-phase fractionation. We used high-pH reversed-phase
fractionation to generate a deep U2OS cell precursor library for data-independent
MS analysis (below). Peptides were fractionated at pH 10 with the
spider-fractionator42. Next, 30 μg of purified peptides was separated on a 30-cm C18
column in 100 minutes and concatenated into 12 fractions with 90-second exit
valve switches. Peptide fractions were vacuum dried and reconstituted in MS
loading buffer for LC–MS analysis.
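To make the concatenation scheme concrete, the sketch below assigns each elution time to a fraction, assuming that every 90-second valve switch advances to the next of the 12 fractions in a round-robin manner and that collection starts at t = 0. The round-robin order and the function name are assumptions for illustration; the text specifies only the gradient length, the number of fractions and the switch interval.

```python
from collections import Counter


def fraction_for_time(t_seconds, n_fractions=12, switch_seconds=90):
    """1-based index of the fraction collecting the eluate at time t_seconds."""
    return int(t_seconds // switch_seconds) % n_fractions + 1


# Number of 90-second collection windows per fraction over a 100-minute run.
windows = Counter(fraction_for_time(t) for t in range(0, 100 * 60, 90))
```

Under these assumptions the 100-minute separation comprises 67 collection windows, so each fraction pools five or six non-adjacent elution windows, which is what spreads early- and late-eluting peptides across all fractions.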
LC–MS analysis. LC–MS analysis was performed with an EASY-nLC-1200
system (Thermo Fisher Scientific) connected to a modified trapped ion
mobility spectrometry quadrupole time-of-flight mass spectrometer with
about five-fold-higher ion current (timsTOF Pro, Bruker Daltonik) with a
nano-electrospray ion source (CaptiveSpray, Bruker Daltonik). The autosampler
was configured for sample pick-up from 384-well plates.
Peptides were loaded on a 50-cm in-house-packed HPLC column (75-µm inner
diameter packed with 1.9-µm ReproSil-Pur C18-AQ silica beads, Dr. Maisch).
Peptides were separated using a linear gradient from 5% to 30% buffer B (0.1%
formic acid and 80% ACN in LC–MS-grade water) in 55 minutes, followed
by an increase to 60% for 5 minutes and a 10-minute wash in 95% buffer B at
300 nl min⁻¹. Buffer A consisted of 0.1% formic acid in LC–MS-grade water. The
total gradient length was 70 minutes. We used an in-house-made column oven to
keep the column temperature constant at 60 °C.
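The gradient can be written as a piecewise function of time. The sketch below is only an illustration: it assumes that the increase to 60% buffer B is linear over 5 minutes and that the wash is a step to 95% held for 10 minutes, neither of which is stated explicitly, and the function name is hypothetical.

```python
def percent_b(t_min):
    """Percentage of buffer B at t_min minutes into the 70-minute gradient."""
    if not 0 <= t_min <= 70:
        raise ValueError("time outside the 70-minute gradient")
    if t_min < 55:
        # linear gradient from 5% to 30% B in 55 minutes
        return 5 + (30 - 5) * t_min / 55
    if t_min < 60:
        # assumed linear increase from 30% to 60% B over 5 minutes
        return 30 + (60 - 30) * (t_min - 55) / 5
    # assumed step to the 95% B wash, held for 10 minutes
    return 95.0
```

The three segments (55 + 5 + 10 minutes) add up to the stated total gradient length of 70 minutes.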
Mass spectrometric analysis was performed as described in Brunner et al.,
either in data-dependent (ddaPASEF) (Fig. 4) or data-independent (diaPASEF)
mode (Figs. 2, 3 and 5). For ddaPASEF, one MS1 survey TIMS-MS and ten PASEF
MS/MS scans were acquired per acquisition cycle. Ion accumulation and ramp
time in the dual TIMS analyzer were set to 100 ms each, and we analyzed the ion
mobility range from 1/K0 = 1.6 Vs cm⁻² to 0.6 Vs cm⁻². Precursor ions for MS/MS
analysis were isolated with a 2-Th window for m/z < 700 and 3-Th for m/z > 700
in a total m/z range of 100–1,700 by synchronizing quadrupole switching events
with the precursor elution profile from the TIMS device. The collision energy
was lowered linearly as a function of increasing mobility starting from 59 eV at
1/K0 = 1.6 Vs cm⁻² to 20 eV at 1/K0 = 0.6 Vs cm⁻². Singly charged precursor ions
were excluded with a polygon filter (otof control, Bruker Daltonik). Precursors for
MS/MS were picked at an intensity threshold of 1,000 arbitrary units (a.u.) and
re-sequenced until reaching a ‘target value’ of 20,000 a.u., taking into account a
dynamic exclusion of 40-second elution. For data-independent analysis, we made
use of the correlation of ion mobility with m/z and synchronized the elution of
precursors from each ion mobility scan with the quadrupole isolation window. The
collision energy was ramped linearly as a function of the ion mobility from 59 eV
at 1/K0 = 1.6 Vs cm⁻² to 20 eV at 1/K0 = 0.6 Vs cm⁻². We used the ddaPASEF method
for library generation.
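The collision energy ramp and the isolation window rule can be expressed as two small functions. This is a sketch of the stated settings, not instrument control code; the function names are hypothetical, and assigning m/z = 700 exactly to the 3-Th window is an assumption because the text specifies only m/z < 700 and m/z > 700.

```python
def collision_energy_ev(inv_k0, low=(0.6, 20.0), high=(1.6, 59.0)):
    """Collision energy (eV) interpolated linearly in 1/K0 (Vs cm^-2).

    low and high are (1/K0, energy) anchor points: 20 eV at 0.6 and 59 eV at 1.6.
    """
    (x0, e0), (x1, e1) = low, high
    return e0 + (e1 - e0) * (inv_k0 - x0) / (x1 - x0)


def isolation_width_th(mz):
    """Quadrupole isolation window width in Th for a precursor m/z."""
    # m/z exactly 700 is not covered by the text; treated as > 700 here.
    return 2.0 if mz < 700 else 3.0
```

With these anchor points the ramp has a slope of 39 eV per Vs cm⁻², so a precursor at 1/K0 = 1.1 Vs cm⁻² would be fragmented at about 39.5 eV.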
Data analysis of proteomic raw files. Mass spectrometric raw files acquired in
ddaPASEF mode (Fig. 4) were analyzed with MaxQuant (version 1.6.7.0)43,44.
The UniProt database (2019 release, UP000005640_9606) was searched with
a peptide spectral match and protein-level FDR of 1%. A minimum peptide length
of seven amino acids was required, and N-terminal acetylation and methionine
oxidation were set as variable modifications. Because reduction and alkylation
were omitted, cysteine carbamidomethylation was removed from fixed modifications.
Enzyme specificity was set to trypsin with a maximum of two allowed missed cleavages.
First and main search mass tolerances were set to 70 p.p.m. and 20 p.p.m.,
respectively. Peptide identifications by MS/MS were transferred by matching
four-dimensional isotope patterns between the runs (MBR) with a 0.7-minute
retention time match window and a 0.05 1/K0 ion mobility window. Label-free
quantification was performed with the MaxLFQ algorithm45 and a minimum
ratio count of 1.
For diaPASEF measurements (Figs. 2, 3 and 5), raw files were analyzed with
DIA-NN46 (version 1.8). To generate a project-specific spectral library, a 24-fraction
high-pH reversed-phase fractionated precursor library was created from the same
tissue specimen and acquired in ddaPASEF mode, as described above. Raw files
were analyzed with MSFragger47 under default settings (with the exception that
cysteine carbamidomethylation was removed from fixed modifications) to generate
the library file used in DIA-NN. The library consisted of 90,056 precursors, 79,802
elution groups and 7,765 protein groups.
Bioinformatic analysis. Proteomics data analysis was performed with Perseus48
and within the R environment (https://www.r-project.org/). MaxQuant output
tables were filtered for ‘Reverse’, ‘Only identified by site modification’ and
‘Potential contaminants’ before data analysis. Data were stringently filtered
to keep only proteins with 30% or fewer missing values (those displayed as 0 in
MaxQuant output). Missing values were imputed based on a normal distribution
(width = 0.3; downshift = 1.8) before statistical testing. PCA was performed in
R. For multi-sample (ANOVA) or pairwise proteomic comparisons (two-sided
unpaired t-test), we applied a permutation-based FDR of 5% to correct for
multiple hypothesis testing. An s0 value49 of 0.1 was used for the pairwise
proteomic comparison in Figs. 2h and 4e. Pathway enrichment analysis was
performed in Perseus (Supplementary Tables 2, 3, 5 and 9; Fisher’s exact test with
Benjamini–Hochberg FDR of 0.05) or with ClusterProfiler36 (Supplementary Tables 7
and 10), the ReactomePA package50 and the WebGestalt gene set analysis toolkit
(WebGestaltR)51, each with an FDR filter of 0.05. Minimum category size
was set to 20 and maximum size to 500.
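The missing-value filter and the imputation step can be sketched as follows. The sketch assumes the convention commonly used in Perseus, in which missing values of each sample are drawn from a normal distribution whose mean is shifted down by downshift × s.d. and whose s.d. is width × s.d. of the observed log-transformed intensities of that sample; the text gives only the two parameter values, so the per-sample treatment, the use of None for missing values and the function names are assumptions.

```python
import random
import statistics


def filter_by_missing(rows, max_missing_fraction=0.3):
    """Keep proteins whose fraction of missing values (None) is <= the limit.

    rows maps a protein identifier to its list of log-intensities per sample.
    """
    return {
        protein: values
        for protein, values in rows.items()
        if sum(v is None for v in values) / len(values) <= max_missing_fraction
    }


def impute_downshifted_normal(column, width=0.3, downshift=1.8, seed=0):
    """Impute None entries of one sample's log-intensities.

    Draws from N(mean - downshift * sd, (width * sd)^2), where mean and sd
    are computed from the observed values of the same sample.
    """
    rng = random.Random(seed)
    observed = [v for v in column if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [
        rng.gauss(mean - downshift * sd, width * sd) if v is None else v
        for v in column
    ]
```

Zeros in the MaxQuant output would first have to be converted to missing values (None here) and the remaining intensities log-transformed. Observed values pass through unchanged, and imputed values cluster in the low-intensity tail, which mimics proteins below the detection limit.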
Microscopy and proteomics data integration. To visualize combined microscopy
and MS-based proteomics results, we exported the spatial data files for each
predicted class from the BIAS software. This export generates .xml output
files with the geometry and location of cells within a class. We used Python to
extract this information and aggregate it into a data frame. We then plotted the
centroid (x–y coordinates) of each cell in a scatterplot and overlaid the proteomics
data. To visualize protein functional results in spatial context, we performed a
REACTOME pathway enrichment analysis on the generated proteomics results
and used normalized enrichment scores (z-scores) as a color gradient reflecting
overrepresentation of a given pathway.
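The extraction step can be sketched with the Python standard library. The XML layout used here (a class root element containing cell elements with point children that carry x and y attributes) is purely hypothetical, because the schema of the BIAS export is not described in the text; element and attribute names would have to be adapted to the real files.

```python
import xml.etree.ElementTree as ET


def cell_centroids(xml_text):
    """Return one record per cell with the mean x and y of its contour points.

    Assumes a hypothetical layout:
    <class name="..."><cell id="..."><point x=".." y=".."/>...</cell></class>
    """
    root = ET.fromstring(xml_text)
    records = []
    for cell in root.iter("cell"):
        points = list(cell.iter("point"))
        if not points:
            continue  # skip cells without contour points
        xs = [float(p.get("x")) for p in points]
        ys = [float(p.get("y")) for p in points]
        records.append({
            "class": root.get("name"),
            "cell_id": cell.get("id"),
            "x": sum(xs) / len(xs),  # vertex mean as a simple centroid
            "y": sum(ys) / len(ys),
        })
    return records
```

The resulting list of records can be turned into a data frame and joined with per-class proteomics values (for example, pathway enrichment scores) before plotting x against y. The vertex mean approximates the centroid well for evenly sampled contours; an area-weighted centroid would be more exact.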
Reporting Summary. Further information on research design is available in the
Nature Research Reporting Summary linked to this article.
Data availability
The mass spectrometry proteomics data have been deposited to the
ProteomeXchange Consortium via the PRIDE partner repository52 with the
dataset identifier PXD023904. BIAS raw data, image raw data, a demo dataset and
online material of how to install BIAS and reproduce our work can be accessed
at the European Bioinformatics Institute BioStudies database53
(https://www.ebi.ac.uk/biostudies/) with the accession number S-BSST820. We used
the UniProt database (2019 release, UP000005640_9606, https://www.uniprot.org) for
all mass spectrometric raw file searches.
Code availability
A free compiled version of BIAS with limited high-throughput capabilities is
available at the BioStudies Archive (accession number S-BSST820), containing all
features applied in the described workflows. Several major components of our work
are available in open-source repositories (Supplementary Table 11).
References
	37.	Benediktsson, A. M., Schachtele, S. J., Green, S. H. & Dailey, M. E. Ballistic
labeling and dynamic imaging of astrocytes in organotypic hippocampal slice
cultures. J. Neurosci. Methods 141, 41–53 (2005).
	38.	Stadler, C., Skogs, M., Brismar, H., Uhlén, M. & Lundberg, E. A single
fixation protocol for proteome-wide immunofluorescence localization studies.
J. Proteomics 73, 1067–1078 (2010).
	39.	Moncada, R. et al. Integrating microarray-based spatial transcriptomics and
single-cell RNA-seq reveals tissue architecture in pancreatic ductal
adenocarcinomas. Nat. Biotechnol. 38, 333–342 (2020).
	40.	Goodfellow, I. J. et al. Generative adversarial networks. Proc.
International Conference on Neural Information Processing Systems 2672–2680
(2014).
	41.	Hollandi, R., Diosdi, A., Hollandi, G., Moshkov, N. & Horvath, P. AnnotatorJ:
an ImageJ plugin to ease hand annotation of cellular compartments. Mol.
Biol. Cell 31, 2179–2186 (2020).
	42.	Kulak, N. A., Geyer, P. E. & Mann, M. Loss-less nano-fractionator for high
sensitivity, high coverage proteomics. Mol. Cell Proteomics 16, 694–705
(2017).
	43.	Prianichnikov, N. et al. MaxQuant software for ion mobility enhanced
shotgun proteomics. Mol. Cell Proteomics 19, 1058–1069 (2020).
	44.	Cox, J. & Mann, M. MaxQuant enables high peptide identification rates,
individualized p.p.b.-range mass accuracies and proteome-wide protein
quantification. Nat. Biotechnol. 26, 1367–1372 (2008).
	45.	Cox, J. et al. Accurate proteome-wide label-free quantification by delayed
normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol.
Cell Proteomics 13, 2513–2526 (2014).
	46.	Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M.
DIA-NN: neural networks and interference correction enable deep proteome
coverage in high throughput. Nat. Methods 17, 41–44 (2020).
	47.	Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. &
Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide
identification in mass spectrometry-based proteomics. Nat. Methods 14,
513–520 (2017).
	48.	Tyanova, S. et al. The Perseus computational platform for comprehensive
analysis of (prote)omics data. Nat. Methods 13, 731–740 (2016).
	49.	Tusher, V. G., Tibshirani, R. & Chu, G. Significance analysis of microarrays
applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA 98,
5116–5121 (2001).
	50.	Yu, G. & He, Q.-Y. ReactomePA: an R/Bioconductor package for reactome
pathway analysis and visualization. Mol. Biosyst. 12, 477–479 (2015).
	51.	Liao, Y., Wang, J., Jaehnig, E. J., Shi, Z. & Zhang, B. WebGestalt 2019: gene
set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 47,
W199–W205 (2019).
	52.	Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in
2019: improving support for quantification data. Nucleic Acids Res. 47,
D442–D450 (2019).
	53.	Sarkans, U. et al. The BioStudies database—one stop shop for all data
supporting a life sciences study. Nucleic Acids Res. 46, D1266–D1270 (2017).
	54.	Szklarczyk, D. et al. STRING v11: protein–protein association networks with
increased coverage, supporting functional discovery in genome-wide
experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).
Acknowledgements
The authors thank M. Rykær, J. Madsen (NNF CPR Mass Spectrometry Platform,
University of Copenhagen) and L. Drici (NNF CPR Proteomics Program) as well as
J. Mueller (MPIB Munich) for technical assistance. We acknowledge F. Hoffmann,
C. Greb and F. Schlaudraff from Leica for technical support; T. Danka and M. Kovács
for fruitful scientific discussions; and T. Hartig Braunstein, P. Hernandez-Varas and C.
Prats from the Core Facility of Integrated Microscopy for microscopy support. We thank
J. Lukas for scientific support and guidance and J. Percival for the scientific illustrations
(Illustration Ltd.). This work was supported by grants from the Novo Nordisk
Foundation (grant agreements NNF14CC0001 and NNF15CC0001) and the Max
Planck Society for the Advancement of Science and by the Chan Zuckerberg Initiative
for partial funding of the cell cycle work (grant CZF2019-002448) to E. Lundberg,
M.M. and P.H. F.C. acknowledges the European Union’s Horizon 2020 Research and
Innovation Programme under grant agreement 846795 (Marie Skłodowska-Curie grant)
and the German Ministry of Education and Research (BMBF), as part of the National
Research Node ‘Mass Spectrometry in Systems Medicine’ (MSCoreSys), under grant
agreement 161L0222. B.D.A. acknowledges support from the Lundbeck Foundation
(R252-2017-1414) and the Novo Nordisk Foundation (NNF20OC0065720). P.H., R.H.,
F.K., E.M. and A.K. acknowledge support from the LENDULET-BIOMAG Grant (2018-342),
European Regional Development Funds (GINOP-2.2.1-15-2017-00072), H2020
(ERAPERMED-COMPASS, ERAPERMED-SYMMETRY, DiscovAIR, FAIR-CHARM),
OTKA-SNN, TKP2021-EGA09 and ELKH-Excellence grants. E. Lengyel is supported
by NIH R35CA264619 and the Chan Zuckerberg Initiative (CZIF2019-002435). We
acknowledge S. Ito and H. Masai (Tokyo Metropolitan Institute of Medical Science) for
providing the stable U2OS FUCCI cell line. The LCK-GFP plasmid was a gift from S.
Green (Addgene, plasmid 61099).
Author contributions
Conceptualization: A.M., F.C., P.H. and M.M.; Methodology: A.M., F.C., A.D.B., M.B.,
B.D.A. and M.M.; Software: R.H., F.K., A.K. and P.H.; Investigation: A.M., F.C. and
R.H.; Formal analysis: A.M., F.C. and R.H.; Writing – original draft: A.M., F.C., P.H. and
M.M.; Writing – review and editing: all authors; Resources: all authors; Data curation:
L.M.R.G., M.B., S.N., A.M., F.C., R.H., F.K., A.K., A.S., E.M., L.S., M.A.E., E. Lengyel
and P.H.; Visualization: A.M., F.C., A.S. and R.H.; Project administration: A.M. and P.H.;
Supervision: M.M.; Funding acquisition: F.C., P.H., E. Lundberg and M.M.
Funding
Open access funding provided by Max Planck Society.
Competing interests
P.H. is the founder and a shareholder of Single-Cell Technologies Ltd., a biodata analysis
company that owns and develops the BIAS software. The remaining authors declare no
competing interests.
Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41587-022-01302-5.
Supplementary information The online version contains supplementary material
available at https://doi.org/10.1038/s41587-022-01302-5.
Correspondence and requests for materials should be addressed to
Andreas Mund, Peter Horvath or Matthias Mann.
Peer review information Nature Biotechnology thanks the anonymous reviewers for their
contribution to the peer review of this work.
Reprints and permissions information is available at www.nature.com/reprints.
Extended Data Fig. 1 | Benchmarking of segmentation algorithm. a, Cell body and nuclei segmentation of melanoma, salivary gland and fallopian
tube tissue using the Biological Image Analysis Software (BIAS). We benchmarked the accuracy of our segmentation approach using the F1 metric and
compared results to three additional methods, M1–M3: unet4nuclei (M1)6, CellProfiler (M2)8 and Cellpose (M3)7, while OUR refers to nucleAIzer3. Bars show
mean F1-scores with SEM (standard error of the mean). Visual representation of the segmentation results: green areas correspond to true positive,
blue to false positive and red to false negative. Data provided in Table 1 and Supplementary Table 1. b, BIAS allows the processing of multiple 2D and 3D
microscopy image file formats. Examples for image pre-processing, deep learning-based image segmentation, feature extraction and machine learning-based
phenotype classification. c, Left: Contour alignment in the LMD7 software before laser microdissection of fallopian tube epithelial cells. Middle:
Screenshot after laser microdissection. Right: 384-well inspection after laser microdissection in individual fallopian tube epithelial cells. d, Number of
quantified proteins per replicate of FOXJ1 positive and negative epithelial cells. Samples were acquired in data-independent mode and analyzed with the
DIA-NN software. e, Replicate correlations of proteome measurements. Correlation values show Pearson correlations. f, Pathway enrichment analysis for
proteins significantly higher in ciliated cells compared to secretory fallopian tube epithelial cells.
Extended Data Fig. 2 | PCA and loadings of cell culture classes at sub-cellular level and number of significantly changed proteins vs. class abundance.
a, Quantitative proteomic results of whole cell and nuclei replicates, and comparison between whole cells and nuclei. b, Principal component analysis
(PCA) of whole cell (n = 3) and nuclei proteomes (n = 3). Proteins with the strongest contribution to PC1 are highlighted. c, Relative proportions of the six
nuclei classes. d, Number of differentially expressed proteins (two-sided t-test, n = 3 biological replicates) compared to unclassified nuclei (bulk). Proteins
with an FDR less than 0.05 were considered significant. e, Correlation between number of significantly regulated proteins per nuclei class vs relative class
proportion. A linear model was fitted to the data showing an inverse correlation with Pearson r = -0.96 (p-value = 0.01). f, Relative protein levels (z-score)
of known cell cycle markers across the five nuclei classes. All bar graphs represent mean of data (n = 3 biological replicates) and error bars are s.d.
ANOVA p-values are shown.
Extended Data Fig. 3 | DVP discovers uncharacterized proteins with potential clinical relevance. a, Violin plots showing nuclear area in pixels of the
6 nuclei classes identified by ML. b, Nuclear area in pixels of U2OS FUCCI cells in relation to the cell cycle pseudotime14. Color code indicates point
density. c, Nuclear area of three major cell cycle states G1, G1/S and S/G2 determined by fluorescently tagged CDT1 and GMNN intensities and Gaussian
clustering. Box plots show the results of n = 238,675 cells in total (85,551 for G1, 83,121 for G1/S and 70,003 for S/G2). d, Relative protein levels of all
identified ORF proteins in the dataset. C7orf50, C1orf112, C19orf53 and C11orf98 were differentially expressed (ANOVA p-value < 0.05) across the 5
nuclei classes (n = 3 biological replicates). e, Mean intensities of immunofluorescent stained C7orf50 and the cell cycle markers ANLN and CCNB1
in U2OS cells. C7orf50 levels were quantified in nuclei with low and high ANLN and CCNB1 intensities. Box plots show the results of n = 263 cells per
condition (C7orf50-ANLN) and n = 412 per condition (C7orf50-CCNB1). f, Upper panel: Representative immunofluorescence images of C7orf50 and
DNA (DAPI) stained U2OS cells19. Scale bar is 20 µm. Note, C7orf50 is enriched in nucleoli. Lower panel: Immunohistochemistry of a C7orf50 stained
pancreatic adenocarcinoma (https://bit.ly/2X4re05). Image credit: Human Protein Atlas. Scale bar is 40 µm. g, Kaplan-Meier survival analysis of
pancreatic adenocarcinoma (https://bit.ly/3BAxewA) based on relative C7orf50 RNA levels (FPKM, number of Fragments Per Kilobase of exon per Million
reads). RNA-seq data is reported as median FPKM, generated by The Cancer Genome Atlas (https://bit.ly/3iSOG8d). Patients were divided into two
groups based on C7orf50 levels with n=41 low and n=135 high patients. A log-rank test was calculated with p = 0.0001. h, String interactome analysis for
C7orf50. A high confidence score of 0.7 was used with the five closest interactors highlighted by color54. The box plots in c and e define the range of the
data (whiskers), 25th and 75th percentiles (box), and medians (solid line). Outliers are plotted as individual dots outside the whiskers.
Extended Data Fig. 4 | DVP applied to archival tissue of a rare salivary gland carcinoma. a, Immunohistochemical staining of normal salivary gland
stained for the cell adhesion protein EpCAM. Supervised (random forest) ML was trained to identify acinar (green) and duct cells (turquoise). Scale bar = 20 µm.
b, Quantitative proteomic comparison between acinar and duct cells from tissue in a with known cell type specific markers highlighted (https://bit.ly/3iOK8Qf).
c, Relative protein levels of selected pathways that were significantly higher in acinar or duct cells. d, Unsupervised hierarchical clustering of
acinar and duct cell proteomes from two different patients together with acinar cell carcinoma cells. Note that normal acinar cells of two different tissues
clustered together. Duct cells clustered furthest away. Prior to clustering, protein levels from different sample groups (duct cell tissue #1, acinar cell tissue
#1, acinar cell tissue #2, carcinoma tissue #2) were averaged and z-scored. Bar on the left shows differentially expressed pathways from panel b with acini
and duct specific proteins in green and turquoise, respectively.
Extended Data Fig. 5 | DVP applied to archival tissue of primary melanoma. a, Isolation of tumor adjacent SOX10 positive melanocytes from a cutaneous
melanoma tissue. Left: Contour alignment before laser microdissection. Right: Inspection after laser microdissection. b, Number of protein quantifications
per sample type with n = 4 (melanocytes), n = 5 (stroma), n = 5 (melanoma in situ) and n = 13 (melanoma) independent replicates. Bar graphs represent
mean of data and error bars are s.d. Samples were acquired in data-independent mode and analyzed with the DIA-NN software. c, Upper panel: Heatmap
from Fig. 5h shown with identified protein clusters (color bar). Unsupervised hierarchical clustering based on all 1,910 ANOVA significant (FDR < 0.05)
protein groups. Protein levels were z-scored. Lower panel: Pathway enrichment analysis of different row clusters obtained by unsupervised hierarchical
clustering. The ReactomePA package was used for enrichment analysis with an FDR cut-off of 0.05 for all enriched terms. d, Relative levels (z-score) of
proteins related to the KEGG term ‘melanogenesis’. Note, melanocytes show highest protein levels. The box plots define the range of the data (whiskers),
25th and 75th percentiles (box), and medians (solid line). Outliers are plotted as individual dots outside the whiskers. e, Pathway enrichment analysis
of proteins up or down-regulated in vertical versus radial growth melanoma cells. Enrichment results were obtained with the ClusterProfiler R package36
based on an FDR < 0.05.