Measuring mass response signal by comparing single-cell mass distributions
To capture cell mass response to treatment with adequate statistical significance, we utilize the flow-through format of SMRs, enabling mass measurements of single cells23,24,25,26,27. An SMR sensor is composed of a suspended cantilever with an integrated U-shaped microfluidic channel28 (Fig. 1c). As a cell passes through the integrated channel, the cantilever’s mass is transiently altered, inducing a brief change in the resonant frequency proportional to the buoyant mass of the cell, referred to as “mass” throughout this paper (Methods). The fluidic control scheme implemented in the instrument (Supplementary Fig. 2), together with the SMR chip, enables us to consistently measure samples of 5000 cells from a 50 μl volume in 10 min.
For measuring the treatment response from a patient specimen, we first isolate the cancer cells from the sample (Fig. 2a) and incubate the aliquoted cells with drugs or drug combinations (Methods). We then flow the cell populations through the mass sensor to capture their cell mass probability distribution functions, which we will refer to as mass distributions in this paper. As an example, Fig. 2a shows three distinct mass distributions: a reference distribution of vehicle-treated cells (gray) and two distributions of drug-treated cells (blue and purple). We compare the distributions of treated cells to that of the reference cells using Earth Mover’s Distance (EMD), a measure of the dissimilarity between two distributions, and quantify the difference as a “mass response” signal (Fig. 2b, Methods). We measure a larger mass response for diverging mass distributions (higher EMD value, blue versus gray) and a smaller mass response when mass distributions are similar (lower EMD value, purple versus gray). To achieve a malignancy-agnostic metric that can be used across various tumor specimen types, we normalize each single-cell mass measurement by the mean mass of the vehicle-treated cells in the sample, resulting in a unitless mass response signal that reports mass change (in percent) relative to control (Fig. 2b).
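As an illustration of the comparison described above, the following minimal Python sketch computes a percent mass response as the one-dimensional EMD between two mass samples after normalizing by the mean of the vehicle-treated reference. It is not the implementation used in this work (see Methods for that); the function names `emd_1d` and `mass_response` and the example masses are hypothetical.

```python
def emd_1d(a, b):
    """1-D Earth Mover's Distance: area between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    grid = sorted(a + b)
    i = j = 0
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Count how many values of each sample are <= the left edge.
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        # CDF gap on this interval times the interval width.
        total += abs(i / len(a) - j / len(b)) * (right - left)
    return total


def mass_response(treated, reference):
    """Percent mass response of treated cells relative to the reference cells."""
    ref_mean = sum(reference) / len(reference)
    treated_norm = [m / ref_mean for m in treated]
    reference_norm = [m / ref_mean for m in reference]
    return 100.0 * emd_1d(treated_norm, reference_norm)


# Hypothetical buoyant masses (arbitrary units), not measured data.
reference = [10.0, 20.0, 30.0]
treated = [m * 1.1 for m in reference]  # every cell 10% heavier
print(mass_response(treated, reference))    # close to 10 (percent)
print(mass_response(reference, reference))  # identical samples give 0.0
```

Because the EMD is computed on masses divided by the reference mean, a uniform 10% change in every cell yields a response of about 10%, regardless of the absolute mass scale of the specimen.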
As with other population-based statistical tests, the accuracy of the mass response measurement relies on how well the sampled cell populations represent the true distribution in the tumor sample. To understand the impact of mass measurement parameters such as sensor noise and the number of cells measured, we ran simulations using the data shown in Fig. 2a (Supplementary Fig. 3a, b). Measuring at least 2500 cells to calculate a mass distribution limits the baseline noise in the mass response signal between identical samples to less than 1.5% and the standard deviation of the mass response to less than 1%, whereas when the measurement is based on a sample of 500 cells, these parameters are 3% and 1%, respectively.
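The dependence of baseline noise on cell number can be explored with a simple resampling experiment. The sketch below is an assumption-laden stand-in for the simulations referenced above: it draws synthetic log-normal masses rather than the measured data of Fig. 2a, ignores sensor noise, and uses hypothetical function names. It repeatedly compares two samples from the same population and reports the mean and standard deviation of the resulting mass response.

```python
import random
import statistics


def emd_equal_size(a, b):
    """1-D EMD for two equal-size samples: mean gap between sorted values."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def baseline_noise(n_cells, n_repeats=30, seed=0):
    """Mean and SD of the percent mass response between identical populations."""
    rng = random.Random(seed)
    responses = []
    for _ in range(n_repeats):
        # Assumption: synthetic log-normal masses stand in for measured data.
        ref = [rng.lognormvariate(0.0, 0.3) for _ in range(n_cells)]
        rep = [rng.lognormvariate(0.0, 0.3) for _ in range(n_cells)]
        # Dividing the EMD by the reference mean equals normalizing each mass.
        responses.append(100.0 * emd_equal_size(rep, ref) / statistics.fmean(ref))
    return statistics.fmean(responses), statistics.stdev(responses)


for n in (500, 2500):
    mean_noise, sd_noise = baseline_noise(n)
    print(f"{n} cells: baseline {mean_noise:.2f}%, SD {sd_noise:.2f}%")
```

The absolute values depend on the assumed population width and will not match the reported figures, but the qualitative trend (lower baseline noise with more cells) should hold.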
Due to the inherent biological heterogeneity of patient specimens, the isolated single cells within a sample may exhibit different levels of treatment response. Therefore, signal linearity is a critical attribute for accurately translating the treatment-induced mass change to a linear mass response that can be compared across samples, drugs, and treatment doses. As a demonstration, we simulate varying magnitudes of mass responses by sampling cells at different ratios from the treated and reference distributions shown in Fig. 2a. We show that the mass response signal is a linear function of the ratio of responding cells in the sample (Supplementary Fig. 3c).
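The linearity argument can be illustrated with a small simulation. In this hedged sketch, responders are modeled as cells that lose 20% of their mass (an arbitrary assumption), the populations are synthetic log-normal draws rather than the distributions of Fig. 2a, and `mixed_response` is a hypothetical helper. Because the cumulative distribution of a mixture is the weighted sum of its components, the expected EMD to the reference grows in proportion to the responding fraction.

```python
import random
import statistics


def emd_equal_size(a, b):
    """1-D EMD for two equal-size samples: mean gap between sorted values."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def mixed_response(fraction, n_cells=2000, shift=0.8, seed=1):
    """Percent mass response when `fraction` of the cells respond to treatment."""
    rng = random.Random(seed)

    def draw():
        # Assumption: synthetic log-normal mass for a vehicle-like cell.
        return rng.lognormvariate(0.0, 0.3)

    reference = [draw() for _ in range(n_cells)]
    n_resp = round(fraction * n_cells)
    responders = [shift * draw() for _ in range(n_resp)]   # 20% mass loss
    bystanders = [draw() for _ in range(n_cells - n_resp)]  # unchanged cells
    mixture = responders + bystanders
    return 100.0 * emd_equal_size(mixture, reference) / statistics.fmean(reference)


for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"responding fraction {f:.2f}: mass response {mixed_response(f):.1f}%")
```

Under these assumptions the response should rise from the sampling-noise floor at a fraction of 0 to roughly 20% at a fraction of 1, with intermediate fractions falling near the straight line between them.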
Interpreting mass response as cancer cell response to treatment
A key goal of functional testing is to enable measurements to be performed in short timescales, ideally less than 48 hours, minimizing the impact of possible phenotypic drift and viability change of the primary cancer cells being tested. To ensure accurate and reliable treatment response results, we measure vehicle-treated cells twice, both before and after the treated cell populations, to account for any phenotypic drift over the course of mass measurement, which may take a few hours when testing several conditions. For most drugs presented here, dimethyl sulfoxide (DMSO) is used for dissolution, and therefore, DMSO (0.25%) treatment alone serves as the vehicle control. Mass measurements of cells treated with 0.25% DMSO are indistinguishable from those of untreated cells, suggesting a minimal effect of DMSO alone on cell mass (Supplementary Fig. 1). Figure 2c demonstrates the structure of the measurement approach. First, a population of vehicle-treated cells is measured to be used as a “reference” distribution. Then, cells that were exposed to treatment are measured. Multiple treatment “conditions” can be sequentially measured after the reference cells for testing a drug panel. Finally, a second replicate condition of vehicle-treated cells is measured as a “control”. To quantify treatment-independent changes in the vehicle-treated cells throughout the measurement duration, we calculate the mass response between the vehicle-treated control and reference distributions (CTRL in Fig. 2d). To quantify cell response to treatment, we calculate the mass response between the drug-treated cells and vehicle-treated reference cells (TEST in Fig. 2d).
Comparing the TEST and CTRL signals using bootstrapping29 to confirm a signal magnitude difference larger than a “limit of decision” threshold yields a p-value for interpreting the treatment response outcome (Methods). For example, a distance measured between blue and gray dots in Fig. 2d greater than the limit of decision with a correspondingly low p-value would indicate that cells treated with the tested drug changed their mass relative to the control cells at a significant level. A high p-value, which gives no support for a mass change beyond the limit of decision, would instead indicate no response to treatment (purple dot in Fig. 2d). In this paper, we define the three-sigma limit of decision30 as 3% across all measurements, which corresponds to three times the standard deviation of the distance between 500 cells repeatedly sampled from a cell population (Supplementary Fig. 3a).
To test the robustness of our approach, we simulated cellular phenotypic drift in the form of mass loss as a function of time (Supplementary Fig. 3d). We tested varying rates of mass loss-per-time for cells to identify the limits of the measurement to correctly capture the response of the treated cells. Assuming a linear rate of phenotypic drift as a function of time, we find that for correctly resolving a mass response magnitude of 5% (relative to control), the phenotypic drift of vehicle-treated cells should be less than 10%. We have not observed phenotypic drift rates exceeding this number for any cell line or primary specimens reported here. Nonetheless, our approach enables us to identify significant phenotypic drift by monitoring the distance between the reference and control cells, shown as the CTRL signal (Fig. 2d). If this distance is found to be higher than 10%, we conclude that the test is inconclusive due to high phenotypic drift.
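The decision logic described in this section (CTRL and TEST signals, a 3% limit of decision, and a 10% drift cutoff) can be summarized in a short sketch. The bootstrap scheme below is only one plausible reading and is an assumption, since the exact procedure is given in Methods: each population is resampled with replacement, and the p-value is taken as the fraction of replicates in which TEST minus CTRL does not exceed the limit of decision. The function names, the significance level `alpha`, and the synthetic data are hypothetical, and all populations are assumed to be the same size.

```python
import random
import statistics

LIMIT_OF_DECISION = 3.0  # percent, as defined in the text
MAX_DRIFT = 10.0         # percent, drift cutoff stated in the text


def mass_response(sample, reference):
    """Percent EMD between equal-size samples, normalized by the reference mean."""
    gap = sum(abs(x - y) for x, y in zip(sorted(sample), sorted(reference)))
    return 100.0 * gap / len(reference) / statistics.fmean(reference)


def interpret(reference, treated, control, n_boot=200, alpha=0.05, seed=0):
    """Return (call, p_value); the bootstrap scheme is an illustrative assumption."""
    if mass_response(control, reference) > MAX_DRIFT:
        return "inconclusive", None  # CTRL signal indicates high phenotypic drift
    rng = random.Random(seed)
    n = len(reference)
    not_exceeding = 0
    for _ in range(n_boot):
        ref_b = rng.choices(reference, k=n)  # resample with replacement
        test_b = mass_response(rng.choices(treated, k=n), ref_b)
        ctrl_b = mass_response(rng.choices(control, k=n), ref_b)
        if test_b - ctrl_b <= LIMIT_OF_DECISION:
            not_exceeding += 1
    p_value = (not_exceeding + 1) / (n_boot + 1)
    return ("response" if p_value < alpha else "no response"), p_value


# Synthetic populations of 500 cells each (assumed log-normal masses).
rng = random.Random(42)
reference = [rng.lognormvariate(0.0, 0.3) for _ in range(500)]
control = [rng.lognormvariate(0.0, 0.3) for _ in range(500)]
responder = [0.8 * rng.lognormvariate(0.0, 0.3) for _ in range(500)]
nonresponder = [rng.lognormvariate(0.0, 0.3) for _ in range(500)]
drifted = [0.85 * rng.lognormvariate(0.0, 0.3) for _ in range(500)]

print(interpret(reference, responder, control)[0])
print(interpret(reference, nonresponder, control)[0])
print(interpret(reference, responder, drifted)[0])
```

Checking the CTRL signal before any comparison mirrors the rule that a test is declared inconclusive whenever vehicle-treated cells drift by more than 10% over the measurement.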
Image classification for identifying single cells of interest
Despite the high efficacy of commercially available cell enrichment kits (Methods), processed primary tumor specimens often contain biological debris and cellular aggregates in addition to single cells of interest. Because mass measurements alone cannot distinguish between these different particle types, this additional material can interfere with the ability to detect drug-induced changes in mass distributions. To address this challenge, we implemented brightfield imaging inline with the mass sensor (Fig. 1) with real-time optical particle detection immediately downstream to trigger image capture. Each mass measurement is paired with its corresponding brightfield image and annotated using CNN-based image classification. This image classification occurs in two stages. First, a binary CNN classifier is used to identify which single-cell events to accept and non-single-cell events to reject, such as debris and cellular aggregates. Each accepted event is classified further as either an intact or permeable single cell and each rejected event is characterized as either an aggregate or debris using two additional binary CNN classifiers (Fig. 3a).
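The two-stage annotation scheme can be expressed as a small decision cascade. The sketch below does not include the CNN models themselves; it assumes that each event already carries three classifier probabilities and that a 0.5 threshold is applied at every stage, both of which are illustrative assumptions, as are the function names and example values.

```python
def classify_event(p_single_cell, p_intact, p_aggregate, threshold=0.5):
    """Two-stage label for one imaged event.

    Inputs are hypothetical outputs of three binary classifiers:
    single cell vs. not, intact vs. permeable, aggregate vs. debris.
    """
    if p_single_cell >= threshold:
        # Stage 2 for accepted events: intact or permeable single cell.
        return "intact" if p_intact >= threshold else "permeable"
    # Stage 2 for rejected events: aggregate or debris.
    return "aggregate" if p_aggregate >= threshold else "debris"


def accepted_masses(events):
    """Keep masses of events accepted as single cells (intact or permeable)."""
    keep = ("intact", "permeable")
    return [mass for mass, *probs in events if classify_event(*probs) in keep]


# Each event: (mass, p_single_cell, p_intact, p_aggregate); values are made up.
events = [
    (52.0, 0.95, 0.90, 0.10),   # intact single cell
    (31.0, 0.90, 0.20, 0.05),   # permeable single cell
    (140.0, 0.10, 0.50, 0.85),  # aggregate
    (4.0, 0.05, 0.50, 0.10),    # debris
]
print([classify_event(*e[1:]) for e in events])
print(accepted_masses(events))
```

Only the masses of accepted events would then enter the mass distributions used for computing the mass response.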
We trained the CNN models using manually curated images of each class collected for a range of cell lines and primary tumor specimens to ensure generalizability across various specimen formats (Methods). When applied to manually curated image sets, these models achieve cross-validated precision and recall values exceeding 97% for each image class (Fig. 3a).
In this training set, images with small particulate matter or fibrous material are classified as debris, and images with clearly segmented clusters of cells are classified as aggregates. Intact and permeable cell discrimination is based primarily on the reduced contrast seen in cells that have presumably lost their membrane integrity. This loss of contrast observed by brightfield imaging is consistent with viability data collected in parallel with flow cytometry assessment of DAPI, a DNA-intercalating dye that is more accessible to the nuclei of non-viable cells that have lost a functioning cell membrane (Supplementary Fig. 4).
The utility of image-annotated single-cell mass measurements can be seen when comparing the mass distributions of each particle class for cell lines with different baseline mass characteristics (Fig. 3b). Human lung cancer cells (PC9) and human multiple myeloma cells (MM1S) have significantly different underlying mass distributions. Given this variation, relying on mass measurements alone to identify single cells versus debris or aggregate events based on universal gating would not be feasible across various cell types. In contrast, image annotation and classification offer a broadly applicable means of identifying particles of interest across various samples, as can be seen with the consistent mass trends across particle classes observed in these two cell lines, with cell aggregates having the largest mass followed by intact cells, permeable cells, and debris. This flexibility allows for the identification of single cells for further analysis regardless of the underlying structure of a given specimen’s mass distribution. This improved ability to characterize single-cell mass distributions is a key requirement for robustly identifying treatment responses. To quantify the benefit of linked imaging, we compared the sampling error between random subsets of cells drawn from either all mass measurements collected in a condition or only the mass measurements annotated as accepted by image classification (Fig. 3c). Across 3222 different datasets collected for a range of primary cells and cell lines, we found that image curation significantly improved this sampling variability, with an average decrease in sampling error of 16.2% when compared with non-curated measurements from the same condition. Interestingly, “Aggregate” events appeared to account for more sampling error than “Debris” events when considering each class individually (Supplementary Fig. 4b, c).
These results demonstrate the ability of linked imaging to improve the reliability of sampling an underlying mass distribution, particularly in the context of highly heterogeneous specimens where only a limited subset of the measurements are single cells.
Cellular mechanisms of mass response
When considering changes to cell mass that may occur in response to treatment, we define three potential categories related to a drug’s mechanism of action (MOA): (1) changes due to cell cycle arrest, (2) changes due to disruption of metabolic processes, and (3) changes due to failure of the cell’s structural integrity (Fig. 4a). Using a basic set of assumptions about the nature of each type of drug response mechanism, we can create simple models that demonstrate the expected changes in mass across a population of single cells. In the case of cell cycle arrest, we expect that the mass distribution gradually consolidates around the average newborn-cell mass for G0/G1 arrest or around the average cell mass prior to the division for G2/M arrest (Fig. 4b). Prior work has shown that disruption of metabolic pathways can manifest as changes to single-cell mass31. These effects can skew towards anabolic or catabolic processes, resulting in larger or smaller cells, respectively, as shown schematically in Fig. 4b. Disruption of cell structural integrity is expected to lead to the largest changes in mass across a population, as this category is consistent with apoptosis and/or necrosis of cells, and mass loss is likely driven by dramatic physical changes to cells such as the loss of membrane integrity/cytoplasm, membrane blebbing, cellular fragmentation, and other processes. Here, we assume that large quantities of mass loss will cause cells to shift from their initial vehicle-treated distribution towards a minimally overlapping secondary distribution (Fig. 4b).
Drugs with the MOAs described here are reasonably common, as are homogeneous cell lines that respond to these drugs, allowing us to test these hypothetical models. To test the mass response outcome of G0/G1 arrest, we exposed the human lung cancer H1666 cell line to 10 μM trametinib, a MEK inhibitor, for 17 h, which arrests most cells in early G1 (Supplementary Fig. 5a)32. Consistent with expectation, we saw the cell mass distribution shift downward as compared to the control (Fig. 4c, d). In contrast, by treating MDA-MB-361 cells for 24 h with 10 nM docetaxel, a microtubule inhibitor which prevents cell division, we observe a significant upward shift in mass (Fig. 4c, d and Supplementary Fig. 5b). These two cell cycle arrest phenotypes are central to the activity of many drugs, both targeted inhibitors and chemotherapies, and mass response resolves these phenotypes robustly across many examples (Supplementary Fig. 6). To produce metabolic skew data, we treated cells with cycloheximide, a ribosomal inhibitor, or carfilzomib, a proteasome inhibitor, to skew metabolism towards catabolism or anabolism, respectively. In L1210 cells treated with 400 nM cycloheximide for 24 h, we observe a decrease in the average cell mass in the population, consistent with inhibition of protein biogenesis33. In U266 cells treated with 50 nM of the proteasome inhibitor carfilzomib for 6 h, we instead observe a subtle increase in the average mass of cells, consistent with excess protein accumulation prior to downstream cytotoxicity (Fig. 4c, d)34. Finally, as an example of complete structural disruption, we treated L1210 cells with 0.5% Tween 20 detergent for 10 min, which permeabilizes the cell membrane, and spiked them at 40% into an otherwise healthy cell population.
Here, we observe a decrease in the primary mass peak representing live cells, and an increase in the smaller peak, which represents permeabilized cells (Fig. 4c, d). These same changes to mass distributions can be observed for cells following cell death induced by clinically relevant drugs (Supplementary Fig. 7).
The ability of mass distributions to discern these different response profiles makes them well suited for detecting drug response in heterogeneous primary samples. The dynamic nature of these mass change mechanisms, combined with heterogeneity in the timing of cellular response and potential phenotypic drift ex vivo, means that the presentation of these mechanisms is not necessarily uniform across a population of cells. For example, even in homogeneous cell lines, such as PC9 cells treated with doxorubicin, different doses of the drug at a single timepoint show how the mass response signal can arise from cell cycle arrest and structural disruption, either alone or simultaneously (Supplementary Fig. 8).
Mass response modulates with dose, time, and cellular fraction
In addition to compatibility with various drug mechanisms, when considering the role of mass response measurement in a clinical pipeline, it is also important to demonstrate compatibility with heterogeneous tumor cell specimens that have a limited time window of phenotypic stability ex vivo during which drug sensitivity can be assessed. It is, therefore, important to assess the effects of drug concentration and time, as well as underlying response heterogeneity on mass response readouts.
The canonical approach used to characterize drug sensitivity as a function of dose and time is the viability-based dose-response curve or IC50 curve. As such, a range of viability markers and techniques (e.g., MTT, ATP, flow cytometry) have shown potential as functional biomarkers for cancer care, but the impact of these approaches in the context of primary tumor samples has often been limited by the number of cells required and the time necessary to conduct these assays1. However, as an established standard, IC50 curves provide a useful comparator and model for understanding the variables that affect cell drug response. IC50 measurements sweep dose space to define the dose inflection point above which a majority of cells die in response to a drug. The optimal timepoint for assessing viability-based dose response is typically dictated by the drug mechanism and cell line being studied. For fast-acting drugs, a 24 h timepoint is often sufficient to accurately define cellular sensitivity; however, for slow-acting drugs (e.g., drugs functioning through cell cycle arrest), a timepoint of 72 h or longer may be required.
To evaluate how these dose concentration and timing parameters affect cell mass response, we modulated these variables independently in cell lines to understand their effect. MM1S cells treated with a range of concentrations of carfilzomib for a fixed amount of time (15 h) showed a dose-dependent mass response similar to the IC50 curves collected for the same cell line (Fig. 5a). We also noted that mass response changes over time in response to a fixed concentration of drug (Fig. 5b). For fast-acting drugs, which rapidly induce cytotoxicity, mass response magnitude demonstrates a dose dependence in line with viability loss measurements collected at similar timepoints (Fig. 5c and Supplementary Fig. 9a, b). However, as a result of being able to detect signals prior to cell death, mass response can detect the effects of slow-acting drugs well in advance of the 50% viability loss required to define an accurate IC50 signal (Fig. 5d and Supplementary Fig. 9c, d). For example, in the case of PC9 cells treated with paclitaxel, 24-h mass measurements accurately define a dose-response inflection point revealing effective concentrations of the drug, despite minimal changes in viability observed at all concentrations tested and IC50 measurements requiring 72 h for a more accurate readout (Fig. 5d). When this comparison is made across 60 different drugs tested across 12 different cell lines, the value of this rapidly manifesting mass response signal is made clear (Fig. 5e and Supplementary Data 1). For many drugs, whether targeted kinase inhibitors or chemotherapies, a 24-h IC50 value is comparable to measured mass response, with mass-based signal developing at the same or only slightly lower doses of the drug than observed by viability measurements.
However, for drugs which work through slower-acting mechanisms such as cell cycle arrest, 24-h mass response measurements still define effective drug concentrations, whereas 24-h IC50 measurements provide little perspective. Instead, 72-h or longer IC50 timepoints must be taken to define cellular sensitivity to such drugs (Fig. 5e). This ability to rapidly detect drug-induced changes in cellular phenotype is particularly beneficial in the context of primary tissue measurements where longer-term drug incubations are infeasible due to phenotypic drift and viability loss ex vivo.
While homogeneous cell lines provide a good context for probing the fundamental characteristics of mass response measurements, they are not good proxies for the heterogeneity seen in primary tumor specimens. For this reason, it is important to assess the impact of heterogeneity on mass response measurements. This variable can be probed explicitly by mixing drug- and vehicle-treated fractions from the same cell line, demonstrating that at a given timepoint, the mass response increases proportionally to the fraction of cells responding (Fig. 6a). A more complex model is needed to emulate the heterogeneous size distribution and drug sensitivity in a primary sample. To test fractional sensitivity with this heterogeneity in mind, we used a mixture of three cell lines, each with a unique sensitivity to an individual drug (Fig. 6b). When we compare the response to 1-drug versus 2- or 3-drug combinations, we see an additive shift in mass response that is roughly the sum of responses for each drug as a monotherapy (Fig. 6b).
These results demonstrate a high degree of concordance between mass response measurements and other existing drug response assays and show that mass response can accurately characterize the effects of time, dose, and sample heterogeneity. The higher information content offered by single-cell mass response measurements and their unique ability to resolve cellular sensitivity at earlier timepoints upstream of viability loss offer clear advantages in characterizing primary tumor cells where longer-term maintenance of cells ex vivo is not practical.
Demonstrating the feasibility of mass response measurements for various specimen formats
Sample composition and collection feasibility vary significantly across different clinical specimen formats and can affect the ease of cell isolation and measurement. A functional testing pipeline must therefore be compatible with specimens collected from a variety of tumor cell compartments in order to maintain broad applicability across malignancies.
Previous work has demonstrated the feasibility of using mass response measurements to characterize drug efficacy for hematological malignancy sample formats such as blood and bone marrow21. While providing an encouraging proof-of-concept that such biophysical readouts can accurately predict patient responses to therapy, the technical complexity of processing solid tumor specimens led us to test whether our image-annotated mass measurement workflow could maintain the ability to characterize drug sensitivity while also offering the speed, robustness, and technical reproducibility required of a clinical testing pipeline (Supplementary Fig. 2b).
To first demonstrate the compatibility of this workflow with hematologic tumor specimens, we present measurements of a peripheral blood sample from a patient with plasma cell leukemia (PCL), and a bone marrow aspirate from a patient with multiple myeloma (MM) (Fig. 7a, b). In both cases, cellular mass responses were observable for a range of therapies. Tumor cells isolated from the PCL sample with a prior demonstration of the t(11;14) translocation showed a dose-dependent response to venetoclax as a monotherapy and when in combination with bortezomib and selinexor. However, these cells did not show a significant mass response to selinexor or bortezomib alone, suggesting that the response was driven primarily by venetoclax. This patient had previously been treated with venetoclax-based therapy and had a good response lasting five months, as indicated by eradication of the t(11;14) clone in subsequent diagnostic bone marrow biopsies. However, treatment was discontinued after five months due to adverse effects (cytopenias). The bone marrow aspirate sample from a relapsed/refractory multiple myeloma patient with known extramedullary involvement demonstrated a dose-dependent response to the combination of selinexor, carfilzomib, and dexamethasone, as well as the DCEP therapy combination (dexamethasone, cyclophosphamide, etoposide, and cisplatin). When dosed as monotherapies, most of the mass response observed with the combination therapy was recapitulated by dosing with the cyclophosphamide analog mafosfamide—a spontaneously hydrolyzing compound that produces the same two components as metabolized cyclophosphamide35. As with prior predictive measurements in multiple myeloma, DCEP mass response measurements were consistent with the patient’s decrease in serum monoclonal protein and response to treatment with salvage combination chemotherapy.
For patients with relapsed or metastatic solid tumor malignancies, clinical assessment often requires the collection of solid tissue samples rather than blood or bone marrow specimens. Collection of these specimens by means of surgical resections, or sometimes even by core biopsies, can be infeasible given the size and anatomical location of metastatic lesions and the desire to avoid unnecessary invasive procedures. Fine-needle aspiration (FNA), which utilizes lower profile needles as compared with core biopsies, offers a minimally-invasive alternative for sample collection and reduces the risk of bleeding and injury36. To ensure maximal clinical utility, we sought to determine the feasibility of performing mass response measurements using these low-input FNA specimens, which often yield only tens of thousands of single cells for downstream analysis.
We collected FNA specimens from three different anatomical locations: a lung mass in a patient with non-small cell lung cancer (NSCLC), a neck mass in a patient with melanoma, and a soft tissue lytic bone mass in a patient with breast cancer (Fig. 7c). Total cell yields were 115-, 25-, and 120-thousand tumor cells for the lung, bone, and neck masses, respectively. These specimens demonstrated a range of cell mass drug responses, with the lung mass showing a significant mass response to paclitaxel and gemcitabine, the neck mass showing a significant mass response to a combination of dabrafenib and trametinib, and the soft tissue bone mass showing no significant response to docetaxel or doxorubicin. Interestingly, the patient with NSCLC was subsequently treated with a combination of carboplatin and paclitaxel and demonstrated a marked clinical response, consistent with the mass response to paclitaxel noted for this specimen. These measurements demonstrate the feasibility of performing the end-to-end workflow with low-input tissue formats and are an indication that the pipeline is compatible with performing drug response testing within the constraints of current clinical management strategies for patients with advanced solid tumor malignancies.
In addition to disseminated metastatic lesions, many patients with advanced cancer accumulate malignant fluids in the form of pleural effusions or abdominal ascites, which cause significant discomfort and must be drained for diagnostic and therapeutic reasons to manage symptoms37. Because these malignant fluids contain tumor cells, they offer another potential specimen format for minimally-invasive drug response testing. After standard tumor cell enrichment protocols (Methods), these samples often yield a significant number of cells for measurement. For example, in a patient with advanced non-small cell lung cancer, a 150 ml sample of malignant pleural effusion yielded nearly 170 million tumor cells, more than enough to perform mass response testing for a panel of drugs (Fig. 7d). This patient had been undergoing treatment with capmatinib due to a confirmed MET exon 14 skipping mutation but had not been responding to this therapy at the time of the effusion collection. Mass response measurements collected on the tumor cells isolated from the effusion sample were consistent with this clinical outcome, revealing no significant mass response to capmatinib across doses ranging over multiple orders of magnitude. However, these cells were not generally unresponsive to all treatments, showing significant and dose-dependent mass responses of varying magnitudes to therapies including paclitaxel, docetaxel, and cisplatin (Fig. 7e and Supplementary Fig. 10a). Consistent with cell line measurements of slower-acting taxane drugs—including paclitaxel and docetaxel—the mass responses detected for these drugs were not observable with flow cytometry-based viability measurements collected for this same specimen (Supplementary Fig. 10b). These results demonstrate the feasibility of collecting mass response measurements with malignant fluid specimens and provide an example of the drug response heterogeneity that can be revealed by measuring primary tumor cells directly. 
Additionally, they demonstrate the potential of this new approach to complement existing genomic biomarkers, which, as in the case of this patient, do not always identify an efficacious therapy.