Image Segmentation for CV Projects Literature Review

I am reaching out to request your assistance in composing a Literature Review based on three PDFs that I have on hand. This review is centered around image segmentation methods pertinent to computer vision projects.


Task Details:

Content Requirements:

Method Overview: I need a thorough explanation of the segmentation methods discussed in the journals. This should include a detailed discussion of the principles that underpin these methods and how they operate.
Application: Please explore the potential applications of these methods within the field of image segmentation in computer vision. If possible, include examples to illustrate these applications.
Advantages and Disadvantages: I would like a critical analysis of each method, highlighting their strengths and weaknesses.
Comparison: Please provide a comparison of the methods, identifying situations in which one method may be more beneficial than the others.

RESEARCH ARTICLE
Lung tumor segmentation methods: Impact
on the uncertainty of radiomics features for
non-small cell lung cancer
Constance A. Owens1,2*, Christine B. Peterson2,3, Chad Tang4, Eugene J. Koay4,
Wen Yu5, Dennis S. Mackin1,2, Jing Li4, Mohammad R. Salehpour1, David T. Fuentes6,
Laurence E. Court1,2,6, Jinzhong Yang1,2
1 Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas,
United States of America, 2 The University of Texas Graduate School of Biomedical Sciences at Houston,
Houston, Texas, United States of America, 3 Department of Biostatistics, The University of Texas MD
Anderson Cancer Center, Houston, Texas, United States of America, 4 Department of Radiation Oncology,
The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America,
5 Department of Radiation Oncology, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai,
China, 6 Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston,
Texas, United States of America
* caowens@mdanderson.org
OPEN ACCESS
Citation: Owens CA, Peterson CB, Tang C, Koay EJ,
Yu W, Mackin DS, et al. (2018) Lung tumor
segmentation methods: Impact on the uncertainty
of radiomics features for non-small cell lung
cancer. PLoS ONE 13(10): e0205003. https://doi.org/10.1371/journal.pone.0205003
Editor: Yong Fan, University of Pennsylvania
Perelman School of Medicine, UNITED STATES
Received: May 4, 2018
Accepted: September 18, 2018
Published: October 4, 2018
Copyright: This is an open access article, free of all
copyright, and may be freely reproduced,
distributed, transmitted, modified, built upon, or
otherwise used by anyone for any lawful purpose.
The work is made available under the Creative
Commons CC0 public domain dedication.
Data Availability Statement: All relevant data are
within the paper.
Funding: The authors acknowledge financial
support from the Cancer Prevention Research
Institute of Texas (URL: http://www.cprit.state.tx.us/)
grant under award number RP110562. J.Y.
and L.C. are the authors that received this fund.
The authors acknowledge financial support from
the National Institutes of Health/National Cancer
Institute (URL: https://cancercenters.cancer.gov/)
through Cancer Center Support Grant under award
number P30 CA016672. The funders had no role in
study design, data collection and analysis, decision
to publish, or preparation of the manuscript.
Competing interests: The authors have declared
that no competing interests exist.
PLOS ONE | https://doi.org/10.1371/journal.pone.0205003 October 4, 2018
Segmentation uncertainty for radiomics studies
Abstract
Purpose
To evaluate the uncertainty of radiomics features from contrast-enhanced breath-hold helical CT scans of non-small cell lung cancer for both manual and semi-automatic segmentation due to intra-observer, inter-observer, and inter-software reliability.
Methods
Three radiation oncologists manually delineated lung tumors twice from 10 CT scans using two software tools (3D-Slicer and MIM Maestro). Additionally, three observers without formal clinical training were instructed to use two semi-automatic segmentation tools, Lesion Sizing Toolkit (LSTK) and GrowCut, to delineate the same tumor volumes. The accuracy of the semi-automatic contours was assessed by comparison with physician manual contours using Dice similarity coefficients and Hausdorff distances. Eighty-three radiomics features were calculated for each delineated tumor contour. Informative features were identified based on their dynamic range and correlation to other features. Feature reliability was then evaluated using intra-class correlation coefficients (ICC). Feature range was used to evaluate the uncertainty of the segmentation methods.
Results
From the initial set of 83 features, 40 radiomics features were found to be informative, and these 40 features were used in the subsequent analyses. For both intra-observer and inter-observer reliability, LSTK had higher reliability than GrowCut and the two manual segmentation tools. All observers achieved consistently high ICC values when using LSTK, but the ICC value varied greatly for each observer when using GrowCut and the manual segmentation tools. For inter-software reliability, features were not reproducible across the software
tools for either manual or semi-automatic segmentation methods. Additionally, no feature
category was found to be more reproducible than another feature category. Feature ranges
of LSTK contours were smaller than those of manual contours for all features.
Conclusion
Radiomics features extracted from LSTK contours were highly reliable across and among
observers. With semi-automatic segmentation tools, observers without formal clinical training were comparable to physicians in evaluating tumor segmentation.
Introduction
Precision medicine aims to customize cancer treatment for an individual patient by considering combined knowledge (i.e., conventional factors such as age and sex, genetics, proteins, and
others) [1,2]. Precision medicine seeks to completely characterize the tumor to determine optimal treatment based on patient-specific characteristics. In recent years, studies have shown
that radiomics features have the potential to significantly improve our ability to stratify
patients according to likely treatment response beyond conventional prognostic factors,
thereby leading to truly personalized cancer care [3–7].
The generic workflow of radiomics studies includes four steps: (1) image acquisition, (2)
tumor delineation, (3) feature extraction, and (4) feature analysis [8,9]. The tumor delineation
can be drawn manually or generated with a semi-automatic tool. Once the tumor delineation
has been established, radiomics features are extracted from the tumor-defined region within
the image. Thousands of radiomics features can be calculated for one tumor, and each feature
characterizes the tumor in a different way. For example, roundness is a radiomics feature that
characterizes the tumor shape and can be used to predict how the tumor may spread out to
nearby locations. Lastly, features are evaluated to see whether they correlate with prognostic or
predictive factors. Features that are shown to be predictive are then used to build outcome
models that help predict how a patient will respond to a treatment. For different diseases, different radiomics features can be selected for outcome modeling to predict likely treatment
response.
Before radiomics features can be clinically useful, it is necessary to investigate and understand the uncertainties of radiomics features. One major source of uncertainty comes from the
tumor delineation. Precise manual delineation of a tumor is generally difficult. Tumors
often lie adjacent to other organs that share similar characteristics with the tumor, making it
difficult to distinguish the true tumor boundary. Additionally, medical images are far from
perfect, as they have limited resolution (limiting our ability to see very small objects) and can
contain artifacts (features in an image that do not represent a real aspect of the imaged object).
Physicians may interpret the tumor differently, depending on their training and experience
[10]. In addition, the different software tools that physicians use to draw the tumor contours
may also affect the results, depending on user familiarity with the tool. Because radiomics features are calculated from the delineated tumor, uncertainty in tumor delineation could propagate to the radiomics features.
Recent advances in computer-aided automatic and semi-automatic segmentation
approaches have been shown to reduce the burden in manual delineation and lessen the
inconsistency in tumor delineation [11,12]. To date, a small number of studies have been performed to relate this reduced uncertainty in tumor delineation to the quality and reproducibility of radiomics features [13–17].
In this current study, we examined three specific factors that can influence the uncertainty
of radiomics features for both manual and semi-automatic segmentation methods: (1) intra-observer, (2) inter-observer, and (3) inter-software. Manual contours were generated by three
independent physicians using MIM Maestro™ (MIM Software Inc., Cleveland, Ohio, USA)
and 3D-Slicer [18]. Semi-automatic contours were generated by three trained observers using
the GrowCut algorithm from 3D-Slicer [11] and the Lesion Sizing Toolkit (LSTK) [19]. While
the segmentation accuracy of LSTK has been evaluated [19,20], to our knowledge the reliability
of radiomics features extracted from LSTK-generated contours has not been studied. Additionally, we evaluated whether manual software tools and semi-automatic software tools can
be used interchangeably for generating contours for feature extraction. The purpose of this
study can be summarized into two main objectives. The first objective was to identify a reliable
segmentation tool that produces lung tumor segmentations that yield reliable and robust
radiomics features for the same observer, across multiple observers, and across multiple software tools. The second objective was to identify a group of reliable radiomics features for non-small cell lung cancer (NSCLC) primary tumors.
Materials and methods
Patient data and CT image acquisition
For this study, we retrospectively obtained patient data for 10 patients with histologically verified NSCLC. The Institutional Review Board (IRB) at the University of Texas MD Anderson
Cancer Center approved the present retrospective study, and the requirement for informed
consent was waived. The lung tumors included in this study had volumes ranging from 1.15
cm³ to 10.53 cm³. For each patient, breath-hold helical computed tomography (CT) scans
were acquired with intravenous contrast. The CT scans were acquired on General Electric
Healthcare CT scanners with a peak tube voltage of 120 kVp and tube currents ranging from
320 mAs to 570 mAs. Each scan was reconstructed with a slice thickness of 2.5 mm and pixel
spacing between 0.635 mm and 0.977 mm. Fig 1 shows a coronal slice of each tumor to display
the variety of tumor presentations and locations of this patient cohort.
Manual segmentation
Manual segmentations were performed by three radiation oncologists using two different software tools: MIM Maestro™ (MIM Software Inc., Cleveland, Ohio) and 3D-Slicer (a free
open-source software platform) [18]. Each physician manually segmented each of the 10
tumors using both manual software tools, following the RTOG 1106 contouring guideline
[21,22]. This guideline recommends contouring the primary tumor volume on CT images
using a standard lung window/level for distinguishing lung borders and using a mediastinal
window/level for distinguishing borders adjacent to the mediastinum. This process was
repeated twice at two different times, yielding two sets of contours (Fig 2). The time intervals
between the two sets of contours for each physician were approximately 1 year for the first two
physicians and 1 month for the third physician. In total, 120 manual tumor contours were generated (2 software tools × 3 observers × 2 contours × 10 tumors). For both manual software
tools, tumors were contoured using a paintbrush tool (thresholding in 3D-Slicer) in a slice-by-slice fashion in the transverse plane. Physicians could observe and edit the tumor in the coronal and sagittal planes as well, when desired.
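The lung and mediastinal window/level settings mentioned in the contouring guideline can be illustrated with a short sketch of how a display window maps CT numbers to screen intensities. The function name `apply_window` and the two presets are assumptions chosen for illustration (commonly quoted values, not taken from the study or from RTOG 1106).

```python
def apply_window(hu_values, level, width):
    """Map CT numbers (HU) to display intensities in [0, 1] for one window/level."""
    low = level - width / 2.0
    high = level + width / 2.0
    shown = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)  # values outside the window saturate
        shown.append((clipped - low) / (high - low))
    return shown

# Assumed presets for illustration only (not from the study).
LUNG_WINDOW = {"level": -600, "width": 1500}
MEDIASTINAL_WINDOW = {"level": 40, "width": 400}

# A toy row of CT numbers: air, lung, fat-like, soft tissue, bone-like.
row = [-1000, -700, -100, 40, 300]
lung = apply_window(row, **LUNG_WINDOW)
mediastinum = apply_window(row, **MEDIASTINAL_WINDOW)
```

With the wider lung preset, air and lung parenchyma remain distinguishable, whereas the narrower mediastinal preset saturates both to black and spreads its contrast over soft tissue.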
Fig 1. Tumor presentations and locations. A central slice of each tumor in the coronal view is displayed to show the variety in tumor locations, shapes
and appearances of the patients used in this study. A single physician contour is displayed (red) to identify the tumor in each patient scan.
https://doi.org/10.1371/journal.pone.0205003.g001
Fig 2. Schematic of the collection of manual and semi-automatic contours. Each circle and triangle represent a single tumor contour. The time interval between
contour set 1 and contour set 2 was 1 year for the contours represented by circles and 1 month for the contours represented by triangles.
https://doi.org/10.1371/journal.pone.0205003.g002
Semi-automatic tumor segmentation
Semi-automatic segmentations were generated using two different software tools: LSTK (a
level-set algorithm available from an open-source toolkit) and GrowCut (a region growing
algorithm implemented in 3D-Slicer). For the semi-automatic segmentations, three observers
without formal clinical training were instructed to use the two semi-automatic tools to
generate tumor segmentations. Verbal step-by-step instructions were given to each observer
on using each software tool. After that, observers practiced using each software tool on three
lung tumors (outside the study). The entire process took less than 15 minutes, with instruction
lasting 5 minutes and practice lasting less than 10 minutes. Once observers felt comfortable
with the software tool, the segmentations for this study were collected. The contouring process
that was used for the manual contours was repeated for the semi-automatic contours for the
same 10 tumors (Fig 2). The time interval between the two sets was 1 to 2 months for each
observer to lessen memory effects. Other studies showed that 3 weeks between contouring
runs are enough to mitigate the effects of memory [23].
For GrowCut, observers labeled foreground and background pixels with two clicks (Fig 3)
in each view, totaling at least six clicks per tumor case. If the tumor was attached to the chest
wall or mediastinum, additional clicks at appropriate locations were needed to help the algorithm
differentiate the tumor from the chest wall or mediastinum. Once labels were established, the
GrowCut algorithm was run, and the GrowCut-generated contours were then edited manually. The
editing process took up to 2 minutes for some tumor cases.
For LSTK, the only interaction was to pick a seed, which is a user-selected voxel within the
tumor (Fig 3). Defining the maximum tumor radius was optional; however, defining an appropriate maximum tumor radius might save computation time in running LSTK. The LSTK algorithm
has several preset parameters that can affect the segmentation result. We used the initial physician
manual contours to guide us in selecting these parameters. Detailed discussions regarding the
algorithms of GrowCut and LSTK can be found in other publications [19,20].
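The way user-placed foreground and background labels spread through an image in GrowCut can be illustrated with a small cellular-automaton sketch. This is a simplified 2D illustration of the general GrowCut idea and not the 3D-Slicer implementation used in the study; the function name `growcut`, the 4-neighbourhood, the linear attack-strength function, and the toy image are all assumptions made for the example.

```python
def growcut(image, seeds, max_iter=100):
    """Spread seed labels over a 2D image (0 = unlabeled, 1 = foreground, 2 = background)."""
    rows, cols = len(image), len(image[0])
    flat = [v for row in image for v in row]
    max_diff = (max(flat) - min(flat)) or 1  # avoid division by zero on flat images
    labels = [row[:] for row in seeds]
    # Seed pixels start with full strength, unlabeled pixels with none.
    strength = [[1.0 if seeds[r][c] else 0.0 for c in range(cols)] for r in range(rows)]
    for _ in range(max_iter):
        new_labels = [row[:] for row in labels]
        new_strength = [row[:] for row in strength]
        changed = False
        for r in range(rows):
            for c in range(cols):
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        # A neighbour attacks more weakly across large intensity steps.
                        g = 1.0 - abs(image[r][c] - image[nr][nc]) / max_diff
                        attack = g * strength[nr][nc]
                        if attack > new_strength[r][c]:
                            new_labels[r][c] = labels[nr][nc]
                            new_strength[r][c] = attack
                            changed = True
        labels, strength = new_labels, new_strength
        if not changed:
            break
    return labels

# Toy image: a bright "tumor" on the left, dark background on the right.
toy_image = [[100, 100, 0, 0],
             [100, 100, 0, 0]]
toy_seeds = [[1, 0, 0, 2],
             [0, 0, 0, 0]]
segmented = growcut(toy_image, toy_seeds)
```

In this toy case the foreground label cannot cross the intensity edge, so the two seeds partition the image along it. On real CT data the edge is rarely this clean, which is consistent with the need for extra clicks near the chest wall or mediastinum.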
Fig 3. User inputs for initializing semi-automatic segmentation tools. (A) LSTK requires the user to select a seed within the tumor (red) to
initiate the segmentation algorithm. Defining the maximum tumor radius generates a 3D bounding box (green) centered about the seed, within
which the segmentation result will be confined. (B) GrowCut requires the user to label foreground (blue) and background (yellow) pixels to initiate
the segmentation algorithm. Once labels were established, the GrowCut algorithm was followed by manual editing of the GrowCut-generated
contours. Note that only the transverse view is shown here. Observers also labeled foreground and background pixels in the coronal and sagittal
planes for each tumor case.
https://doi.org/10.1371/journal.pone.0205003.g003
Validating tumor segmentation accuracy
We validated the accuracy of each semi-automatic segmentation. A group-consensus contour,
taken to be the intersecting tumor volume shared by a majority of experts [23–25], was generated as the ground truth. In this study, the group-consensus contour consisted of the tumor region where at least four of the initial six manual
physician contours overlapped. To assess the accuracy of each tumor segmentation, the Dice
similarity coefficient (DSC) and Hausdorff distance (HD) were calculated between the group-consensus contour and each individual semi-automatic contour. The DSC quantifies the spatial overlap between two contours, while the HD quantifies the longest contour distance
between the boundaries of two contours. While the DSC can detect incorrectly labeled voxels,
the HD metric is better at detecting deviations (sharp spikes or tiny holes) that significantly
alter the contour shape but do not substantially alter the volume.
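The consensus rule and the two accuracy metrics described above can be sketched as follows. This is a minimal illustration that represents each contour as a set of voxel index tuples; the function names, the brute-force distance search, and the toy contours are assumptions, not the tools used in the study. The Hausdorff function operates on whatever point sets it is given, so boundary voxels would have to be extracted beforehand to match a boundary-based HD.

```python
import math
from collections import Counter

def consensus(contours, min_votes):
    """Voxels contained in at least `min_votes` of the given contours (sets of voxels)."""
    votes = Counter(voxel for contour in contours for voxel in contour)
    return {voxel for voxel, n in votes.items() if n >= min_votes}

def dice(a, b):
    """Dice similarity coefficient between two voxel sets."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))

def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between two non-empty voxel sets (brute force)."""
    pa = [tuple(i * s for i, s in zip(v, spacing)) for v in a]
    pb = [tuple(i * s for i, s in zip(v, spacing)) for v in b]

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(pa, pb), directed(pb, pa))

# Six hypothetical manual contours on a one-dimensional row of voxels.
manual = [
    {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
    {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
    {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
    {(1, 0, 0), (2, 0, 0)},
    {(2, 0, 0)},
    {(2, 0, 0), (3, 0, 0)},
]
truth = consensus(manual, min_votes=4)       # "at least four of six" rule
semi_auto = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
dsc = dice(truth, semi_auto)
hd = hausdorff(truth, semi_auto, spacing=(2.5, 1.0, 1.0))
```

In this toy case one stray voxel lowers the DSC only modestly but sets the HD to a full voxel spacing, which mirrors the complementary roles of the two metrics.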
Feature extraction
Features were calculated for all 240 tumor segmentations (120 manual + 120 semi-automatic).
For this study, feature extraction was performed using the open-source Imaging Biomarker
Explorer (IBEX) software [26]. A total of 83 features were calculated. We stratified the features
into three main categories: geometric shape (SHP), intensity histogram (HIS), and texture
(TXT). Co-occurrence matrix features (a subcategory of texture features) were calculated in
four directions (0, 45, 90, and 135 degrees), and the final value was taken to be an average of
these four directions to avoid directional bias [27]. A common pre-processing step used to
refine contours before feature extraction is to remove voxels with intensity values for normal
lung tissue, bone, or air that might be inside the tumor contour. Since the purpose of this
study is to investigate the segmentation uncertainty on radiomics features, we omitted this
step to adhere to the original segmentation. We also did not correct for pixel size [28] or perform smoothing [29] to avoid introducing other uncertainties to this study.
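The direction averaging of co-occurrence matrix features can be sketched on a small 2D image. This is an illustrative pure-Python version, not the IBEX implementation; the offset convention, the symmetric normalization, and the choice of contrast as the example feature are assumptions.

```python
# Pixel offsets (row step, column step) assumed for the four directions.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, offset, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                i, j = image[r][c], image[nr][nc]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both orders
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared gray-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def direction_averaged_contrast(image, levels):
    """Average the contrast over the four directions to reduce directional bias."""
    values = [contrast(glcm(image, off, levels)) for off in OFFSETS.values()]
    return sum(values) / len(values)

# Toy image with horizontal stripes: no contrast at 0 degrees, full contrast elsewhere.
stripes = [[0, 0],
           [1, 1]]
per_direction = {a: contrast(glcm(stripes, off, 2)) for a, off in OFFSETS.items()}
averaged = direction_averaged_contrast(stripes, 2)
```

The striped toy image shows why averaging matters: a single direction would report either no contrast or full contrast depending only on orientation.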
Feature reduction
One common approach for narrowing the feature set is to apply a combination of different
methods in a sequential manner [9,14,15,30,31] to remove features that are non-informative
or redundant. In the current study, we applied two steps to reduce the initial feature set of 83
features to 40 informative and non-redundant features. The first step was to remove features
that did not vary across different patients. For a feature to be informative, it must exhibit a
range of values across different patients [9,14]. In other words, it must have a wide dynamic
range to differentiate patients. Because multiple contours were generated for each patient, the
average feature value was calculated for each patient. Before calculating the normalized
dynamic range (NDR) for each feature, the average values for each feature were rescaled
(across the patients) to have a mean of 0 and a standard deviation of 1 using z-score normalization, so that features with values of different scales could be compared. The NDR for each feature, NDRf, was calculated as:
$$\mathrm{NDR}_f = \max(\hat{f}_{\mathrm{avg}}) - \min(\hat{f}_{\mathrm{avg}})$$
where $\max(\hat{f}_{\mathrm{avg}})$ is the maximum normalized average feature value across all patients and
$\min(\hat{f}_{\mathrm{avg}})$ is the minimum normalized average feature value across all patients. Once the NDR
is calculated for each feature, a cutoff value is chosen as a means to remove the least informative features. In general, the cutoff value is chosen arbitrarily and may be set to a higher or
lower value [9,15]. For the second step, highly correlated features were removed. It is well
known that many features are highly correlated [9]. To deal with this issue, we computed a correlation matrix to identify highly correlated features. In this step, Spearman correlation coefficients were computed to evaluate the correlation between all features.
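The normalized dynamic range calculation can be sketched in a few lines. The function name, the data layout, and the use of the sample standard deviation are assumptions; the text does not state which standard deviation estimator was used.

```python
from statistics import mean, stdev

def normalized_dynamic_range(values_by_patient):
    """NDR of one feature; maps patient ID -> that patient's values (one per contour)."""
    # Step 1: average the feature over all contours of each patient.
    patient_means = [mean(values) for values in values_by_patient.values()]
    # Step 2: z-score the patient averages (sample SD is an assumption).
    mu, sd = mean(patient_means), stdev(patient_means)
    z = [(m - mu) / sd for m in patient_means]
    # Step 3: range of the normalized averages.
    return max(z) - min(z)

# Hypothetical feature values for three patients, two contours each.
toy = {"P01": [1.0, 1.2], "P02": [2.0, 2.2], "P03": [3.0, 3.2]}
ndr = normalized_dynamic_range(toy)

# Rescaling the feature leaves the NDR unchanged, which is the point of normalizing.
scaled = {p: [10.0 * v for v in values] for p, values in toy.items()}
ndr_scaled = normalized_dynamic_range(scaled)
```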
Feature reliability analysis
In this study, we examined three specific factors that can influence feature reliability: intra-observer, inter-observer, and inter-software (Table 1). Intra-observer agreement is a reliability
measure of repeatability, while inter-observer and inter-software agreement are reliability
measures of reproducibility [32]. To assess feature reliability, intraclass correlation coefficients
(ICCs) were calculated for each feature. There are ten different forms of the ICC [33] and
selecting the appropriate form depends on the experimental setup. To assess intra-observer
reliability, we used a one-way random-effects model where the tumor cases are a random
effect. To assess inter-observer and inter-software reliability, we used a two-way mixed-effects
model where the tumor cases are a random effect and the observers (for inter-observer) and
the software tools (for inter-software) are a fixed effect. The specific ICC form used to assess
each reliability relationship is shown in Table 1. The ICC values, which can range from -1 to 1,
were stratified into four different classifications. ICC values less than 0.4,
between 0.4 and 0.6, between 0.6 and 0.75, and greater than 0.75 represented the ICC bounds
for the classifications of poor, fair, good, and excellent reliability, respectively [23].
Table 1. ICC formulas used to assess feature reliability.
| Reliability factor | ICC description (a) | ICC equation (a, b) | Explanation of reliability factor being examined |
|---|---|---|---|
| Intra-observer | One-way random-effects model, single measure, absolute-agreement | $\frac{\mathrm{MSR} - \mathrm{MSW}}{\mathrm{MSR} + (k+1)\,\mathrm{MSW}}$ | To determine whether features can be extracted reliably from tumor contours generated by a single physician/observer using a single software tool at multiple timepoints |
| Inter-observer | Two-way mixed-effects model, single measure, absolute-agreement | $\frac{\mathrm{MSR} - \mathrm{MSE}}{\mathrm{MSR} + (k-1)\,\mathrm{MSE} + \frac{k}{n}(\mathrm{MSC} - \mathrm{MSE})}$ | To determine whether features can be extracted reliably from tumor contours generated by multiple physicians/observers using a single software tool |
| Inter-software | Two-way mixed-effects model, single measure, absolute-agreement | $\frac{\mathrm{MSR} - \mathrm{MSE}}{\mathrm{MSR} + (k-1)\,\mathrm{MSE} + \frac{k}{n}(\mathrm{MSC} - \mathrm{MSE})}$ | To determine whether features can be extracted reliably from tumor contours generated by a single physician/observer using multiple software tools |
MSR = mean square for rows; MSW = mean square for residual sources of variance; MSE = mean square error; MSC = mean square for columns; n = number of tumors;
k = number of physicians/observers.
(a) The information and equations in these columns were taken from McGraw and Wong [33].
(b) Each row represents a different tumor case and each column represents a different measurement (for intra-observer), different judge (for inter-observer), or different
software tool (for inter-software).
https://doi.org/10.1371/journal.pone.0205003.t001
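The two-way, single-measure, absolute-agreement ICC used for the inter-observer and inter-software comparisons can be computed from an analysis-of-variance decomposition of a tumors-by-raters table. The sketch below follows the McGraw and Wong form with a k/n weighting of the column term; the function names, the toy ratings, and the handling of values that fall exactly on a classification boundary are assumptions.

```python
def icc_two_way_absolute(table):
    """ICC(A,1) for a table with one row per tumor and one column per rater or tool."""
    n, k = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # mean square for rows (tumors)
    msc = ss_cols / (k - 1)                 # mean square for columns (raters)
    mse = ss_error / ((n - 1) * (k - 1))    # mean square error
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))

def classify_icc(icc):
    """Reliability class; treatment of exact boundary values is an assumption."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:
        return "fair"
    if icc <= 0.75:
        return "good"
    return "excellent"

# Hypothetical feature values: three tumors rated by two observers.
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]   # second observer reads 1 unit higher
icc_identical = icc_two_way_absolute(identical)
icc_offset = icc_two_way_absolute(offset)
```

Because this is an absolute-agreement form, the constant offset between the two hypothetical observers lowers the ICC even though their rankings of the tumors agree perfectly.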
Correlation between ICC and CCC. Concordance correlation coefficients (CCCs) were
also calculated because other feature reliability studies have used the CCC metric in their analysis [14,29,34,35]. Spearman rank correlation coefficients and pairwise scatterplots were computed between the ICC and CCC estimates for each reliability relationship.
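For reference, the concordance correlation coefficient between two sets of measurements can be computed as below. This is a sketch of Lin's standard definition using population (1/n) moments; the function name and the toy values are assumptions.

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between paired measurements."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes systematic shifts between the two sets.
    return 2.0 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)

# Hypothetical feature values from two contouring sessions.
first = [1.0, 2.0, 3.0]
ccc_same = concordance_cc(first, first)
ccc_shifted = concordance_cc(first, [2.0, 3.0, 4.0])
```

Like the absolute-agreement ICC, the CCC falls below 1 when one set of measurements is shifted relative to the other, which is one reason the two metrics tend to track each other.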
Identifying reliable feature categories. For this part of the analysis, we wanted to determine whether a specific feature category (shape, histogram, texture) was significantly more
reproducible than another feature category. For this determination, Wilcoxon rank sum test
(aka Mann-Whitney test) values were computed between each feature category combination
(e.g., shape versus histogram) for each ICC relationship.
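A category comparison of this kind can be sketched with a hand-written rank sum test. The version below uses the large-sample normal approximation without tie or continuity correction, which is a simplification and may differ from the software the authors used; the function name and the ICC values are hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Return (U, two-sided p) from the normal approximation to the rank sum test."""
    n1, n2 = len(x), len(y)
    # Count pairs in which the x value exceeds the y value (ties count one half).
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u = min(u_x, n1 * n2 - u_x)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p

# Hypothetical ICC values for two feature categories.
shape_icc = [0.91, 0.88, 0.95]
texture_icc = [0.42, 0.55, 0.61]
u_stat, p_value = rank_sum_test(shape_icc, texture_icc)
```

With only three values per group the normal approximation is crude; the example is meant to show the mechanics rather than a usable sample size.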
Feature range analysis
For segmentations from each software tool, we calculated the feature range (inter-patient variability) across observers for each radiomics feature. First, we normalized each feature using z-score normalization. This allowed us to more easily compare and plot features on different
scales. Each normalized feature, $\hat{f}_i$, was calculated as:
$$\hat{f}_i = \frac{f_{p,i} - \bar{f}_p}{\sigma_{p,f}}$$
where $f_{p,i}$ is the feature for contour $i$ from patient $p$, $\bar{f}_p$ is the mean value for feature $f$ for all
contours from patient $p$, and $\sigma_{p,f}$ is the standard deviation for feature $f$ for all contours from
patient $p$. Then we recorded the minimum and maximum normalized feature values for each
segmentation method to assess the feature range of each segmentation method.
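The per-patient normalization and per-method range can be sketched as follows. The record layout, the function name, and the use of the sample standard deviation are assumptions for illustration; the feature values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def feature_range_by_method(records):
    """records: (patient, method, value) tuples for one feature.

    Returns {method: (min z, max z)} after z-scoring within each patient.
    """
    by_patient = defaultdict(list)
    for patient, _method, value in records:
        by_patient[patient].append(value)
    # Mean and SD over all contours of a patient, pooled across methods.
    stats = {p: (mean(v), stdev(v)) for p, v in by_patient.items()}
    ranges = {}
    for patient, method, value in records:
        mu, sd = stats[patient]
        z = (value - mu) / sd
        low, high = ranges.get(method, (z, z))
        ranges[method] = (min(low, z), max(high, z))
    return ranges

# Hypothetical values of one feature for one patient.
records = [
    ("P01", "manual", 8.0),
    ("P01", "manual", 12.0),
    ("P01", "LSTK", 10.0),
    ("P01", "LSTK", 10.0),
]
ranges = feature_range_by_method(records)
```

A narrower interval for a method indicates that its contours yield feature values that scatter less around the patient mean.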
Results
Validating tumor segmentation accuracy
For the semi-automatic tools, the mean DSCs were 0.88 ± 0.06 and 0.88 ± 0.08 for LSTK and
GrowCut, respectively (Fig 4), and the mean HD values were
0.48 ± 0.17 cm and 0.43 ± 0.20 cm for LSTK and GrowCut, respectively. The DSC and HD
results show that trained observers using these semi-automatic tools can produce contours comparable to the group-consensus physician contour, and hence these semi-automatically
generated contours can be used for feature extraction.
Fig 4. Validating segmentation accuracy of semi-automatic contours. Box plot of the Dice similarity coefficients and Hausdorff
distances by software tool displays the segmentation accuracy for each software tool.
https://doi.org/10.1371/journal.pone.0205003.g004
Feature reduction
To identify non-informative features, the NDR was calculated for each feature. A histogram
showing the number of features within a range of NDR values is shown in Fig 5. All features
had an NDR value greater than 2.4 and hence all features were considered to exhibit large
enough inter-patient variability to remain in the feature set. To evaluate the correlation
between all features, pair-wise Spearman correlation coefficients were computed (Fig 6). Pair-wise correlation coefficients with an absolute value larger than 0.95 were regarded as very
redundant [15]. For correlated features, the feature with the largest mean absolute correlation
was removed, reducing the feature set to 40 non-redundant features (Fig 7).
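The redundancy filter described above can be sketched as a greedy loop over a Spearman correlation table. The function names, the order in which correlated pairs are processed, the tie-breaking rule, and the toy feature values are assumptions; the authors' exact procedure may differ in these details.

```python
from statistics import mean

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def drop_redundant(features, cutoff=0.95):
    """Repeatedly drop one feature from the most correlated pair above the cutoff."""
    kept = list(features)
    corr = {(a, b): abs(spearman(features[a], features[b]))
            for a in kept for b in kept if a != b}
    while True:
        pairs = [(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]
                 if corr[(a, b)] > cutoff]
        if not pairs:
            return kept
        a, b = max(pairs, key=corr.__getitem__)

        def mean_abs(f):
            return mean(corr[(f, other)] for other in kept if other != f)

        # Remove the member of the pair that is more correlated with everything else.
        kept.remove(a if mean_abs(a) >= mean_abs(b) else b)

# Hypothetical values of three features across five patients.
toy_features = {
    "volume": [1.0, 2.0, 3.0, 4.0, 5.0],
    "diameter": [2.0, 4.0, 6.0, 8.0, 11.0],   # same ranking as volume
    "entropy": [3.0, 1.0, 4.0, 2.0, 5.0],
}
kept_features = drop_redundant(toy_features)
rho = spearman(toy_features["volume"], toy_features["entropy"])
```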
Fig 5. Histogram distribution of the normalized dynamic range for all 83 radiomics features. The histogram
distribution shows the number of features within a range of NDR values where each bin has a width of 0.05.
https://doi.org/10.1371/journal.pone.0205003.g005
Fig 6. Spearman correlation coefficient heat map including all initial 83 features. Spearman correlation coefficients were computed for 83 radiomics features.
Green, white, and red denote positive, random, and negative correlations, respectively. A large number of features were highly correlated.
https://doi.org/10.1371/journal.pone.0205003.g006
Fig 7. Spearman correlation coefficient heat map including 40 non-redundant features. Feature pairs with Spearman correlation coefficients less than 0.95.
Spearman correlation coefficients larger than 0.95 were regarded as highly redundant and were eliminated from the initial feature set, reducing the feature set to 40
non-redundant features. Green, white, and red denote positive, random, and negative correlations, respectively. Correlation coefficients marked with an x are
insignificant coefficients.
https://doi.org/10.1371/journal.pone.0205003.g007
Feature reliability analysis
Correlation between ICC and CCC. For all reliability relationships, the results for the
Spearman rank correlation coefficients between the CCC and ICC values showed a strong and
statistically significant positive correlation (ρ>0.965, p0.982, p
