Abstract
This paper describes an R package LeArEst
that can be used for estimating object dimensions from a noisy image.
The package is based on a simple parametric model for data that are
drawn from uniform distribution contaminated by an additive error. Our
package is able to estimate the length of the object of interest on a
given straight line that intersects it, as well as to estimate the
object area when it is elliptically shaped. The input data may be a
numerical vector or an image in JPEG format. In this paper, background
statistical models and methods for the package are summarized, and the
algorithms and key functions implemented are described. Also, examples
that demonstrate its usage are provided.
Availability: LeArEst is
available on CRAN.
Image noise may arise by the physical processes of imaging, or it can be caused by the presence of some unwanted structures (e.g., soft tissue captured in X-ray images of bones). Such problems can occur, for example, when the object is observed with a fluorescent microscope (Ruzin et al. 1999), ground penetrating radar, medical equipment (X-ray, ultrasound), etc. With the presence of additive noise, the detection of the object edge as well as determining length or area of the object becomes a non-trivial problem. The well known edge detection methods (Qiu 2005; Canny 1986) generally do not perform well.
Our approach does not use the mentioned edge detection methods, but looks at the problem in a different way. We start with a simple univariate model where the data represent independent realizations of a random variable \(X\), \(X = U + \varepsilon\). In the aforementioned equation \(U\) is supposed to be uniformly distributed over the object image and describes the object image without an error, while \(\varepsilon\) represents measurement error. It is shown that such a simple model can also be very useful in applications itself, not only related to the image analysis. For instance, in Tolić et al. (2017) the sum of uniform and normal distributions is confirmed to be the most representative distribution for modelling transmission loss data.
Different aspects of this model are developed in Benšić and Sabo (2007b), Benšić and Sabo (2007a), Benšić and Sabo (2010), Benšić and Sabo (2016), Sabo and Benšić (2009), and Schneeweiss (2004). The basic one-dimensional model is described in Section 2 together with the results that are used for statistical inference incorporated in the package. Although this model is not universal in all applications, we find it useful in some cases.
With the assumption that the observed object has a circular or elliptical shape, a two-dimensional approach has been developed, dealing with an object area estimation problem (Benšić and Sabo 2007a; Sabo and Benšić 2009). This approach utilizes many border estimations and performs parametric curve fitting on its results.
The package LeArEst (Bensic et al. 2017) uses these methods for length and area estimation of an object captured with noise. It supports numerical inputs, which is useful if a machine that records an object stores numerical data (coordinates of recorded points). However, if an object is captured in a picture file, the package includes a web interface with which one can load a picture, specify a line that intersects the object, adjust the parameters, and perform an edge detection on the drawn line. Another web interface allows the user to draw a rectangle around the object and perform area estimation of the marked object. Description of functions dealing with numerical and graphical estimations and examples of their use are given in Section 3.
The basic model we deal with in this package is an additive error model \[X=U+\varepsilon.\] Here we suppose that the random variable \(U\) is uniformly distributed on the interval \(\left[-a,a \right]\), \(a>0\), i.e., has a density \[\label{duniform} f_{U}(x;a)=\left\{\begin{array}{ll}\frac{1}{2a},& x\in [-a,a].\\ 0,&\mbox{else}\end{array}\right. (\#eq:duniform)\] and \(\varepsilon\) is an absolutely continuous random variable describing measurement error. Further, we assume that \(\varepsilon\) is independent of \(U\) with zero mean. Instead of the sample \(U_{1},U_{2},\ldots,U_{n}\) from \(U\), one can only observe the contaminated i.i.d. sample \(X_{1},X_{2},\ldots,X_{n}\) from \(X\). We are to estimate the unknown parameter \(a\).
Our model is the special case of a general additive error model \(X=Y+\varepsilon\), where \(Y\) and \(\varepsilon\) are assumed to be independent continuous random variables, but only \(X\) is observable. Estimating the unknown density \(f_{Y}\) from an i.i.d. sample \(X_{1},X_{2},\ldots,X_{n},X_{1}\sim X\), is known as the deconvolution problem. Usually, the error part \(\varepsilon\) is assumed to have a known density \(f_{\varepsilon}\). Several nonparametric methods have been developed to estimate \(f_{Y}\) (Meister 2009); the most popular and studied is the deconvolution kernel density estimator (Carroll and Hall 1988; Stefanski and Carroll 1990). The packages decon (Wang and Wang 2013) and deamer (Stirnemann et al. 2012) provide functions for estimating density \(f_{Y}\) in a nonparametric way. Two different approaches in estimating the support of a density from a contaminated sample can be seen in (Meister 2006) and (Delaigle and Gijbels 2006). One is based on the deconvolving kernel density estimator (Delaigle and Gijbels 2006) and the other on the moment estimation (Meister 2006). To our knowledge, none of them is implemented in some R package submitted to the CRAN repository.
For our purpose (e.g., estimating borders of some object from a noisy image), we find that the model with \(Y\sim\mathcal{U}[-a,a]\) is useful in some instances. Namely, in many cases we have a relatively high contrast between an object and its background as it is the case in Figure 1. It seems reasonable to assume that the data extracted from the green line in Figure 1 come from a uniform distribution, but contaminated by an additive error.
Let \(f_{\varepsilon}\) and \(F_{\varepsilon}\) be the density and distribution function of the error part \(\varepsilon\). Then the density of \(X=U+\varepsilon\) is \[\label{fX1} f_{X}(x;a)=\int_{-\infty}^{\infty}f_{U}(t)f_{\varepsilon}(x-t)dt=\frac{1}{2a}\left(F_\varepsilon(x+a)-F_\varepsilon(x-a)\right). (\#eq:fX1)\] If we suppose that the distribution of \(\varepsilon\) belongs to a scale family, with scale parameter \(\sigma\), then (@ref(eq:fX1)) may be rewritten as
\[\label{fX2} f_{X}(x;a,\sigma)=\frac{1}{2a}\left(F\left( \frac{x+a}{\sigma}\right) -F\left( \frac{x-a}{\sigma}\right) \right), (\#eq:fX2)\] with \(F(x)\) being the standard (\(\sigma=1\) and zero mean) distribution function.
Let \(\mathbf{x}=\left(x_{1},\ldots,x_{n}\right)\) denote the realization of the i.i.d. sample \(\mathbf{X}=\left(X_{1},\ldots,X_{n}\right)\). The likelihood function has the form \[\label{lh} L\left(a,\sigma;\mathbf{x} \right)=\prod_{i=1}^{n}f_{X}(x_{i};a,\sigma)=\frac{1}{(2a)^{n}}\prod_{i=1}^{n}\left(F\left(\frac{x_{i}+a}{\sigma}\right)-F\left(\frac{x_{i}-a}{\sigma}\right)\right), (\#eq:lh)\] and the log-likelihood function is given by \[\label{llh} l(a,\sigma)=-n\log(2a)+\sum_{i=1}^{n}\log \left(F\left(\frac{x_{i}+a}{\sigma}\right)-F\left(\frac{x_{i}-a}{\sigma}\right)\right). (\#eq:llh)\] If the distribution of \(\varepsilon\) is symmetric around zero, then the Fisher information is \[I(a)=\frac{-1}{a^{2}}+\frac{1}{a\sigma^{2}}\int_{0}^{\infty}\frac{\left( f\left(\frac{x+a}{\sigma}\right)+f\left(\frac{x-a}{\sigma}\right)\right)^{2} }{F\left(\frac{x+a}{\sigma}\right)-F\left(\frac{x-a}{\sigma}\right)}\,dx.\] Supposing that parameter \(\sigma\) is known or consistently estimated then, under regularity, we have \[\label{ad} \sqrt{n}\left(\hat{a}_{ML}-a_{0} \right)\stackrel{D}\longrightarrow\mathcal{N}\left(0,\frac{1}{I(a_{0})}\right), (\#eq:ad)\] where \(a_{0}\) is the true value of \(a\).
Some flexibility of our model is achieved by changing the error distribution. For now, three types of error distributions are available in the package. The normal distribution \(\varepsilon\sim\mathcal{N}\left(0,\sigma^{2}\right)\) is sometimes a natural choice. Properties of maximum likelihood (ML) and method of moments (MM) estimators with known \(\sigma^{2}\) are given in Benšić and Sabo (2007b). The model with \(Y\sim\mathcal{U}[0,a]\) is treated in Schneeweiss (2004). Benšić and Sabo (2010) considered the unknown \(\sigma^{2}\) case. The possibility of this (one-dimensional) model in two-dimensional problems is given in Benšić and Sabo (2007a) and Sabo and Benšić (2009). For the sake of robustness Laplace and scaled Student (with \(5\) degrees of freedom) distributions are also incorporated in the package as a choice of the error distribution. Estimating issues with Laplacian error can be seen in Benšić and Sabo (2016), as well as a discussion connected to robustness.
Two procedures for deriving confidence intervals for \(a\) are described in (Hamedović et al. 2017). The first one is based on the asymptotic distribution of ML estimator in (@ref(eq:ad)). For a specified \(0<\alpha<1\), an asymptotic \((1-\alpha)100\%\) confidence interval for \(a\) is1
\[\left(\hat{a}_{ML}-\frac{z_{\alpha /2}}{\sqrt{nI(\hat{a}_{ML}) } },\hat{a}_{ML}+\frac{z_{\alpha /2}}{\sqrt{nI(\hat{a}_{ML}) }} \right).\] The second method is based on the likelihood ratio statistic \[\lambda(\mathbf{X})=\frac{\sup\limits_{H_{0}}L\left(a;\mathbf{X} \right)}{\displaystyle\sup_{(0,\infty)}L\left(a;\mathbf{X} \right)}=\frac{L\left(a_{0};\mathbf{X} \right)}{L\left(\hat{a}_{ML};\mathbf{X} \right)}.\] From the asymptotic distribution of the log-likelihood ratio statistic \[-2\log \lambda(\mathbf{X}) \stackrel{D}\longrightarrow \chi_{1}^{2}\] an approximate \((1-\alpha)100\%\) confidence interval for \(a\) is \[\left\lbrace a\vert\, l(\hat{a}_{ML})-l(a)\leq 0.5\chi^{2}_{1}(1-\alpha)\right\rbrace,\] where \(\chi^{2}_{1}(1-\alpha)\) is the \(1-\alpha\) quantile of \(\chi_{1}^{2}\) distribution.
These two approaches can be used to test hypotheses regarding the parameter \(a\). For example, in the case of a two-sided hypotheses \(H_{0}:a=a_{0}\) versus \(H_{1}:a\neq a_{0}\), the critical regions of asymptotic size \(\alpha\) are \[\left\lbrace \mathbf{x}\vert\, \sqrt{nI(a_{0})}|\hat{a}_{ML}-a_{0}|\geq z_{\alpha /2} \right\rbrace, \text{ and}\]
\[\left\lbrace \mathbf{x}\vert\, -2\log \lambda(\mathbf{x})\geq \chi_{1}^{2}(1-\alpha) \right\rbrace,\] respectively. Note that both methods are asymptotically equivalent.
The package LeArEst depends on the following packages that should installed in addition to the LeArEst package: conicfit (Gama and Chernov 2015), jpeg (Urbanek 2014), and opencpu (Ooms 2014). The stable version of the package is available on the Comprehensive R Archive Network repository (CRAN; https://CRAN.R-project.org/) and can be downloaded and installed by issuing the following command at the R console:
> install.packages("LeArEst")
The package is loaded using the following command:
> library(LeArEst)
An overview of the package’s functions is given in Table 1.
| Function | Description |
|---|---|
lengthest() |
Performs length estimation from a numerical data set. |
lengthtest() |
Performs one-sided and two-sided tests for uniform |
| distribution half-length. | |
areaest() |
Performs area estimation of a numerically described object |
| in plane. | |
startweb.esttest() |
Opens default web browser and loads a web page for length |
| estimation and testing (the object of interest is shown in | |
| an image). | |
startweb.area() |
Opens default web browser and loads a web page for area |
| estimation of the object shown in an image. |
The function lengthest computes the length of an
interval which is the domain of a uniform distribution from data
contaminated by an additive error according to the model described in
the previous section. The function’s arguments and results are given in
Table 2.
| Arguments | Description |
|---|---|
x |
Vector of input data. |
error |
Error distribution. |
var |
Error variance. |
var.est |
Method of error variance estimation. |
conf.level |
Confidence level of the confidence interval. Defaults to \(0.95\). |
| Results | Description |
radius |
Estimated half-length of the uniform support. |
var.error |
Error variance, estimated or explicitly given by
argument var. |
conf.int |
Confidence interval for half-length of the uniform support. |
method |
Method used for computing a confidence interval |
| (asymptotic distribution of ML or likelihood ratio statistic). |
In order to perform length estimation, a type of the error
distribution must be chosen through the argument error with
three possibilities: laplace (Benšić
and Sabo 2016), gauss (Benšić
and Sabo 2007b, 2010), or student (scaled Student
distribution with 5 degrees of freedom).
The variance of the additive error may or may not be known. If the
variance is known, argument var should be used and the
variance should be assigned to it. In the case of unknown variance,
function lengthest implements two methods for its
estimation: Method of Moments and Maximum Likelihood.
Value MM of the argument var.est instructs the
functions to use Method of Moments, while the corresponding value
ML triggers Maximum Likelihood Method. There is the
possibility, depending on the input data, that the Method of Moments
estimate of error variance does not exist. When that is the case, the
function stops and outputs the message instructing the user to use
Maximum Likelihood estimator or to give an explicit variance. It is
important to mention that arguments var and
var.est may not be used simultaneously.
The last argument, conf.level, specifies the confidence
level of the confidence interval calculated by the function.
The results of this function are the estimated half-length of uniform distribution (i.e., of an object), estimated or explicitly given error variance, confidence interval for half-length (with regard to the given confidence level) and the statistical method for computing a confidence interval.
Usage example. Let us generate a sample of size \(1000\) from \(X=U+\varepsilon\), where \(U\sim\mathcal{U}\left[-1, 1\right]\) and \(\varepsilon\sim\mathcal{N}\left(0,(0.1)^{2}\right)\):
set.seed(12)
sample_1 <- runif(1000, -1, 1)
sample_2 <- rnorm(1000, 0, 0.1)
sample <- sample_1 + sample_2
Figure 2 shows density estimation
from the generated data obtained with the R function
density. A half-length estimation of the uniform support
for these data can be done with the following command:
lengthest(x = sample, error = "gauss", var.est = "MM", conf.level = 0.90)
The most important part of its output is:
$radius
MLE for radius (a) of uniform distr.: 0.9916513
$var.error
MM estimate for error variance: 0.01279636
$method
[1] "Asymptotic distribution of LR statistic"
$conf.int
[1] 0.9724316 1.0116479
Function lengthtest performs one-sided and two-sided
tests against hypothesized half-length of the uniform support as it is
described in Section 2. Since the
actual calculations inside this function are based on the ML approach
most input arguments are similar to those in the function
lengthest (see Table 3).
Argument null.a is a positive number representing
hypothesized half-length of the uniform support, while argument
alternative defines the usual forms of alternatives
(two.sided, greater, or
less).
| Arguments | Description |
|---|---|
x |
Vector of input data. |
error |
Error distribution. |
null.a |
Specified null value being tested. |
alternative |
The form of the alternative hypothesis. |
var |
Error variance. |
var.est |
Method of error variance estimation. |
conf.level |
Confidence level of the confidence interval. Defaults to \(0.95\). |
| Results | Description |
p.value |
p-value of the test. |
tstat |
The value of the test statistic. |
radius |
Estimated half-length of the uniform support. |
var.error |
Error variance, estimated or explicitly given by
argument var. |
conf.int |
Confidence interval for half-length. |
method |
Method used for computing a confidence interval |
| (asymptotic distribution of ML or likelihood ratio statistic). |
Function lengthtest also performs length estimation, so
all values from its output, except p.value and the
calculated value of the test statistic (tstat), are the
same as that of the function lengthest.
Usage example. Generate the data in a similar manner
as in the lengthest example:
set.seed(12)
sample_1 <- runif(1000, -1, 1)
sample_2 <- rnorm(1000, 0, 0.1)
sample <- sample_1 + sample_2
To test that the uniform support half-length equals \(1\) against that it is less than \(1\) the function lengthest can
be used in the following way:
lengthtest(x = sample, error = "gauss", alternative = "less", var.est = "MM",
null.a = 1, conf.level = 0.95)
Part of the output dealing with a testing procedure is:
$p.value
[1] 0.2418929
$tstat
[1] -0.7002265
The input for the function areaest is supposed to be a
data set of points in the plane representing independent realizations of
a two-dimensional random vector \[X=U+\varepsilon.\] It is assumed that
\(U\) has a uniform distribution on an
ellipsoid and \(\varepsilon\) is a
two-dimensional error term independent of \(U\). Arguments and results of this function
are listed in Table 4.
| Arguments | Description |
|---|---|
data |
Two-column data matrix containing the points that describe the |
| observed object. The first column represents the \(x\) coordinate of the point, | |
| while the second column represents its \(y\) coordinate. | |
nrSlices |
Number of slices applied for plain data cutting. Defaults to 10. |
error |
Error distribution. |
var |
Error variance. |
var.est |
Method of error variance estimation. |
plot |
Logical parameter that determines whether to plot data set, calculated |
edge points, and the resulting ellipse. Defaults to
FALSE. |
|
| Results | Description |
area |
Estimated area of the object. |
points |
Set of estimated object’s edge points. |
semiaxes |
Resulting ellipse’s semi-axes. |
The algorithm implemented in the function areaest is
explained in detail in (Benšić and Sabo
2007a). The main task in area estimation is to estimate edge
points of the uniform support. In order to achieve this, the original
problem is reduced to several corresponding one-dimensional problems,
which can in turn be solved by function lengthest.
Let us denote the data set with \(D =
\left\{\left(x_i, y_i \right), i = 1, \dots, n \right\}\). The
function areast transforms this data set in two different
ways: through the \(y\)-axis and
through the \(x\)-axis. The algorithm
for transformation through the \(y\)-axis is presented below, while the
transformation through the \(x\)-axis
is done analogously.
(Transformation through the \(y\)-axis (Benšić and Sabo 2007a))
Separating through the \(y\)-axis.
Choose an integer \(m<n\) and real numbers \(\eta_1 < \eta_2 < \dots < \eta_m\) such that
\(\eta_{1} \le \min \{ y_i : i=1,\dots n \}, \ \max \{ y_i : i=1,\dots n\} \le \eta_{m}\) and
\(C_k := \{(x_i,y_i) \in D : y_i \in [ \eta_k, \eta_{k+1}] \},\ k=1, \ldots, m-1\) is a nonempty set.
Centering through the \(y\)-axis.
Let us denote \[c_k := \frac{1}{\left|C_k\right| } \sum_{\left(x_i,y_i\right) \in C_k } x_i, \ d_k := \frac{1}{\left|C_k\right|} \sum_{\left(x_i,y_i\right) \in C_k } y_i,\]
\[k=1,\dots,m-1,\] for \(k=1,\dots,m-1\) define \(\overline C_k := \left\{ x_i-c_k : \left( x_i, y_i \right) \in C_k \right\}\).
Using this algorithm the data are transformed in the way that we have
sets \(\overline C_k,\ k=1,\dots,m-1\)
that represent centered tiny strips. Argument nrSlices
corresponds to \(m-1\) and specifies
the number of strips. The lengths of these strips (in \(x\)-direction) can be estimated using the
function lengthest (the parameters error,
var, and var.est are used in a
lengthest call in the way described earlier). After doing
so, the algorithm needs to be repeated through the \(x\)-axis. Finally, at this point of the
procedure, the data that is a noisy version of points from the curve is
created – it represents estimated points from the border of the
object.
The next task is to choose one of the well-known curve fitting procedures for parameter estimation. Here we are dealing with a nonlinear parameter estimation problem.
Let us suppose that we have an elliptical domain, i.e.,
\[\mathcal{D}(\mathbf{p}) = \left\{ (x,y) \in \mathbb{R}^2 : \frac{(x-p)^2}{\alpha^2} + \frac{(y-q)^2}{\beta^2} \le 1 \right\},\]
\[\mathbf{p} = \left(p, q, \alpha, \beta\right)^T.\]
On the basis of data obtained so far, the vector of unknown
parameters \(p\) needs to be estimated,
and by doing so, the optimal ellipse that fits into our points is to be
defined. For this purpose EllipseDirectFit function from
the conicfit package is used. This function
implemets the algebraic ellipse fit method by Fitzgibbon-Pilu-Fisher
(Fitzgibbon, Pilu, and Fisher 1999).
Having parameters \(p\), it is a
trivial task to calculate the area of the ellipse that approximates the
observed object.
Usage example. Two internal files are provided with
the package: ellipse_3_4_0.1_gauss.txt and
ellipse_3_4_0.1_laplace.txt. Both of them represent an
ellipsoidal object with center in point \((1,1)\), half-axes \(1.5\) and \(2\), with added measurement error. In the
first file, the error distribution is a two-dimensional normal with
independent margins and variance 0.01, while in the other it is
Laplacian (\(\lambda=0.1\)) in both
directions, again with independent margins.
In order to use one of these files, the data needs to be read into a data frame:
inputfile <- system.file("extdata", "ellipse_3_4_0.1_laplace.txt", package = "LeArEst")
inputdata <- read.table(inputfile)
Area estimation of the uniform support can be done with the command:
areaest(inputdata, error = "laplace", var.est = "ML", nrSlices = 5, plot = TRUE)
In the previous example, the parameter plot is set to
TRUE, so the function plots the given input data (black
dots), estimated border points (red dots), and the resulting ellipse
(cyan ellipse); see Figure 3.
areaestThe most important parts of the numerical output are:
$area
[1] 9.938305
$semiaxes
[1] 2.048028 1.544638
In order to apply the described methods to a picture of an object, two web interfaces have been built and embedded into the package.
As far as we know, shiny (Chang et al. 2017) provides the simplest way of building web applications using R. However, limitations of its free version discouraged us from using it, so we decided to use the opencpu package. This package provides a reliable and interoperable HTTP API for data analysis based on R. Basically, it provides an interface between functions in R package and a custom-made web page bundled with the package, using JavaScript and AJAX. Building web interfaces using opencpu is more complex than using shiny, but at the same time, it provides more flexibility in application design. It is assumed that developers are familiar with HTTP protocol, HTML, and the JavaScript language, in order to develop such web applications.
Function startweb.esttest will be described in this
section. This function takes no arguments and returns no results, its
task is to start a web interface for length estimation and hypothesis
testing (Figure 4).
To start the analysis using the web interface the picture in JPG format should be loaded (Load Picture button). Then, a line should be drawn that intersects the object of interest by clicking on two points in the picture - length estimation will be performed on that line. Finally, parameters for a data set preparation should be set.
The Levels of gray parameter determines how many levels
of grey the algorithm should take into account. It is important to
mention that, although color images can be loaded, they are internally
converted to grey-scales prior to any calculations. Since JPG format
supports \(2^{24}\) different colors,
the number of possible colors should be reduced in order to optimize
estimation speed and memory consumption.
Line thickness specifies how many picture pixels around
the drawn line are taken into account in length estimation. For
instance, if Line thickness is set to \(3\), the algorithm takes pixels which are
direct left neighbours of the line, pixels on the line itself, and the
ones which are direct right neighbours of the line. By doing so, the
matrix of (length of the line) \(\times\) Line thickness pixels
– PixelMatrix is obtained.
By doing so, we have obtained the matrix of (length of the line)
\(\times\) Line thickness
pixels – PixelMatrix.
By clicking on the Prepare data button the data set will be prepared for the inference.
The following step deals with data preparation and is a crucial step
of the algorithm. Each pixel of PixelMatrix is mapped to a new
matrix of Box size \(\times\) Box size booleans –
DotMatrix (note that Box size is a parameter).
Further, every DotMatrix is filled with uniformly distributed
dots (i.e., TRUE values) in a way that total number of dots
in each DotMatrix corresponds to the brightness of the pixel it
represents. Then, DotMatrices are tiled up with respect to the
position of corresponding pixel and, by doing so, a new matrix of
(length of the line \(\cdot\)
Box size) \(\times\)
(Line thickness \(\cdot\)
Box size) booleans is obtained – FinalDotMatrix.
The last step is to summarize rows of the FinalDotMatrix to
obtain a vector of (length of the line) \(\cdot\) Box size integers. The
vector’s histogram is shown on a web interface (Figure 4, at the bottom) and the vector itself
serves as an input to the functions lengthest or
lengthtest.
Parameters in the Estimation section of the web interface
are transferred to lengthest, as well. After the user
clicks on Estimate button, lengthest is executed,
and its results are displayed below the picture. Additionally, the
estimated uniform support is marked red on the intersecting line.
Estimated length is expressed in width of a pixel and in percentage of whole image’s width as well. As stated in the info box, it is important to use a proportional screen resolution on user’s display, so the pixels on the screen are square-shaped.
The Testing section of this web interface serves for
hypothesis testing. Procedures related to image loading, choosing an
intersecting line, and data preparation are the same as described above.
For the purpose of testing, values for H0,
unit, and alternative (greater,
less, or two-sided) need to be specified. The
part of the web interface dealing with output of hypothesis testing
procedure is shown in Figure 5.
The function startweb.area starts a web interface for
area estimation (Figure 6). Again, the first
step is to load an image. To select an object whose surface needs to be
evaluated, a rectangle should be drawn around it. It is done by clicking
on its upper-left and lower-right corners, after which a green rectangle
is drawn on the picture.
Data parameters are similar to ones in the length estimation web
interface, with the exception of number of slices.
The first step in the area estimation algorithm for this function is
to roughly isolate the object in the selected rectangle. In order to do
that, pixels from the selected rectangle are divided into two clusters
by using the kmeans function from base-R
stats package (the criterion for clustering is
pixel brightness). Further, only pixels from the ‘object cluster’ are
observed and divided into horizontal and vertical stripes, as described
earlier in Algorithm 1. The number of stripes
is dictated by the number of slices parameter. A length
estimation procedure is conducted on each stripe, obtaining two
estimated edge points of the object for each stripe (red dots in Figure
5). Two parameters in the
Estimation section of the web interface are related to the
length estimation procedure of the stripes.
Finally, an optimal ellipse that fits into edge points is found using
EllipseDirectFit function from the
conicfit package, as well as in the
areaest function described earlier. The resulting ellipse
is drawn in red in Figure 5. Its area is
printed below, this is measured in pixels and the percentage of the
whole image area.
The R package LeArEst provides routines for estimating the support of the random variable \(U,\,U\sim\mathcal{U}[-a,a],\) based on a sample from \(X=U+\varepsilon\). The random variable \(\varepsilon\) represents additive measurement error and is supposed to have either a normal, Laplace, or scaled Student distribution with 5 degrees of freedom. The package also includes functions for estimating either the borders or the area of some object from a noisy image. The package may be useful for this purpose mainly in the case of images with reasonable contrast between the object of interest and background. For greater robustness, we find it convenient to use some error distributions with heavier tails. Sometimes we have different amounts of noise in the tails, so it would be useful to include some asymmetric error distributions as well. These are some features we are going to add in the package in order to improve flexibility of our models.
This work was supported by the Croatian Science Foundation through research grant IP-2016-06-6545. We would like to thank Krunoslav Buljan from Osijek Clinical Hospital Center for providing an image of the carotid artery.
as usual, \(z_{\alpha}\) is the \(1-\alpha\) quantile of the standard normal distribution↩︎