Stress majorization
Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of <math>n</math> data items in <math>m</math> dimensions, a configuration <math>X</math> of <math>n</math> points in <math>r</math>-dimensional space (with <math>r\ll m</math>) is sought that minimises the so-called stress function <math>\sigma(X)</math>. Usually <math>r</math> is 2 or 3, i.e. the <math>n\times r</math> matrix <math>X</math> lists points in 2- or 3-dimensional Euclidean space so that the result may be visualised (i.e. as an MDS plot). The function <math>\sigma</math> is a loss or cost function that measures the sum of squared differences between the ideal (<math>m</math>-dimensional) distances and the actual distances in <math>r</math>-dimensional space. It is defined as:
- <math>\sigma(X)=\sum_{i<j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2</math>
where <math>w_{ij}\ge 0</math> is a weight for the measurement between a pair of points <math>(i,j)</math>, <math>d_{ij}(X)</math> is the Euclidean distance between <math>i</math> and <math>j</math> and <math>\delta_{ij}</math> is the ideal distance between the points (their separation) in the <math>m</math>-dimensional data space. Note that <math>w_{ij}</math> can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).
A configuration <math>X</math> which minimises <math>\sigma(X)</math> gives a plot in which points that are close together correspond to points that are also close together in the original <math>m</math>-dimensional data space.
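As a concrete illustration, the stress function can be evaluated directly from a configuration matrix. The following minimal Python sketch (using NumPy; the names `stress`, `X`, `delta` and `W` are illustrative, not part of any standard library) assumes the ideal distances and weights are supplied as symmetric <math>n\times n</math> arrays:

```python
import numpy as np

def stress(X, delta, W):
    """Weighted stress: sum over i < j of w_ij * (d_ij(X) - delta_ij)**2.

    X     : (n, r) configuration of n points in r dimensions
    delta : (n, n) symmetric matrix of ideal distances
    W     : (n, n) symmetric matrix of non-negative weights
    """
    # Pairwise Euclidean distances d_ij(X).
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Sum only over the pairs i < j.
    i, j = np.triu_indices(len(X), k=1)
    return np.sum(W[i, j] * (d[i, j] - delta[i, j]) ** 2)
```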
There are many ways that <math> \sigma(X)</math> could be minimised. For example, Kruskal[1] recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimising stress was introduced by Jan de Leeuw.[2] De Leeuw's iterative majorization method at each step minimises a simple convex function which both bounds <math>\sigma</math> from above and touches the surface of <math>\sigma</math> at a point <math>Z</math>, called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by majorizing a complicated function").
The SMACOF algorithm
The stress function <math>\sigma</math> can be expanded as follows:
- <math>\sigma(X)=\sum_{i<j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2 =\sum_{i<j}w_{ij}\delta_{ij}^2 + \sum_{i<j}w_{ij}d_{ij}^2(X)-2\sum_{i<j}w_{ij}\delta_{ij}d_{ij}(X)</math>
Note that the first term is a constant <math>C</math> and the second term is quadratic in <math>X</math>: it equals <math>\operatorname{tr}\, X'VX</math>, where the matrix <math>V</math> has elements <math>v_{ij}=-w_{ij}</math> for <math>i\ne j</math> and <math>v_{ii}=\sum_{j\ne i}w_{ij}</math>, and is therefore relatively easy to handle. The third term is bounded by:
- <math>\sum_{i<j}w_{ij}\delta_{ij}d_{ij}(X)=\,\operatorname{tr}\, X'B(X)X \ge \,\operatorname{tr}\, X'B(Z)Z</math>
where <math>B(Z)</math> has:
- <math>b_{ij}=-\frac{w_{ij}\delta_{ij}}{d_{ij}(Z)}</math> for <math>d_{ij}(Z)\ne 0, i \ne j</math>,
- <math>b_{ij}=0</math> for <math>d_{ij}(Z)=0, i\ne j</math>,
- <math>b_{ii}=-\sum_{j=1,j\ne i}^n b_{ij}</math>.
Proof of this inequality is by the Cauchy–Schwarz inequality; see Borg[3] (pp. 152–153).
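For concreteness, the matrix <math>B(Z)</math> defined above can be assembled directly from these formulas. The following NumPy sketch (the helper name `b_matrix` is illustrative; `Z`, `delta` and `W` are as in the earlier snippet) handles the three cases in turn:

```python
import numpy as np

def b_matrix(Z, delta, W):
    """B(Z): b_ij = -w_ij * delta_ij / d_ij(Z) where d_ij(Z) != 0,
    b_ij = 0 where d_ij(Z) = 0 (i != j), and b_ii = -sum_{j != i} b_ij."""
    diff = Z[:, None, :] - Z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    with np.errstate(divide="ignore", invalid="ignore"):
        B = np.where(d > 0, -W * delta / d, 0.0)   # off-diagonal entries
    np.fill_diagonal(B, 0.0)             # clear the diagonal before accumulating
    np.fill_diagonal(B, -B.sum(axis=1))  # b_ii = -sum_{j != i} b_ij
    return B
```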
Thus, we have a simple quadratic function <math>\tau(X,Z)</math> that majorizes stress:
- <math>\sigma(X)=C+\,\operatorname{tr}\, X'VX - 2 \,\operatorname{tr}\, X'B(X)X\le C+\,\operatorname{tr}\, X' V X - 2 \,\operatorname{tr}\, X'B(Z)Z = \tau(X,Z)</math>
The iterative minimization procedure is then:
- at the kth step we set <math>Z\leftarrow X^{k-1}</math>
- <math>X^k\leftarrow \operatorname{arg\,min}_X \tau(X,Z)</math>
- stop if <math>\sigma(X^{k-1})-\sigma(X^{k})<\epsilon</math> otherwise repeat.
This algorithm has been shown to decrease stress monotonically (see de Leeuw[2]).
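The inner minimisation has a closed-form solution: setting the gradient of <math>\tau(X,Z)</math> with respect to <math>X</math> to zero gives <math>VX=B(Z)Z</math>, so <math>X^k=V^{+}B(Z)Z</math>, where <math>V^{+}</math> is the Moore–Penrose pseudoinverse of <math>V</math> (the Guttman transform). A minimal Python sketch of the resulting loop is given below; it reuses the `stress` and `b_matrix` helpers from the earlier snippets, and the function name `smacof`, the random initialisation and the default tolerances are illustrative choices, not prescribed by the original papers.

```python
import numpy as np

def smacof(delta, W, r=2, eps=1e-6, max_iter=300, seed=None):
    """Iterative stress majorization (reuses stress() and b_matrix() above)."""
    n = delta.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, r))           # arbitrary starting configuration

    # V has v_ij = -w_ij (i != j) and v_ii = sum_{j != i} w_ij; it is singular,
    # so the update uses the Moore-Penrose pseudoinverse (Guttman transform).
    V = -np.array(W, dtype=float)
    np.fill_diagonal(V, 0.0)
    np.fill_diagonal(V, -V.sum(axis=1))
    V_pinv = np.linalg.pinv(V)

    prev = stress(X, delta, W)
    for _ in range(max_iter):
        Z = X                                     # supporting point Z <- X^{k-1}
        X = V_pinv @ b_matrix(Z, delta, W) @ Z    # X^k = V^+ B(Z) Z
        cur = stress(X, delta, W)
        if prev - cur < eps:                      # stress decreases monotonically
            break
        prev = cur
    return X
```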
Use in graph drawing
Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.[4][5] That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the <math>\delta_{ij}</math> are usually set to the graph-theoretic distances between nodes <math>i</math> and <math>j</math> and the weights <math>w_{ij}</math> are taken to be <math>\delta_{ij}^{-\alpha}</math>. Here, <math>\alpha</math> is chosen as a trade-off between preserving long-range and short-range ideal distances. Good results have been shown for <math>\alpha=2</math>.[6]
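As a sketch of this setup, the ideal distances and weights for a small example graph can be derived as follows and then passed to a stress-majorization routine such as the one sketched above. The example adjacency matrix and the use of SciPy's `shortest_path` routine are illustrative assumptions; the exponent follows the <math>\alpha=2</math> choice mentioned above.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Adjacency matrix of a small example graph (a 4-cycle with one chord).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)

# Ideal distances: graph-theoretic (shortest-path) distances between nodes.
delta = shortest_path(A, method="D", unweighted=True)

# Weights w_ij = delta_ij ** (-alpha) with alpha = 2 (diagonal left at zero).
alpha = 2
W = np.zeros_like(delta)
mask = delta > 0
W[mask] = delta[mask] ** (-alpha)

# layout = smacof(delta, W, r=2)   # e.g. using the smacof() sketch above
```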
References
- ↑ Kruskal, J. B. (1964), "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", Psychometrika, 29: 1–27.
- ↑ de Leeuw, J. (1977), "Applications of convex analysis to multidimensional scaling", in Barra, J. R.; Brodeau, F.; Romier, G.; van Cutsem, B. (eds.), Recent Developments in Statistics, pp. 133–145.
- ↑ Borg, I.; Groenen, P. (1997), Modern Multidimensional Scaling: theory and applications, New York: Springer-Verlag.
- ↑ Michailidis, G.; de Leeuw, J. (2001), "Data visualization through graph drawing", Computational Statistics, 16 (3): 435–450.
- ↑ Gansner, E.; Koren, Y.; North, S. (2004), "Graph Drawing by Stress Majorization", Proceedings of 12th Int. Symp. Graph Drawing (GD'04), Lecture Notes in Computer Science, 3383, Springer-Verlag, pp. 239–250.
- ↑ Cohen, J. (1997), "Drawing graphs to convey proximity: an incremental arrangement method", ACM Transactions on Computer-Human Interaction, 4 (3): 197–229.