## Error Minimization

$$e^2 = E\left[(d_k - y_k')^2\right]$$

where $d_k$ is the desired response (0 or 1) to the kth pattern input $\mathbf{x}_k$, and $y_k'$ is the ADALINE output for that input.
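Writing $y_k' = \mathbf{w}^T\mathbf{x}_k$ for the linear ADALINE output, the expectation can be expanded into an explicit quadratic in $\mathbf{w}$ (a standard least-squares step, using the same $\mathbf{R} = E[\mathbf{x}_k\mathbf{x}_k^T]$ and $\mathbf{q} = E[d_k\mathbf{x}_k]$ that appear below):

$$e^2(\mathbf{w}) = E[d_k^2] - 2\,\mathbf{q}^T\mathbf{w} + \mathbf{w}^T\mathbf{R}\,\mathbf{w}$$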

Clearly, the dependence of e² on w is quadratic. The matrix R is square, (n + 1) × (n + 1), and can be shown to be symmetric and positive semidefinite. This quadratic form makes e², as a function of w, a paraboloidal surface. There will be some w = w* that makes e²(w) a minimum. To find the w* that minimizes e²(w), we can start at any w and run downhill to e²(w*), where the slope is zero. Since e²(w) is a closed-form expression, vector calculus can be used to find its gradient, set the gradient equal to zero, and solve for w*. Thus,

$$\mathbf{w}^* = \mathbf{R}^{-1}\mathbf{q}$$

where $\mathbf{R} = E[\mathbf{x}_k\mathbf{x}_k^T]$ and $\mathbf{q} = E[d_k\mathbf{x}_k]$. Clearly, the vector calculus required to evaluate w* is tedious. Fortunately, Widrow and Hoff approached the problem of finding the optimum w from a pragmatic, heuristic engineering viewpoint. Again using the vector gradient, they derived a simpler iterative learning law:

$$\mathbf{w}_{k+1} = \mathbf{w}_k + \alpha\,\delta_k\,\mathbf{x}_k$$

This is the Widrow-Hoff delta training law, in which α is a positive constant smaller than 2 divided by the largest eigenvalue of the square matrix R, δ_k = (d_k − y_k'), and k is the input index used in training. As a further heuristic, R is not calculated, and the effective α value is estimated by trial and error. Usually 0.01 ≤ α ≤ 10; α₀ = 0.1 is generally used as a starting value. If α is too large, the values of w will not converge; if α is too small, convergence will take too long. Two refinements of the Widrow-Hoff TL can also be used: the batching version and the momentum version. The interested reader should see Hecht-Nielsen (1990) for a complete description of these variant TLs.
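As a concrete illustration, the following is a minimal pure-Python sketch of both approaches on a small hypothetical training set: the closed-form optimum w* = R⁻¹q, and the iterative delta rule w ← w + α δ_k x_k. The four patterns, α = 0.02, and the epoch count are illustrative assumptions, not values from the text.

```python
# Sketch: Widrow-Hoff (LMS) delta rule for a linear ADALINE, compared with
# the closed-form optimum w* = R^{-1} q.  Hypothetical toy data.

# Augmented patterns x_k = (1, x), so w = (bias, slope); desired outputs 0/1.
patterns = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
desired = [0.0, 0.0, 1.0, 1.0]
K = len(patterns)

# R = E[x x^T] and q = E[d x], estimated as sample averages.
R = [[sum(x[i] * x[j] for x in patterns) / K for j in range(2)]
     for i in range(2)]
q = [sum(d * x[i] for x, d in zip(patterns, desired)) / K for i in range(2)]

# Closed-form optimum: solve the 2x2 system R w* = q directly.
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
w_star = [(R[1][1] * q[0] - R[0][1] * q[1]) / det,
          (R[0][0] * q[1] - R[1][0] * q[0]) / det]

# Iterative delta rule: w <- w + alpha * (d_k - y'_k) * x_k,
# cycling through the patterns; alpha chosen well below 2 / lambda_max.
alpha = 0.02
w = [0.0, 0.0]
for epoch in range(2000):
    for x, d in zip(patterns, desired):
        y = w[0] * x[0] + w[1] * x[1]   # linear ADALINE output y'_k
        delta = d - y                    # error term delta_k
        w = [w[i] + alpha * delta * x[i] for i in range(2)]

print("closed-form w*:", w_star)
print("LMS estimate  :", w)
```

Note that the rule updates w using the linear (pre-threshold) output y'_k; this is what makes e² exactly quadratic in w and lets the iteration run downhill to the same minimum the closed form finds.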