Chapter 2

V. Theory of Equations.

1. In the subject “Theory of Equations” the term equation is used to denote an equation of the form xⁿ − p₁xⁿ⁻¹ ... ± pₙ = 0, where p₁, p₂ ... pₙ are regarded as known, and x as a quantity to be determined; for shortness the equation is written ƒ(x) = 0.

The equation may be numerical; that is, the coefficients p₁, p₂, ... pₙ are then numbers—understanding by number a quantity of the form α + βi (α and β having any positive or negative real values whatever, or say each of these is regarded as susceptible of continuous variation from an indefinitely large negative to an indefinitely large positive value), and i denoting √−1.

Or the equation may be algebraical; that is, the coefficients are not then restricted to denote, or are not explicitly considered as denoting, numbers.

We consider first numerical equations. (Real theory, 2-6; Imaginary theory, 7-10.)

Real Theory.

2. Postponing all consideration of imaginaries, we take in the first instance the coefficients to be real, and attend only to the real roots (if any); that is, p₁, p₂, ... pₙ are real positive or negative quantities, and a root a, if it exists, is a positive or negative quantity such that aⁿ − p₁aⁿ⁻¹ ... ± pₙ = 0, or say, ƒ(a) = 0.

It is very useful to consider the curve y = ƒ(x),—or, what would come to the same, the curve Ay = ƒ(x),—but it is better to retain the first-mentioned form of equation, drawing, if need be, the ordinate y on a reduced scale. For instance, if the given equation be x³ − 6x² + 11x − 6.06 = 0, then the curve y = x³ − 6x² + 11x − 6.06 is as shown in fig. 1, without any reduction of scale for the ordinate.

It is clear that, in general, y is a continuous one-valued function of x, finite for every finite value of x, but becoming infinite when x is infinite; i.e., assuming throughout that the coefficient of xⁿ is +1, then when x = ∞, y = +∞; but when x = −∞, then y = +∞ or −∞, according as n is even or odd; the curve cuts any line whatever, and in particular it cuts the axis (of x) in at most n points; and the value of x, at any point of intersection with the axis, is a root of the equation ƒ(x) = 0.

If β, α are any two values of x (α > β, that is, α nearer +∞), then if ƒ(β), ƒ(α) have opposite signs, the curve cuts the axis an odd number of times, and therefore at least once, between the points x = β, x = α; but if ƒ(β), ƒ(α) have the same sign, then between these points the curve cuts the axis an even number of times, or it may be not at all. That is, ƒ(β), ƒ(α) having opposite signs, there are between the limits β, α an odd number of real roots, and therefore at least one real root; but ƒ(β), ƒ(α) having the same sign, there are between these limits an even number of real roots, or it may be there is no real root. In particular, by giving to β, α the values −∞, +∞ (or, what is the same thing, any two values sufficiently near to these values respectively) it appears that an equation of an odd order has always an odd number of real roots, and therefore at least one real root; but that an equation of an even order has an even number of real roots, or it may be no real root.
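The sign-change test just described may be sketched in modern notation. The following Python fragment (an editorial illustration, not part of the original article; the step size is an assumption) applies it to the example cubic of fig. 1 at unit steps; notably, ƒ(1) and ƒ(2) have the same sign although two roots lie between 1 and 2, which is precisely the "even number of roots, or it may be none" caveat of the text.

```python
# f(x) = x^3 - 6x^2 + 11x - 6.06, the example cubic of fig. 1
def f(x):
    return x**3 - 6*x**2 + 11*x - 6.06

# record each unit interval (b, b+1) on which f changes sign
sign_changes = []
for b in range(0, 4):
    if f(b) * f(b + 1) < 0:
        sign_changes.append((b, b + 1))

# only (3, 4) is reported: f(1), f(2) are both negative, yet f(1.5) > 0,
# so an even number of roots (here two) hides between 1 and 2
print(sign_changes)
```

Refining the step (e.g. to 0.5) would expose the hidden pair, as the next section's subdivision process makes systematic.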

If α be such that for x = or > α (that is, x nearer to +∞) ƒ(x) is always +, and β be such that for x = or < β (that is, x nearer to −∞) ƒ(x) is always −, then the real roots (if any) lie between these limits x = β, x = α; and it is easy to find by trial such two limits including between them all the real roots (if any).

3. Suppose that the positive value δ is an inferior limit to the difference between two real roots of the equation; or rather (since the foregoing expression would imply the existence of real roots) suppose that there are not two real roots such that their difference taken positively is = or < δ; then, γ being any value whatever, there is clearly at most one real root between the limits γ and γ + δ; and by what precedes there is such real root or there is not such real root, according as ƒ(γ), ƒ(γ + δ) have opposite signs or have the same sign. And by dividing in this manner the interval β to α into intervals each of which is = or < δ, we should not only ascertain the number of the real roots (if any), but we should also separate the real roots, that is, find for each of them limits γ, γ + δ between which there lies this one, and only this one, real root.

In particular cases it is frequently possible to ascertain the number of the real roots, and to effect their separation by trial or otherwise, without much difficulty; but the foregoing was the general process as employed by Joseph Louis Lagrange even in the second edition (1808) of the Traité de la résolution des équations numériques; the determination of the limit δ had to be effected by means of the “equation of differences” or equation of the order ½ n(n − 1), the roots of which are the squares of the differences of the roots of the given equation, and the process is a cumbrous and unsatisfactory one.

4. The great step was effected by the theorem of J.C.F. Sturm (1835)—viz. here starting from the function ƒ(x), and its first derived function ƒ′(x), we have (by a process which is a slight modification of that for obtaining the greatest common measure of these two functions) to form a series of functions

ƒ(x), ƒ′(x), ƒ₂(x), ... ƒₙ(x)

of the degrees n, n − 1, n − 2 ... 0 respectively,—the last term ƒₙ(x) being thus an absolute constant. These lead to the immediate determination of the number of real roots (if any) between any two given limits β, α; viz. supposing α > β (that is, α nearer to +∞), then substituting successively these two values in the series of functions, and attending only to the signs of the resulting values, the number of the changes of sign lost in passing from β to α is the required number of real roots between the two limits. In particular, taking β, α = −∞, +∞ respectively, the signs of the several functions depend merely on the signs of the terms which contain the highest powers of x, and are seen by inspection, and the theorem thus gives at once the whole number of real roots.
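Sturm's construction can be carried out mechanically. The sketch below (floating-point arithmetic and the absence of multiple roots are assumptions; all names are mine) forms the sequence by taking the negative of each successive Euclidean remainder of ƒ and ƒ′, then counts the sign changes lost between two limits.

```python
def poly_eval(p, x):               # coefficients highest degree first
    v = 0.0
    for c in p:
        v = v * x + c
    return v

def poly_deriv(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def poly_rem(num, den):            # remainder of polynomial division
    num = num[:]
    while len(num) >= len(den):
        k = num[0] / den[0]
        for i in range(len(den)):
            num[i] -= k * den[i]
        num.pop(0)
    return num

def sturm_sequence(p):
    seq = [p, poly_deriv(p)]
    while len(seq[-1]) > 1:
        r = poly_rem(seq[-2], seq[-1])
        while r and abs(r[0]) < 1e-12:   # strip vanishing leading terms
            r.pop(0)
        if not r:
            break                        # multiple root; not handled here
        seq.append([-c for c in r])      # next term is MINUS the remainder
    return seq

def sign_changes(seq, x):
    signs = [s for s in (poly_eval(p, x) for p in seq) if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a * b < 0)

def count_real_roots(p, beta, alpha):
    seq = sturm_sequence(p)
    return sign_changes(seq, beta) - sign_changes(seq, alpha)

# f(x) = x^3 - 6x^2 + 11x - 6 has the three real roots 1, 2, 3
p = [1.0, -6.0, 11.0, -6.0]
print(count_real_roots(p, 0.0, 4.0))   # 3
print(count_real_roots(p, 1.5, 4.0))   # 2
```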

And although theoretically, in order to complete by a finite number of operations the separation of the real roots, we still need to know the value of the before-mentioned limit δ; yet in any given case the separation may be effected by a limited number of repetitions of the process. The practical difficulty is when two or more roots are very near to each other. Suppose, for instance, that the theorem shows that there are two roots between 0 and 10; by giving to x the values 1, 2, 3, ... successively, it might appear that the two roots were between 5 and 6; then again that they were between 5.3 and 5.4, then between 5.34 and 5.35, and so on until we arrive at a separation; say it appears that between 5.346 and 5.347 there is one root, and between 5.348 and 5.349 the other root. But in the case in question δ would have a very small value, such as .002, and even supposing this value known, the direct application of the first-mentioned process would be still more laborious.

5. Supposing the separation once effected, the determination of the single real root which lies between the two given limits may be effected to any required degree of approximation either by the processes of W.G. Horner and Lagrange (which are in principle a carrying out of the method of Sturm’s theorem), or by the process of Sir Isaac Newton, as perfected by Joseph Fourier (which requires to be separately considered).

First as to Horner and Lagrange. We know that between the limits β, α there lies one, and only one, real root of the equation; ƒ(β) and ƒ(α) have therefore opposite signs. Suppose any intermediate value is θ; in order to determine by Sturm’s theorem whether the root lies between β, θ, or between θ, α, it would be quite unnecessary to calculate the signs of ƒ(θ), ƒ′(θ), ƒ₂(θ), ...; only the sign of ƒ(θ) is required; for, if this has the same sign as ƒ(β), then the root is between β, θ; if the same sign as ƒ(α), then the root is between θ, α. We want to make θ increase from the inferior limit β, at which ƒ(θ) has the sign of ƒ(β), so long as ƒ(θ) retains this sign, and then to a value for which it assumes the opposite sign; we have thus two nearer limits of the required root, and the process may be repeated indefinitely.

Horner’s method (1819) gives the root as a decimal, figure by figure; thus if the equation be known to have one real root between 0 and 10, it is in effect shown say that 5 is too small (that is, the root is between 5 and 6); next that 5.4 is too small (that is, the root is between 5.4 and 5.5); and so on to any number of decimals. Each figure is obtained, not by the successive trial of all the figures which precede it, but (as in the ordinary process of the extraction of a square root, which is in fact Horner’s process applied to this particular case) it is given presumptively as the first figure of a quotient; such value may be too large, and then the next inferior integer must be tried instead of it, or it may require to be further diminished. And it is to be remarked that the process not only gives the approximate value α of the root, but (as in the extraction of a square root) it includes the calculation of the function ƒ(α), which should be, and approximately is, = 0. The arrangement of the calculations is very elegant, and forms an integral part of the actual method. It is to be observed that after a certain number of decimal places have been obtained, a good many more can be found by a mere division. It is in the progress tacitly assumed that the roots have been first separated.
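The figure-by-figure idea may be imitated as follows (this sketch does not reproduce Horner's synthetic-division tableau, only the digit-by-digit search; the names and the example equation x³ − 2 = 0 are mine): each new figure is the largest digit whose trial value is still "too small."

```python
def digit_by_digit(f, lo, decimals):
    """Assumes exactly one root in (lo, lo + 10), with f negative below it
    and positive above it."""
    x, step = float(lo), 1.0
    for _ in range(decimals + 1):        # one integer figure, then decimals
        d = 0
        while d < 9 and f(x + (d + 1) * step) < 0:
            d += 1                       # largest digit still "too small"
        x += d * step
        step /= 10.0
    return x

f = lambda x: x**3 - 2                   # root: cube root of 2 = 1.25992...
r = digit_by_digit(f, 0, 4)
print(r)                                 # approximately 1.2599, figure by figure
```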

Lagrange’s method (1767) gives the root as a continued fraction a + 1/b + 1/c + ..., where a is a positive or negative integer (which may be = 0), but b, c, ... are positive integers. Suppose the roots have been separated; then (by trial if need be of consecutive integer values) the limits may be made to be consecutive integer numbers: say they are a, a + 1; the value of x is therefore = a + 1/y, where y is positive and greater than 1; from the given equation for x, writing therein x = a + 1/y, we form an equation of the same order for y, and this equation will have one, and only one, positive root greater than 1; hence finding for it the limits b, b + 1 (where b is = or > 1), we have y = b + 1/z, where z is positive and greater than 1; and so on—that is, we thus obtain the successive denominators b, c, d ... of the continued fraction. The method is theoretically very elegant, but the disadvantage is that it gives the result in the form of a continued fraction, which for the most part must ultimately be converted into a decimal. There is one advantage in the method, that a commensurable root (that is, a root equal to a rational fraction) is found accurately, since, when such root exists, the continued fraction terminates.
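Lagrange's process can be followed exactly with rational arithmetic: at each stage find consecutive integer limits a, a + 1 by a sign change, then substitute x = a + 1/y to obtain the equation for y. The sketch below (names mine) assumes the sought root is positive, irrational and simple, so that a sign change always exists and the fraction does not terminate.

```python
from fractions import Fraction

def polyeval(p, x):                      # coefficients highest degree first
    v = Fraction(0)
    for c in p:
        v = v * x + c
    return v

def next_poly(p, a):
    """q(y) = y^n * p(a + 1/y): the equation satisfied by y = 1/(x - a)."""
    n = len(p) - 1
    q = [Fraction(0)] * (n + 1)
    for i, c in enumerate(p):            # c is the coefficient of x^(n-i)
        term = [Fraction(1)]             # build (a*y + 1)^(n-i)
        for _ in range(n - i):
            new = [Fraction(0)] * (len(term) + 1)
            for j, t in enumerate(term):
                new[j] += a * t
                new[j + 1] += t
            term = new
        for j, t in enumerate(term):     # multiplying by y^i aligns term[j]
            q[j] += c * t                # with the coefficient of y^(n-j)
    return q

def continued_fraction(p, terms):
    p = [Fraction(c) for c in p]
    out = []
    for _ in range(terms):
        a = 0                            # search consecutive integer limits
        while not polyeval(p, a) * polyeval(p, a + 1) < 0:
            a += 1
        out.append(a)
        p = next_poly(p, a)
    return out

# x^2 - x - 1 = 0: the golden ratio is [1; 1, 1, 1, ...]
print(continued_fraction([1, -1, -1], 6))
# x^2 - 2 = 0: the square root of 2 is [1; 2, 2, 2, ...]
print(continued_fraction([1, 0, -2], 5))
```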

6. Newton’s method (1711), as perfected by Fourier (1831), may be roughly stated as follows. If x = γ be an approximate value of any root, and γ + h the correct value, then ƒ(γ + h) = 0, that is,

ƒ(γ) + (h/1)ƒ′(γ) + (h²/1·2)ƒ″(γ) + ... = 0;

and then, if h be so small that the terms after the second may be neglected, ƒ(γ) + hƒ′(γ) = 0, that is, h = −ƒ(γ)/ƒ′(γ), or the new approximate value is x = γ − ƒ(γ)/ƒ′(γ); and so on, as often as we please. It will be observed that so far nothing has been assumed as to the separation of the roots, or even as to the existence of a real root; γ has been taken as the approximate value of a root, but no precise meaning has been attached to this expression. The question arises, What are the conditions to be satisfied by γ in order that the process may by successive repetitions actually lead to a certain real root of the equation; or that, γ being an approximate value of a certain real root, the new value γ − ƒ(γ)/ƒ′(γ) may be a more approximate value.
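The rule just stated is a one-line iteration. A minimal Python sketch (the example and names are mine), applied on a side where the curve is convex to the axis so that the process is safe:

```python
def newton(f, fprime, gamma, steps):
    """Repeat gamma -> gamma - f(gamma)/f'(gamma)."""
    for _ in range(steps):
        gamma = gamma - f(gamma) / fprime(gamma)
    return gamma

# f(x) = x^2 - 2, started from gamma = 2 on the convex side of the root
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 2.0, 6)
print(root)                     # ~1.41421356..., the square root of 2
```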

Referring to fig. 1, it is easy to see that if OC represent the assumed value γ, then, drawing the ordinate CP to meet the curve in P, and the tangent PC′ to meet the axis in C′, we shall have OC′ as the new approximate value of the root. But observe that there is here a real root OX, and that the curve beyond X is convex to the axis; under these conditions the point C′ is nearer to X than was C; and, starting with C′ instead of C, and proceeding in like manner to draw a new ordinate and tangent, and so on as often as we please, we approximate continually, and that with great rapidity, to the true value OX. But if C had been taken on the other side of X, where the curve is concave to the axis, the new point C′ might or might not be nearer to X than was the point C; and in this case the method, if it succeeds at all, does so by accident only, i.e. it may happen that C′ or some subsequent point comes to be a point C, such that CO is a proper approximate value of the root, and then the subsequent approximations proceed in the same manner as if this value had been assumed in the first instance, all the preceding work being wasted. It thus appears that for the proper application of the method we require more than the mere separation of the roots. In order to be able to approximate to a certain root α, = OX, we require to know that, between OX and some value ON, the curve is always convex to the axis (analytically, between the two values, ƒ(x) and ƒ″(x) must have always the same sign). When this is so, the point C may be taken anywhere on the proper side of X, and within the portion XN of the axis; and the process is then the one already explained. The approximation is in general a very rapid one.
If we know for the required root OX the two limits OM, ON such that from M to X the curve is always concave to the axis, while from X to N it is always convex to the axis,—then, taking D anywhere in the portion MX and (as before) C in the portion XN, drawing the ordinates DQ, CP, and joining the points P, Q by a line which meets the axis in D′, also constructing the point C′ by means of the tangent at P as before, we have for the required root the new limits OD′, OC′; and proceeding in like manner with the points D′, C′, and so on as often as we please, we obtain at each step two limits approximating more and more nearly to the required root OX. The process as to the point D′, translated into analysis, is the ordinate process of interpolation. Suppose OD = β, OC = α, we have approximately ƒ(β + h) = ƒ(β) + h{ƒ(α) − ƒ(β) } / (α − β), whence if the root is β + h then h = − (α − β)ƒ(β) / {ƒ(α) − ƒ(β) }.
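The two-sided process of this paragraph pairs a chord (false-position) step from the concave side with a tangent (Newton) step from the convex side, so that the root stays bracketed between the new limits. A sketch (example and names mine; f is taken convex throughout, as the construction requires):

```python
def two_sided_step(f, fprime, beta, alpha):
    """One chord step from beta and one tangent step from alpha."""
    d_new = beta - (alpha - beta) * f(beta) / (f(alpha) - f(beta))  # chord D'
    c_new = alpha - f(alpha) / fprime(alpha)                        # tangent C'
    return d_new, c_new

f = lambda x: x * x - 2          # convex everywhere; root = sqrt(2)
fp = lambda x: 2 * x
beta, alpha = 1.0, 2.0           # f(beta) < 0 < f(alpha)
for _ in range(5):
    beta, alpha = two_sided_step(f, fp, beta, alpha)
print(beta, alpha)               # two limits closing in on sqrt(2)
```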

Returning for a moment to Horner’s method, it may be remarked that the correction h, to an approximate value α, is therein found as a quotient the same or such as the quotient ƒ(α) ÷ ƒ′(α) which presents itself in Newton’s method. The difference is that with Horner the integer part of this quotient is taken as the presumptive value of h, and the figure is verified at each step. With Newton the quotient itself, developed to the proper number of decimal places, is taken as the value of h; if too many decimals are taken, there would be a waste of work; but the error would correct itself at the next step. Of course the calculation should be conducted without any such waste of work.

Imaginary Theory.

7. It will be recollected that the expression number and the correlative epithet numerical were at the outset used in a wide sense, as extending to imaginaries. This extension arises out of the theory of equations by a process analogous to that by which number, in its original most restricted sense of positive integer number, was extended to have the meaning of a real positive or negative magnitude susceptible of continuous variation.

If for a moment number is understood in its most restricted sense as meaning positive integer number, the solution of a simple equation leads to an extension; ax − b = 0 gives x = b/a, a positive fraction, and we can in this manner represent, not accurately, but as nearly as we please, any positive magnitude whatever; so an equation ax + b = 0 gives x = −b/a, which (approximately as before) represents any negative magnitude. We thus arrive at the extended signification of number as a continuously varying positive or negative magnitude. Such numbers may be added or subtracted, multiplied or divided one by another, and the result is always a number. Now from a quadric equation we derive, in like manner, the notion of a complex or imaginary number such as is spoken of above. The equation x² + 1 = 0 is not (in the foregoing sense, number = real number) satisfied by any numerical value whatever of x; but we assume that there is a number which we call i, satisfying the equation i² + 1 = 0, and then taking a and b any real numbers, we form an expression such as a + bi, and use the expression number in this extended sense: any two such numbers may be added or subtracted, multiplied or divided one by the other, and the result is always a number. And if we consider first a quadric equation x² + px + q = 0 where p and q are real numbers, and next the like equation, where p and q are any numbers whatever, it can be shown that there exists for x a numerical value which satisfies the equation; or, in other words, it can be shown that the equation has a numerical root. 
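Python's built-in complex type realizes the numbers a + bi of this paragraph, and the closure property can be checked for the quadric: the usual formula furnishes a numerical root even when p and q are themselves complex. (The function name and sample coefficients below are mine.)

```python
import cmath

def quadratic_root(p, q):
    """One root of x^2 + p*x + q = 0; p, q may be complex."""
    return (-p + cmath.sqrt(p * p - 4 * q)) / 2

# complex coefficients: the root is again a number of the form a + bi
p, q = 2 + 1j, 1 - 3j
x = quadratic_root(p, q)
residual = abs(x * x + p * x + q)
print(x, residual)               # residual ~ 0
```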
The like theorem, in fact, holds good for an equation of any order whatever; but suppose for a moment that this was not the case; say that there was a cubic equation x³ + px² + qx + r = 0, with numerical coefficients, not satisfied by any numerical value of x, we should have to establish a new imaginary j satisfying some such equation, and should then have to consider numbers of the form a + bj, or perhaps a + bj + cj² (a, b, c numbers α + βi of the kind heretofore considered),—first we should be thrown back on the quadric equation x² + px + q = 0, p and q being now numbers of the last-mentioned extended form—non constat that every such equation has a numerical root—and if not, we might be led to other imaginaries k, l, &c., and so on ad infinitum in inextricable confusion.

But in fact a numerical equation of any order whatever has always a numerical root, and thus numbers (in the foregoing sense, number = quantity of the form α + βi) form (what real numbers do not) a universe complete in itself, such that starting in it we are never led out of it. There may very well be, and perhaps are, numbers in a more general sense of the term (quaternions are not a case in point, as the ordinary laws of combination are not adhered to), but in order to have to do with such numbers (if any) we must start with them.

8. The capital theorem as regards numerical equations thus is, every numerical equation has a numerical root; or for shortness (the meaning being as before), every equation has a root. Of course the theorem is the reverse of self-evident, and it requires proof; but provisionally assuming it as true, we derive from it the general theory of numerical equations. As the term root was introduced in the course of an explanation, it will be convenient to give here the formal definition.

A number a such that substituted for x it makes the function xⁿ − p₁xⁿ⁻¹ ... ± pₙ to be = 0, or say such that it satisfies the equation ƒ(x) = 0, is said to be a root of the equation; that is, a being a root, we have

aⁿ − p₁aⁿ⁻¹ ... ± pₙ = 0, or say ƒ(a) = 0;

and it is then easily shown that x − a is a factor of the function ƒ(x), viz. that we have ƒ(x) = (x − a)ƒ₁(x), where ƒ₁(x) is a function xⁿ⁻¹ − q₁xⁿ⁻² ... ± qₙ₋₁ of the order n − 1, with numerical coefficients q₁, q₂ ... qₙ₋₁.
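The division of ƒ(x) by x − a is exactly synthetic division (Horner's scheme), and the remainder it leaves is ƒ(a); a root therefore gives remainder zero and exhibits the factor and the quotient ƒ₁(x) at once. A sketch (names mine):

```python
def synthetic_division(p, a):
    """Divide p (coefficients highest degree first) by (x - a).
    Returns (quotient coefficients, remainder); the remainder equals f(a)."""
    q = [p[0]]
    for c in p[1:]:
        q.append(c + a * q[-1])
    return q[:-1], q[-1]

# f(x) = x^3 - 6x^2 + 11x - 6 has the root a = 1
quot, rem = synthetic_division([1, -6, 11, -6], 1)
print(quot, rem)   # quotient x^2 - 5x + 6, remainder 0
```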

In general a is not a root of the equation ƒ₁(x) = 0, but it may be so—i.e. ƒ₁(x) may contain the factor x − a; when this is so, ƒ(x) will contain the factor (x − a)²; writing then ƒ(x) = (x − a)²ƒ₂(x), and assuming that a is not a root of the equation ƒ₂(x) = 0, x = a is then said to be a double root of the equation ƒ(x) = 0; and similarly ƒ(x) may contain the factor (x − a)³ and no higher power, and x = a is then a triple root; and so on.

Supposing in general that ƒ(x) = (x − a)^α F(x) (α being a positive integer which may be = 1, (x − a)^α the highest power of x − a which divides ƒ(x), and F(x) being of course of the order n − α), then the equation F(x) = 0 will have a root b which will be different from a; x − b will be a factor, in general a simple one, but it may be a multiple one, of F(x), and ƒ(x) will in this case be = (x − a)^α (x − b)^β Φ(x) (β a positive integer which may be = 1, (x − b)^β the highest power of x − b in F(x) or ƒ(x), and Φ(x) being of course of the order n − α − β). The original equation ƒ(x) = 0 is in this case said to have α roots each = a, β roots each = b; and so on for any other factors (x − c)^γ, &c.

We have thus the theorem—A numerical equation of the order n has in every case n roots, viz. there exist n numbers, a, b, ... (in general all distinct, but which may arrange themselves in any sets of equal values), such that ƒ(x) = (x − a)(x − b)(x − c) ... identically.

If the equation has equal roots, these can in general be determined, and the case is at any rate a special one which may be in the first instance excluded from consideration. It is, therefore, in general assumed that the equation ƒ(x) = 0 has all its roots unequal.

If the coefficients p1, p2, ... are all or any one or more of them imaginary, then the equation ƒ(x) = 0, separating the real and imaginary parts thereof, may be written F(x) + iΦ(x) = 0, where F(x), Φ(x) are each of them a function with real coefficients; and it thus appears that the equation ƒ(x) = 0, with imaginary coefficients, has not in general any real root; supposing it to have a real root a, this must be at once a root of each of the equations F(x) = 0 and Φ(x) = 0.

But an equation with real coefficients may have as well imaginary as real roots, and we have further the theorem that for any such equation the imaginary roots enter in pairs, viz. α + βi being a root, then α − βi will be also a root. It follows that if the order be odd, there is always an odd number of real roots, and therefore at least one real root.
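The pairing of the imaginary roots may be checked exactly on a small case (a modern sketch; the quadratic is a hypothetical example chosen for the purpose): with real coefficients, substituting a root and its conjugate both annihilate the function.

```python
# Hypothetical example: x^2 - 2x + 2 has real coefficients and roots 1 ± i.
f = lambda x: x**2 - 2*x + 2

root = 1 + 1j
conj = 1 - 1j          # the conjugate α − βi of the root α + βi
val_root = f(root)     # exact complex arithmetic: both values are 0
val_conj = f(conj)
```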

9. In the case of an equation with real coefficients, the question of the existence of real roots, and of their separation, has been already considered. In the general case of an equation with imaginary (it may be real) coefficients, the like question arises as to the situation of the (real or imaginary) roots; thus, if for facility of conception we regard the constituents α, β of a root α + βi as the co-ordinates of a point in plano, and accordingly represent the root by such point, then drawing in the plane any closed curve or “contour,” the question is how many roots lie within such contour.


This is solved theoretically by means of a theorem of A.L. Cauchy (1837), viz. writing in the original equation x + iy in place of x, the function ƒ(x + iy) becomes = P + iQ, where P and Q are each of them a rational and integral function (with real coefficients) of (x, y). Imagining the point (x, y) to travel along the contour, and considering the number of changes of sign from − to + and from + to − of the fraction P/Q corresponding to passages of the fraction through zero (that is, to values for which P becomes = 0, disregarding those for which Q becomes = 0), the difference of these numbers gives the number of roots within the contour.

It is important to remark that the demonstration does not presuppose the existence of any root; the contour may be the infinity of the plane (such infinity regarded as a contour, or closed curve), and in this case it can be shown (and that very easily) that the difference of the numbers of changes of sign is = n; that is, there are within the infinite contour, or (what is the same thing) there are in all n roots; thus Cauchy’s theorem contains really the proof of the fundamental theorem that a numerical equation of the nth order (not only has a numerical root, but) has precisely n roots. It would appear that this proof of the fundamental theorem in its most complete form is in principle identical with the last proof of K.F. Gauss (1849) of the theorem, in the form—A numerical equation of the nth order has always a root.3

But in the case of a finite contour, the actual determination of the difference which gives the number of real roots can be effected only in the case of a rectangular contour, by applying to each of its sides separately a method such as that of Sturm’s theorem; and thus the actual determination ultimately depends on a method such as that of Sturm’s theorem.

Very little has been done in regard to the calculation of the imaginary roots of an equation by approximation; and the question is not here considered.
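Cauchy’s count can be carried out numerically (a modern sketch, not part of the original text; the polynomial and the contour are illustrative assumptions): tracking the continuous change of argument of ƒ(z) around a closed contour gives the winding number, which equals the number of roots inside.

```python
import cmath
import math

f = lambda z: z**3 - 6*z**2 + 11*z - 6   # roots 1, 2, 3 (illustrative choice)
R, N = 2.5, 20000                        # circle |z| = R, traversed in N steps
total = 0.0
prev = cmath.phase(f(R))
for k in range(1, N + 1):
    z = R * cmath.exp(2j * math.pi * k / N)
    ph = cmath.phase(f(z))
    d = ph - prev
    # keep each increment in (-pi, pi]: the argument varies continuously
    if d > math.pi:
        d -= 2 * math.pi
    elif d < -math.pi:
        d += 2 * math.pi
    total += d
    prev = ph

inside = round(total / (2 * math.pi))    # winding number = roots inside the contour
```

The circle of radius 2.5 encloses the roots 1 and 2 but not 3, and the computed winding number is accordingly 2.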

10. A class of numerical equations which needs to be considered is that of the binomial equations xn− a = 0 (a = α + βi, a complex number).


The foregoing conclusions apply, viz. there are always n roots, which, it may be shown, are all unequal. And these can be found numerically by the extraction of the square root, and of an nth root, of real numbers, and by the aid of a table of natural sines and cosines.4 For writing

α + βi = √(α² + β²) { α/√(α² + β²) + i β/√(α² + β²) },

there is always a real angle λ (positive and less than 2π), such that its cosine and sine are = α / √(α² + β²) and β / √(α² + β²) respectively; that is, writing for shortness √(α² + β²) = ρ, we have α + βi = ρ (cos λ + i sin λ), or the equation is xn = ρ (cos λ + i sin λ); hence observing that (cos λ/n + i sin λ/n)n = cos λ + i sin λ, a value of x is = ⁿ√ρ (cos λ/n + i sin λ/n). The formula really gives all the roots, for instead of λ we may write λ + 2sπ, s a positive or negative integer, and then we have

x = ⁿ√ρ { cos (λ + 2sπ)/n + i sin (λ + 2sπ)/n },

which has the n values obtained by giving to s the values 0, 1, 2 ... n − 1 in succession; the roots are, it is clear, represented by points lying at equal intervals on a circle. But it is more convenient to proceed somewhat differently; taking one of the roots to be θ, so that θn = a, then assuming x = θy, the equation becomes yn − 1 = 0, which equation, like the original equation, has precisely n roots (one of them being of course = 1). And the original equation xn − a = 0 is thus reduced to the more simple equation xn − 1 = 0; and although the theory of this equation is included in the preceding one, yet it is proper to state it separately.

The equation xn− 1 = 0 has its several roots expressed in the form 1, ω, ω², ... ωn−1, where ω may be taken = cos 2π/n + i sin 2π/n; in fact, ω having this value, any integer power ωk is = cos 2πk/n + i sin 2πk/n, and we thence have (ωk)n = cos 2πk + i sin 2πk, = 1, that is, ωk is a root of the equation. The theory will be resumed further on.
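The formula ⁿ√ρ { cos (λ + 2sπ)/n + i sin (λ + 2sπ)/n } translates directly into a short computation (a modern sketch, not part of the original text; the function name is ours):

```python
import cmath
import math

def nth_roots(a, n):
    """All n roots of x^n = a, via a = rho (cos lam + i sin lam)."""
    rho, lam = abs(a), cmath.phase(a)
    return [rho ** (1.0 / n) * cmath.exp(1j * (lam + 2 * math.pi * s) / n)
            for s in range(n)]

roots4 = nth_roots(16, 4)   # fourth roots of 16: 2, 2i, -2, -2i
unity = nth_roots(1, 6)     # the six sixth roots of unity, 1, ω, ω², ... ω⁵
```

Giving s the values 0, 1, ... n − 1 produces n distinct values, spaced at equal intervals on a circle, exactly as described above.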

By what precedes, we are led to the notion (a numerical one) of the radical a1/n regarded as an n-valued function; any one of these being denoted by ⁿ√a, then the series of values is ⁿ√a, ω ⁿ√a, ... ωn−1 ⁿ√a; or we may, if we please, use ⁿ√a instead of a1/n as a symbol to denote the n-valued function.

As the coefficients of an algebraical equation may be numerical, all which follows in regard to algebraical equations is (with, it may be, some few modifications) applicable to numerical equations; and hence, concluding for the present this subject, it will be convenient to pass on to algebraical equations.

Algebraical Equations.

11. The equation is

xn− p1xn−1+ ... ± pn= 0,

and we hereassumethe existence of roots, viz. we assume that there are n quantities a, b, c ... (in general all of them different, but which in particular cases may become equal in sets in any manner), such that

xn− p1xn−1+ ... ± pn= (x − a)(x − b)(x − c) ...;

or looking at the question in a different point of view, and starting with the roots a, b, c ... as given, we express the product of the n factors x − a, x − b, ... in the foregoing form, and thus arrive at an equation of the order n having the n roots a, b, c.... In either case we have

p1= Σa, p2= Σab, ... pn= abc...;

i.e. regarding the coefficients p1, p2 ... pn as given, then we assume the existence of roots a, b, c, ... such that p1= Σa, &c.; or, regarding the roots as given, then we write p1, p2, &c., to denote the functions Σa, Σab, &c.
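The relations p1 = Σa, p2 = Σab, ... pn = abc... can be checked directly (a modern sketch, not part of the original text; the roots 1, 2, 3, 4 are an illustrative assumption):

```python
from itertools import combinations
from math import prod

roots = [1, 2, 3, 4]
# p_k is the sum of the products of the roots taken k at a time
p = [sum(prod(c) for c in combinations(roots, k))
     for k in range(1, len(roots) + 1)]
# (x-1)(x-2)(x-3)(x-4) = x^4 - 10x^3 + 35x^2 - 50x + 24, the signs alternating
```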

As already explained, the epithet algebraical is not used in opposition to numerical; an algebraical equation is merely an equation wherein the coefficients are not restricted to denote, or are not explicitly considered as denoting, numbers. That the abstraction is legitimate, appears by the simplest example; in saying that the equation x² − px + q = 0 has a root x = ½ {p + √(p² − 4q) }, we mean that writing this value for x the equation becomes an identity, [½ {p + √(p² − 4q) }]² − p[½ {p + √(p² − 4q) }] + q = 0; and the verification of this identity in nowise depends upon p and q meaning numbers. But if it be asked what there is beyond numerical equations included in the term algebraical equation, or, again, what is the full extent of the meaning attributed to the term—the latter question at any rate it would be very difficult to answer; as to the former one, it may be said that the coefficients may, for instance, be symbols of operation. As regards such equations, there is certainly no proof that every equation has a root, or that an equation of the nth order has n roots; nor is it in any wise clear what the precise signification of the statement is. But it is found that the assumption of the existence of the n roots can be made without contradictory results; conclusions derived from it, if they involve the roots, rest on the same ground as the original assumption; but the conclusion may be independent of the roots altogether, and in this case it is undoubtedly valid; the reasoning, although actually conducted by aid of the assumption (and, it may be, most easily and elegantly in this manner), is really independent of the assumption.
In illustration, we observe that it is allowable to express a function of p and q as follows,—that is, by means of a rational symmetrical function of a and b, this can, as a fact, be expressed as a rational function of a + b and ab; and if we prescribe that a + b and ab shall then be changed into p and q respectively, we have the required function of p, q. That is, we have F(a, b) as a representation of ƒ(p, q), obtained as if we had p = a + b, q = ab, but without in any wise assuming the existence of the a, b of these equations.


12. Starting from the equation

xn− p1xn−1+ ... = (x − a)(x − b) &c.,

or the equivalent equations p1= Σa, &c., we find

an− p1an−1+ ... = 0,
bn− p1bn−1+ ... = 0,
·  ·  ·  ·  ·  ·

(it is as satisfying these equations that a, b ... are said to be the roots of xn− p1xn−1+ ... = 0); and conversely from the last-mentioned equations, assuming that a, b ... are all different, we deduce

p1= Σa, p2= Σab, &c.

and

xn− p1xn−1+ ... = (x − a)(x − b) &c.

Observe that if, for instance, a = b, then the equations an− p1an−1+ ... = 0, bn− p1bn−1+ ... = 0 would reduce themselves to a single relation, which would not of itself express that a was a double root,—that is, that (x − a)² was a factor of xn− p1xn−1+ &c.; but by considering b as the limit of a + h, h indefinitely small, we obtain a second equation

nan−1− (n − 1) p1an−2+ ... = 0,

which, with the first, expresses that a is a double root; and then the whole system of equations leads as before to the equations p1= Σa, &c. But the existence of a double root implies a certain relation between the coefficients; the general case is when the roots are all unequal.
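The pair of conditions, ƒ(a) = 0 together with the derived equation, can be verified on a small case (a modern sketch, not part of the original text; the cubic is a hypothetical choice):

```python
# (x - 2)^2 (x - 5) = x^3 - 9x^2 + 24x - 20: a = 2 is a double root, a = 5 simple
f = lambda x: x**3 - 9*x**2 + 24*x - 20
df = lambda x: 3*x**2 - 18*x + 24          # n a^(n-1) - (n-1) p1 a^(n-2) + ...

double_root_check = (f(2), df(2))   # both equations hold at the double root
simple_root_check = (f(5), df(5))   # only f vanishes at the simple root
```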

We have then the theorem that every rational symmetrical function of the roots is a rational function of the coefficients. This is an easy consequence from the less general theorem, every rational and integral symmetrical function of the roots is a rational and integral function of the coefficients.

In particular, the sums of the powers Σa², Σa³, &c., are rational and integral functions of the coefficients.


The process originally employed for the expression of other functions Σaαbβ, &c., in terms of the coefficients is to make them depend upon the sums of powers: for instance, Σaαbβ= ΣaαΣaβ− Σaα+β; but this is very objectionable; the true theory consists in showing that we have systems of equations
p1= Σa,

p2=     Σab,
p1²= Σa² + 2Σab,

p3=        Σabc,
p1p2=     Σa²b + 3Σabc,
p1³= Σa³ + 3Σa²b + 6Σabc,

where in each system there are precisely as many equations as there are root-functions on the right-hand side—e.g.3 equations and 3 functions Σabc, Σa²b, Σa³. Hence in each system the root-functions can be determined linearly in terms of the powers and products of the coefficients:
Σab=     p2,
Σa²= p1² − 2p2,

Σabc=        p3,
Σa²b=     p1p2− 3p3,
Σa³= p1³− 3p1p2+ 3p3,

and so on. The other process, if applied consistently, would derive the originally assumed value Σab = p2 from the two equations Σa = p1, Σa² = p1² − 2p2; i.e. we have 2Σab = Σa·Σa − Σa², = p1² − (p1² − 2p2), = 2p2.
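The two systems may be verified numerically (a modern sketch, not part of the original text; roots 1, 2, 3 are assumed, so that p1 = 6, p2 = 11, p3 = 6):

```python
from itertools import permutations

roots = [1, 2, 3]              # x^3 - 6x^2 + 11x - 6
p1, p2, p3 = 6, 11, 6

# root-functions expressed linearly in the coefficients, as in the text
sum_a2 = p1**2 - 2*p2                # Σa²
sum_a2b = p1*p2 - 3*p3               # Σa²b
sum_a3 = p1**3 - 3*p1*p2 + 3*p3      # Σa³

# direct evaluation from the roots themselves
direct_a2 = sum(a*a for a in roots)
direct_a2b = sum(a*a*b for a, b in permutations(roots, 2))
direct_a3 = sum(a**3 for a in roots)
```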

13. It is convenient to mention here the theorem that, x being determined as above by an equation of the order n, any rational and integral function whatever of x, or more generally any rational function which does not become infinite in virtue of the equation itself, can be expressed as a rational and integral function of x, of the order n − 1, the coefficients being rational functions of the coefficients of the equation. Thus the equation gives xn, a function of the form in question; multiplying each side by x, and on the right-hand side writing for xn its foregoing value, we have xn+1, a function of the form in question; and the like for any higher power of x, and therefore also for any rational and integral function of x. The proof in the case of a rational non-integral function is somewhat more complicated. The final result is of the form φ(x)/ψ(x) = I(x), or say φ(x) − ψ(x)I(x) = 0, where φ, ψ, I are rational and integral functions; in other words, this equation, being true if only ƒ(x) = 0, can only be so by reason that the left-hand side contains ƒ(x) as a factor, or we must have identically φ(x) − ψ(x)I(x) = M(x)ƒ(x). And it is, moreover, clear that the equation φ(x)/ψ(x) = I(x), being satisfied if only ƒ(x) = 0, must be satisfied by each root of the equation.
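The reduction of any power of x to a function of order n − 1 can be mechanized for a cubic (a modern sketch, not part of the original text; the helper name is ours):

```python
def x_power_mod(k, p1, p2, p3):
    """(A, B, C) with x^k ≡ A x² + B x + C modulo x³ - p1 x² + p2 x - p3."""
    A, B, C = 0, 0, 1                     # x^0 = 1
    for _ in range(k):
        # multiply by x, then substitute x³ = p1 x² - p2 x + p3
        A, B, C = p1 * A + B, -p2 * A + C, p3 * A
    return A, B, C

# for x³ - 6x² + 11x - 6 (roots 1, 2, 3): x⁴ ≡ 25x² - 60x + 36
quartic_power = x_power_mod(4, 6, 11, 6)
```

As the text remarks, the reduced expression is satisfied by each root: substituting x = 2 or x = 3 into 25x² − 60x + 36 reproduces x⁴.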


From the theorem that a rational symmetrical function of the roots is expressible in terms of the coefficients, it at once follows that it is possible to determine an equation (of an assignable order) having for its roots the several values of any given (unsymmetrical) function of the roots of the given equation. For example, in the case of a quartic equation, roots (a, b, c, d), it is possible to find an equation having the roots ab, ac, ad, bc, bd, cd (being therefore a sextic equation): viz. in the product

(y − ab) (y − ac) (y − ad) (y − bc) (y − bd) (y − cd)

the coefficients of the several powers of y will be symmetrical functions of a, b, c, d and therefore rational and integral functions of the coefficients of the quartic equation; hence, supposing the product so expressed, and equating it to zero, we have the required sextic equation. In the same manner can be found the sextic equation having the roots (a − b)², (a − c)², (a − d)², (b − c)², (b − d)², (c − d)², which is the equation of differences previously referred to; and similarly we obtain the equation of differences for a given equation of any order. Again, the equation sought for may be that having for its n roots the given rational functions φ(a), φ(b), ... of the several roots of the given equation. Any such rational function can (as was shown) be expressed as a rational and integral function of the order n − 1; and, retaining x in place of any one of the roots, the problem is to find y from the equations xn− p1xn−1... = 0, and y = M0xn−1+ M1xn−2+ ..., or, what is the same thing, from these two equations to eliminate x. This is in fact E.W. Tschirnhausen’s transformation (1683).
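The sextic in y can be constructed explicitly (a modern sketch, not part of the original text; the quartic roots 1, 2, 3, 4 are an illustrative assumption):

```python
from itertools import combinations
from math import prod

def poly_from_roots(rs):
    """Coefficients, highest power first, of the monic polynomial with roots rs."""
    c = [1]
    for r in rs:
        # multiply the running polynomial by (y - r)
        c = [a - r * b for a, b in zip(c + [0], [0] + c)]
    return c

quartic_roots = [1, 2, 3, 4]
pair_products = [a * b for a, b in combinations(quartic_roots, 2)]  # ab, ac, ...
sextic = poly_from_roots(pair_products)
# the coefficients are symmetric in a, b, c, d: e.g. the y^5 coefficient is -Σab = -p2
```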

14. In connexion with what precedes, the question arises as to the number of values (obtained by permutations of the roots) of given unsymmetrical functions of the roots, or say of a given set of letters: for instance, with roots or letters (a, b, c, d) as before, how many values are there of the function ab + cd, or better, how many functions are there of this form? The answer is 3, viz. ab + cd, ac + bd, ad + bc; or again we may ask whether, in the case of a given number of letters, there exist functions with a given number of values, 3-valued, 4-valued functions, &c.
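The count of values of ab + cd under permutation of the letters is easily confirmed (a modern sketch, not part of the original text):

```python
from itertools import permutations

# with letters 1, 2, 3, 4, only three distinct values arise,
# corresponding to the three functions ab + cd, ac + bd, ad + bc
values = {a*b + c*d for a, b, c, d in permutations([1, 2, 3, 4])}
```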


It is at once seen that for any given number of letters there exist 2-valued functions; the product of the differences of the letters is such a function; however the letters are interchanged, it alters only its sign; or say the two values are Δ and −Δ. And if P, Q are symmetrical functions of the letters, then the general form of such a function is P + QΔ; this has only the two values P + QΔ, P − QΔ.
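The 2-valued character of the product of differences may be checked directly (a modern sketch, not part of the original text):

```python
from itertools import combinations
from math import prod

def delta(letters):
    """Product of the differences of the letters; a 2-valued function."""
    return prod(a - b for a, b in combinations(letters, 2))

d1 = delta([1, 2, 3, 4])
d2 = delta([2, 1, 3, 4])   # a single interchange of letters reverses the sign
```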

In the case of 4 letters there exist (as appears above) 3-valued functions: but in the case of 5 letters there does not exist any 3-valued or 4-valued function; and the only 5-valued functions are those which are symmetrical in regard to four of the letters, and can thus be expressed in terms of one letter and of symmetrical functions of all the letters. These last theorems present themselves in the demonstration of the non-existence of a solution of a quintic equation by radicals.

The theory is an extensive and important one, depending on the notions of substitutions and of groups (q.v.).

15. Returning to equations, we have the very important theorem that, given the value of any unsymmetrical function of the roots, e.g. in the case of a quartic equation, the function ab + cd, it is in general possible to determine rationally the value of any similar function, such as (a + b)³ + (c + d)³.


The a priori ground of this theorem may be illustrated by means of a numerical equation. Suppose that the roots of a quartic equation are 1, 2, 3, 4, then if it is given that ab + cd = 14, this in effect determines a, b to be 1, 2 and c, d to be 3, 4 (viz. a = 1, b = 2 or a = 2, b = 1, and c = 3, d = 4 or c = 4, d = 3) or else a, b to be 3, 4 and c, d to be 1, 2; and it therefore in effect determines (a + b)³ + (c + d)³ to be = 370, and not any other value; that is, (a + b)³ + (c + d)³, as having a single value, must be determinable rationally. And we can in the same way account for cases of failure as regards particular equations; thus, the roots being 1, 2, 3, 4 as before, a²b = 2 determines a to be = 1 and b to be = 2, but if the roots had been 1, 2, 4, 16 then a²b = 16 does not uniquely determine a, b but only makes them to be 1, 16 or 2, 4 respectively.
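The arithmetic of the illustration can be reproduced (a modern sketch, not part of the original text): every assignment of the roots 1, 2, 3, 4 consistent with ab + cd = 14 yields the same value of (a + b)³ + (c + d)³.

```python
from itertools import permutations

# collect the values of (a+b)^3 + (c+d)^3 over all assignments with ab + cd = 14
vals = {(a + b)**3 + (c + d)**3
        for a, b, c, d in permutations([1, 2, 3, 4]) if a*b + c*d == 14}
```

The set contains the single value 370, showing the function is uniquely, hence rationally, determined.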

As to the a posteriori proof, assume, for instance,

t1= ab + cd,   y1= (a + b)³ + (c + d)³,
t2= ac + bd,   y2= (a + c)³ + (b + d)³,
t3= ad + bc,   y3= (a + d)³ + (b + c)³;

then y1+ y2+ y3, t1y1+ t2y2+ t3y3, t1²y1+ t2²y2+ t3²y3 will be respectively symmetrical functions of the roots of the quartic, and therefore rational and integral functions of the coefficients; that is, they will be known.

Suppose for a moment that t1, t2, t3 are all known; then the equations being linear in y1, y2, y3 these can be expressed rationally in terms of the coefficients and of t1, t2, t3; that is, y1, y2, y3 will be known. But observe further that y1 is obtained as a function of t1, t2, t3 symmetrical as regards t2, t3; it can therefore be expressed as a rational function of t1 and of t2+ t3, t2t3, and thence as a rational function of t1 and of t1+ t2+ t3, t1t2+ t1t3+ t2t3, t1t2t3; but these last are symmetrical functions of the roots, and as such they are expressible rationally in terms of the coefficients; that is, y1 will be expressed as a rational function of t1 and of the coefficients; or t1 (alone, not t2 or t3) being known, y1 will be rationally determined.

16. We now consider the question of the algebraical solution of equations, or, more accurately, that of the solution of equations by radicals.

