Rotation-Quasi-Invariant Quadratic and Other Power Mean Theories in Fundamental Sciences of Estimation, Approximation, Data Modeling and Processing

by

© Ph. D. & Dr. Sc. Lev Gelimson

Academic Institute for Creating Fundamental Sciences (Munich, Germany)

Mathematical Journal

of the "Collegium" All World Academy of Sciences

Munich (Germany)

11 (2011), 20

For solving contradictory (e.g., overdetermined) problems in approximation and data processing, usually only the least square method (LSM) [1] by Legendre and Gauss is applied. Overmathematics [2, 3] and fundamental sciences of estimation [4], approximation [5], data modeling [6], and processing [7] have discovered many principal shortcomings [2-8] of this method. Additionally, minimizing the sum of the squared differences of the preselected coordinates alone (e.g., the ordinates in a two-dimensional problem) between the graph of the desired approximation function and each of the given data points depends on this preselection, ignores the remaining coordinates, and gives the result no objective sense. Moreover, the method is correct only in the unique case of a constant approximation and gives systematic errors that increase with the inclination of the approximation function.

In fundamental sciences of estimation [4], approximation [5], data modeling [6] and processing [7], power mean theories (PMT) naturally generalize quadratic mean theories (QMT) and are also valid under coordinate system linear transformation invariance of the given data. Let us show the essence of power mean theories via linear approximation (a bisector) in the two-dimensional case.

Given $n$ ($n \in \mathbb{N}_+ = \{1, 2, \dots\}$, $n > 2$) points $(x'_j, y'_j)$, $j = 1, \dots, n$, i.e. $(x'_1, y'_1), (x'_2, y'_2), \dots, (x'_n, y'_n)$, with any real coordinates. Use the clearly invariant centralization transformation $x = x' - \sum_{j=1}^n x'_j / n$, $y = y' - \sum_{j=1}^n y'_j / n$ to obtain a coordinate system $xOy$ central for the given data, and further work in this system with the points $(x_j, y_j)$, $j = 1, \dots, n$, to be approximated with a straight line $y = ax$ containing the origin $O(0, 0)$.

Quadratic mean theories give

$a = \left(\sum_{j=1}^n y_j^2 \Big/ \sum_{j=1}^n x_j^2\right)^{1/2} \operatorname{sign} \sum_{j=1}^n x_j y_j$,

$y = \operatorname{sign}\left(\sum_{j=1}^n x_j y_j\right) \left(\sum_{j=1}^n y_j^2 \Big/ \sum_{j=1}^n x_j^2\right)^{1/2} x$

for the transformed centralized data, whereas for the initial noncentralized data we obtain

$y' = \operatorname{sign}\left[\sum_{j=1}^n \left(x'_j - \tfrac{1}{n}\sum_{j=1}^n x'_j\right)\left(y'_j - \tfrac{1}{n}\sum_{j=1}^n y'_j\right)\right] \left[\sum_{j=1}^n \left(y'_j - \tfrac{1}{n}\sum_{j=1}^n y'_j\right)^2 \Big/ \sum_{j=1}^n \left(x'_j - \tfrac{1}{n}\sum_{j=1}^n x'_j\right)^2\right]^{1/2} \left(x' - \tfrac{1}{n}\sum_{j=1}^n x'_j\right) + \tfrac{1}{n}\sum_{j=1}^n y'_j$.
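For illustration only (not part of the original derivation), here is a minimal Python sketch of this quadratic mean straight-line fit; the function name qmt_line and the use of NumPy are assumptions of this sketch:

```python
import numpy as np

def qmt_line(xp, yp):
    """Quadratic mean theory (QMT) straight-line fit y' = a*(x' - xc) + yc.

    Centralizes the data, then takes a = (sum y^2 / sum x^2)^(1/2) * sign(sum x*y).
    Assumes noncollinear data with nonzero x-scatter.
    """
    xp, yp = np.asarray(xp, float), np.asarray(yp, float)
    xc, yc = xp.mean(), yp.mean()              # centralization: data center
    x, y = xp - xc, yp - yc                    # centralized coordinates
    a = np.sqrt(np.sum(y**2) / np.sum(x**2)) * np.sign(np.sum(x * y))
    return a, xc, yc                           # line: y' = a*(x' - xc) + yc
```

For example, qmt_line([1, 2, 3], [2, 4, 6]) yields the slope 2.0 with the data center (2, 4).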

Power mean theories naturally give, for any power $p > 0$,

$a = \left(\sum_{j=1}^n |y_j|^p \Big/ \sum_{j=1}^n |x_j|^p\right)^{1/p} \operatorname{sign} \sum_{j=1}^n x_j y_j$,

$y = \operatorname{sign}\left(\sum_{j=1}^n x_j y_j\right) \left(\sum_{j=1}^n |y_j|^p \Big/ \sum_{j=1}^n |x_j|^p\right)^{1/p} x$

for the transformed centralized data, whereas for the initial noncentralized data we obtain

$y' = \operatorname{sign}\left[\sum_{j=1}^n \left(x'_j - \tfrac{1}{n}\sum_{j=1}^n x'_j\right)\left(y'_j - \tfrac{1}{n}\sum_{j=1}^n y'_j\right)\right] \left[\sum_{j=1}^n \left|y'_j - \tfrac{1}{n}\sum_{j=1}^n y'_j\right|^p \Big/ \sum_{j=1}^n \left|x'_j - \tfrac{1}{n}\sum_{j=1}^n x'_j\right|^p\right]^{1/p} \left(x' - \tfrac{1}{n}\sum_{j=1}^n x'_j\right) + \tfrac{1}{n}\sum_{j=1}^n y'_j$.
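Analogously, a hedged sketch of the power mean straight-line fit for an arbitrary power p > 0 (pmt_line is an assumed name; p = 2 recovers the quadratic case above):

```python
import numpy as np

def pmt_line(xp, yp, p=2.0):
    """Power mean theory (PMT) straight-line fit y' = a*(x' - xc) + yc for any p > 0.

    a = (sum|y|^p / sum|x|^p)^(1/p) * sign(sum x*y) on the centralized data.
    """
    xp, yp = np.asarray(xp, float), np.asarray(yp, float)
    xc, yc = xp.mean(), yp.mean()
    x, y = xp - xc, yp - yc
    a = (np.sum(np.abs(y)**p) / np.sum(np.abs(x)**p))**(1.0 / p) * np.sign(np.sum(x * y))
    return a, xc, yc
```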

Nota bene:

1. Using moduli (absolute values) makes it possible to consider any power $p > 0$.

2. In the sign factor, we can keep the 2nd power, i.e. the expression $\sum_{j=1}^n x_j y_j$ from the quadratic case, because only its sign is used here. This sign remains valid due to coordinate system linear transformation invariance of the given data.

For nonzero but relatively very small absolute values of $\sum_{j=1}^n x_j y_j$, namely for $\left|\sum_{j=1}^n x_j y_j\right| \ll \left(\sum_{j=1}^n x_j^2 \sum_{j=1}^n y_j^2\right)^{1/2}$, the used sign can become oversensitive to small data variations. In such a case, use either the horizontal straight line approximation $y = 0$ (i.e. $y' = \sum_{j=1}^n y'_j / n$) if ${}_{2y}S = \sum_{j=1}^n y_j^2 - \left(\sum_{j=1}^n y_j\right)^2/n < {}_{2x}S = \sum_{j=1}^n x_j^2 - \left(\sum_{j=1}^n x_j\right)^2/n$, or the vertical one $x = 0$ (i.e. $x' = \sum_{j=1}^n x'_j / n$) if ${}_{2y}S > {}_{2x}S$. The latter line cannot be obtained from the general equation $y = ax$ and has to be considered separately.
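A possible sketch of this fallback, assuming a relative threshold eps that the text only describes qualitatively (the value 1e-3 is an assumption of this sketch):

```python
import numpy as np

def pmt_line_safe(xp, yp, p=2.0, eps=1e-3):
    """PMT fit with a fallback when the sign of sum(x*y) would be oversensitive.

    If |sum(x*y)| << (sum(x^2) * sum(y^2))^(1/2), return the horizontal line
    y' = yc when the y-scatter is smaller than the x-scatter, else the vertical
    line x' = xc (which cannot be written as y = ax).
    """
    xp, yp = np.asarray(xp, float), np.asarray(yp, float)
    xc, yc = xp.mean(), yp.mean()
    x, y = xp - xc, yp - yc
    sxy, sxx, syy = np.sum(x * y), np.sum(x**2), np.sum(y**2)   # after centralization,
    if abs(sxy) < eps * np.sqrt(sxx * syy):                     # sxx = 2xS and syy = 2yS
        return ("horizontal", yc) if syy < sxx else ("vertical", xc)
    a = (np.sum(np.abs(y)**p) / np.sum(np.abs(x)**p))**(1.0 / p) * np.sign(sxy)
    return ("sloped", a, xc, yc)
```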

After centralization, additionally introduce, for any power $p > 0$, the normalization transformation

$X = x \Big/ \left(\sum_{j=1}^n |x_j|^p\right)^{1/p}$,

$Y = y \Big/ \left(\sum_{j=1}^n |y_j|^p\right)^{1/p}$

to obtain a coordinate system $XOY$ central and normalized for the given data, and further work in this system with the points $(X_j, Y_j)$, $j = 1, \dots, n$, to be approximated with a straight line $Y = AX$ containing the origin $O(0, 0)$. Note that $y = ax$ gives

$Y \left(\sum_{j=1}^n |y_j|^p\right)^{1/p} = a \left(\sum_{j=1}^n |x_j|^p\right)^{1/p} X$

and hence

$A = \left(\sum_{j=1}^n |x_j|^p \Big/ \sum_{j=1}^n |y_j|^p\right)^{1/p} a$,

$a = \left(\sum_{j=1}^n |y_j|^p \Big/ \sum_{j=1}^n |x_j|^p\right)^{1/p} A$.

Note that, due to normalization,

$\sum_{j=1}^n |X_j|^p = \sum_{j=1}^n \left[|x_j| \Big/ \left(\sum_{j=1}^n |x_j|^p\right)^{1/p}\right]^p = 1$,

$\sum_{j=1}^n |Y_j|^p = \sum_{j=1}^n \left[|y_j| \Big/ \left(\sum_{j=1}^n |y_j|^p\right)^{1/p}\right]^p = 1$.
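A small sketch of the normalization step, assuming already centralized arrays x, y; it also exposes the factor that converts a slope A in XOY back to a in xOy, per the relation above:

```python
import numpy as np

def normalize(x, y, p=2.0):
    """Normalization: X = x / (sum|x|^p)^(1/p), Y = y / (sum|y|^p)^(1/p).

    Afterwards sum|X|^p = sum|Y|^p = 1, and a = back * A with
    back = (sum|y|^p / sum|x|^p)^(1/p).
    """
    cx = np.sum(np.abs(x)**p)**(1.0 / p)
    cy = np.sum(np.abs(y)**p)**(1.0 / p)
    X, Y = x / cx, y / cy
    back = cy / cx
    return X, Y, back
```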

In quadratic mean theories (QMT) we have obtained only two solutions,

$A_1 = 1$,

$A_2 = -1$,

which are both fully independent of the power and even of the specific data in their initial and transformed forms. Therefore, we can also consider these solutions in our natural generalization of quadratic mean theories (QMT).

Consequently, determine

${}_pS(A_1) = \sum_{j=1}^n |Y_j - X_j|^p$,

${}_pS(A_2) = \sum_{j=1}^n |Y_j + X_j|^p$,

${}_pS_{\min}(A) = \min\{{}_pS(A_1), {}_pS(A_2)\}$,

${}_pS_{\max}(A) = \max\{{}_pS(A_1), {}_pS(A_2)\}$.

Note that, theoretically for

$\sum_{j=1}^n X_j Y_j = 0$

and practically for

$\left|\sum_{j=1}^n X_j Y_j\right| \ll 1$,

we have to investigate the pair of straight lines $Y = 0$ and $X = 0$.

Otherwise, we have $Y = X$ and $Y = -X$ obtained above for $a \neq 0$ and $A \neq 0$.

Check whether, namely,

$A = \operatorname{sign}\left(\sum_{j=1}^n X_j Y_j\right)$

provides

${}_pS_{\min}(A) = \sum_{j=1}^n \left|Y_j - \operatorname{sign}\left(\sum_{j=1}^n X_j Y_j\right) X_j\right|^p$,

whereas

$A = -\operatorname{sign}\left(\sum_{j=1}^n X_j Y_j\right)$

provides

${}_pS_{\max}(A) = \sum_{j=1}^n \left|Y_j + \operatorname{sign}\left(\sum_{j=1}^n X_j Y_j\right) X_j\right|^p$.
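A hedged sketch of this check for the candidate slopes A1 = 1 and A2 = -1 in the central normalized system (the function and variable names are assumptions of the sketch):

```python
import numpy as np

def candidate_power_sums(X, Y, p=2.0):
    """Compute pS(A1), pS(A2) for A1 = +1, A2 = -1 and check whether
    A = sign(sum X*Y) indeed selects pS_min."""
    s1 = np.sum(np.abs(Y - X)**p)              # pS(A1)
    s2 = np.sum(np.abs(Y + X)**p)              # pS(A2)
    s_min, s_max = min(s1, s2), max(s1, s2)
    A = np.sign(np.sum(X * Y))                 # candidate expected to give pS_min
    check = np.isclose(np.sum(np.abs(Y - A * X)**p), s_min)
    return s_min, s_max, A, check
```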

Then define and determine

${}_{1|p}S = \left[{}_pS_{\min}(A) \Big/ {}_pS_{\max}(A)\right]^{1/p}$

as a measure of data scatter with respect to linear approximation and power $p$.

This is an upper estimate of data scatter with respect to approximation in general because nonlinear approximation is also possible.

Denote a measure of data scatter with respect to approximation in general by $S$. Then ${}_{1|p}S \geq S$.

Also introduce a measure of data trend with respect to linear approximation and power $p$

${}_{1|p}T = 1 - {}_{1|p}S = 1 - \left[{}_pS_{\min}(A) \Big/ {}_pS_{\max}(A)\right]^{1/p}$

and a measure of data trend with respect to approximation in general

$T = 1 - S$.

Then, naturally, ${}_{1|p}T \leq T$.
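Putting the pieces together, a sketch computing the scatter measure 1|pS and trend measure 1|pT for raw data and a power p (assuming noncollinear data; the function name is arbitrary):

```python
import numpy as np

def scatter_trend(xp, yp, p=2.0):
    """Data scatter 1|pS = (pS_min / pS_max)^(1/p) and trend 1|pT = 1 - 1|pS
    with respect to linear approximation and power p."""
    x = np.asarray(xp, float) - np.mean(xp)        # centralization
    y = np.asarray(yp, float) - np.mean(yp)
    X = x / np.sum(np.abs(x)**p)**(1.0 / p)        # normalization
    Y = y / np.sum(np.abs(y)**p)**(1.0 / p)
    s1 = np.sum(np.abs(Y - X)**p)                  # pS(A1), A1 = +1
    s2 = np.sum(np.abs(Y + X)**p)                  # pS(A2), A2 = -1
    scatter = (min(s1, s2) / max(s1, s2))**(1.0 / p)
    return scatter, 1.0 - scatter
```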

But this approach is not universal enough because such a dependence of $a$ and $y$ on $\sum_{j=1}^n x_j y_j$ is not continuous due to the jumps of the function $\operatorname{sign} z$ at $z = 0$ both from the left and from the right.

It is possible to give still more universal (but much more complicated) formulae for $a$ and $y$. Namely, denote

$t = \left|\sum_{j=1}^n x_j y_j\right| \Big/ \left(\sum_{j=1}^n x_j^2 \sum_{j=1}^n y_j^2\right)^{1/2}$

to provide continuity due to

$\operatorname{sign} \sum_{j=1}^n x_j y_j = \operatorname{sign}\left[\sum_{j=1}^n x_j y_j \Big/ \left(\sum_{j=1}^n x_j^2 \sum_{j=1}^n y_j^2\right)^{1/2}\right]$,

$\sum_{j=1}^n x_j y_j = \left|\sum_{j=1}^n x_j y_j\right| \operatorname{sign} \sum_{j=1}^n x_j y_j = \operatorname{sign}\left[\sum_{j=1}^n x_j y_j \Big/ \left(\sum_{j=1}^n x_j^2 \sum_{j=1}^n y_j^2\right)^{1/2}\right] \left|\sum_{j=1}^n x_j y_j\right|$

via multiplying the above values of $a$ and $y$ by $t^r$, whose power $r(t)$ is a suitable function of $t$.

Nota bene: $0 \leq t \leq 1$ [1].

A deep analysis of obtaining the formulae for $a$ and $y$ in quadratic mean theories, namely including the expression $\sum_{j=1}^n x_j y_j$ at power 1 for small values of the ratio $\sum_{j=1}^n y_j^2 / \sum_{j=1}^n x_j^2$, at power 0 for $\sum_{j=1}^n y_j^2 / \sum_{j=1}^n x_j^2 = 1$, and at power $-1$ for great values of this ratio, leads to the idea

$u = 1 - \frac{4}{\pi} \arctan\left[\left(\sum_{j=1}^n y_j^2 \Big/ \sum_{j=1}^n x_j^2\right)^{1/2}\right]$,

$r = 1 - \frac{4}{\pi} \arctan\left[\left(\sum_{j=1}^n y_j^2 \Big/ \sum_{j=1}^n x_j^2\right)^{1/2} t^u\right]$

with still better results when the physical dimensions (units) of $x$ and $y$ coincide.
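A minimal sketch of this continuity correction for centralized data, implementing t, u, and r literally as defined above (it assumes sum(x*y) ≠ 0 so that t > 0):

```python
import numpy as np

def correction_factor(x, y):
    """Continuity correction t^r for centralized data x, y with sum(x*y) != 0."""
    sxy, sxx, syy = np.sum(x * y), np.sum(x**2), np.sum(y**2)
    t = abs(sxy) / np.sqrt(sxx * syy)              # 0 < t <= 1
    ratio = np.sqrt(syy / sxx)
    u = 1.0 - 4.0 / np.pi * np.arctan(ratio)
    r = 1.0 - 4.0 / np.pi * np.arctan(ratio * t**u)
    return t**r

# The corrected quadratic mean slope would then be, e.g.,
# a = np.sqrt(np.sum(y**2) / np.sum(x**2)) * correction_factor(x, y) * np.sign(np.sum(x * y))
```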

Nota bene: $0 \leq t \leq 1$ [1] but, generally, the variable $t$ cannot take the value 1. A deep analysis shows that the greatest value of $t$ is

$t_{\max} = \left|\sum_{j=1}^n x_j^2(0) - \sum_{j=1}^n y_j^2(0)\right| \Big/ \left[\sum_{j=1}^n x_j^2(0) + \sum_{j=1}^n y_j^2(0)\right]$

where the data rotation angle 0 relates to the initial data (before rotation) in the principal central coordinate system with

$\sum_{j=1}^n x_j(0)\, y_j(0) = 0$.

Therefore, for any data rotation angle $\beta$, we can also take

$t(\beta) = \frac{\left|\sum_{j=1}^n x_j(\beta)\, y_j(\beta)\right|}{\left[\sum_{j=1}^n x_j^2(\beta) \sum_{j=1}^n y_j^2(\beta)\right]^{1/2}} \cdot \frac{\sum_{j=1}^n x_j^2(0) + \sum_{j=1}^n y_j^2(0)}{\left|\sum_{j=1}^n x_j^2(0) - \sum_{j=1}^n y_j^2(0)\right|}$,

$r(\beta) = \operatorname{sign}\left[\sum_{j=1}^n x_j^2(\beta) - \sum_{j=1}^n y_j^2(\beta)\right] \frac{2}{\pi} \arctan\left[\frac{1}{t(\beta)} - 1\right]$,

or

$r(\beta) = \operatorname{sign}\left[\sum_{j=1}^n x_j^2(\beta) - \sum_{j=1}^n y_j^2(\beta)\right] \frac{2}{\pi} \arctan\left\{\frac{\left[\sum_{j=1}^n x_j^2(\beta) \sum_{j=1}^n y_j^2(\beta)\right]^{1/2}}{\left|\sum_{j=1}^n x_j(\beta)\, y_j(\beta)\right|} \cdot \frac{\left|\sum_{j=1}^n x_j^2(0) - \sum_{j=1}^n y_j^2(0)\right|}{\sum_{j=1}^n x_j^2(0) + \sum_{j=1}^n y_j^2(0)} - 1\right\}$.

Finally, multiply the above values of $a$ and $y$ by $t^r$.

Then quadratic mean theories give

$a(\beta) = \left[\sum_{j=1}^n y_j^2(\beta) \Big/ \sum_{j=1}^n x_j^2(\beta)\right]^{1/2} t(\beta)^{r(\beta)} \operatorname{sign} \sum_{j=1}^n x_j(\beta)\, y_j(\beta)$,

$y(\beta) = \operatorname{sign}\left[\sum_{j=1}^n x_j(\beta)\, y_j(\beta)\right] \left[\sum_{j=1}^n y_j^2(\beta) \Big/ \sum_{j=1}^n x_j^2(\beta)\right]^{1/2} t(\beta)^{r(\beta)} x$

for the transformed centralized data, whereas for the initial noncentralized data we obtain

$y' = \operatorname{sign}\left[\sum_{j=1}^n \left(x'_j - \tfrac{1}{n}\sum_{j=1}^n x'_j\right)\left(y'_j - \tfrac{1}{n}\sum_{j=1}^n y'_j\right)\right] \left[\sum_{j=1}^n \left(y'_j - \tfrac{1}{n}\sum_{j=1}^n y'_j\right)^2 \Big/ \sum_{j=1}^n \left(x'_j - \tfrac{1}{n}\sum_{j=1}^n x'_j\right)^2\right]^{1/2} t^r \left(x' - \tfrac{1}{n}\sum_{j=1}^n x'_j\right) + \tfrac{1}{n}\sum_{j=1}^n y'_j$.

Power mean theories naturally give, for any power $p > 0$,

$a = \left(\sum_{j=1}^n |y_j|^p \Big/ \sum_{j=1}^n |x_j|^p\right)^{1/p} t^r \operatorname{sign} \sum_{j=1}^n x_j y_j$,

$y = \operatorname{sign}\left(\sum_{j=1}^n x_j y_j\right) \left(\sum_{j=1}^n |y_j|^p \Big/ \sum_{j=1}^n |x_j|^p\right)^{1/p} t^r x$

for the transformed centralized data, whereas for the initial noncentralized data we obtain

$y' = \operatorname{sign}\left[\sum_{j=1}^n \left(x'_j - \tfrac{1}{n}\sum_{j=1}^n x'_j\right)\left(y'_j - \tfrac{1}{n}\sum_{j=1}^n y'_j\right)\right] \left[\sum_{j=1}^n \left|y'_j - \tfrac{1}{n}\sum_{j=1}^n y'_j\right|^p \Big/ \sum_{j=1}^n \left|x'_j - \tfrac{1}{n}\sum_{j=1}^n x'_j\right|^p\right]^{1/p} t^r \left(x' - \tfrac{1}{n}\sum_{j=1}^n x'_j\right) + \tfrac{1}{n}\sum_{j=1}^n y'_j$.
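A sketch of the full rotation-quasi-invariant PMT fit with the t(β)^r(β) correction. Instead of an explicit rotation, the sketch obtains the principal-system sums from the rotation invariants of the centralized data (their sum equals Σx² + Σy², their absolute difference equals ((Σx² − Σy²)² + 4(Σxy)²)^(1/2)); this is an implementation choice of the sketch, not a formula stated in the paper. Nondegenerate, noncollinear data with Σxy ≠ 0 and nonequal principal sums are assumed:

```python
import numpy as np

def pmt_line_corrected(xp, yp, p=2.0):
    """PMT straight-line fit with the rotation-dependent correction t(beta)^r(beta)."""
    xp, yp = np.asarray(xp, float), np.asarray(yp, float)
    xc, yc = xp.mean(), yp.mean()
    x, y = xp - xc, yp - yc
    sxy, sxx, syy = np.sum(x * y), np.sum(x**2), np.sum(y**2)
    # Rotation invariants standing in for the principal-system sums:
    sum0 = sxx + syy                                   # sum x^2(0) + sum y^2(0)
    diff0 = np.sqrt((sxx - syy)**2 + 4.0 * sxy**2)     # |sum x^2(0) - sum y^2(0)|
    t = abs(sxy) / np.sqrt(sxx * syy) * sum0 / diff0
    r = np.sign(sxx - syy) * 2.0 / np.pi * np.arctan(1.0 / t - 1.0)
    a = (np.sum(np.abs(y)**p) / np.sum(np.abs(x)**p))**(1.0 / p) * t**r * np.sign(sxy)
    return a, xc, yc                                   # line: y' = a*(x' - xc) + yc
```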

Unlike the LSM, PMT provide the best linear approximation to the given data, e.g. in numeric tests; see Figures 1 and 2 with $(x', y')$ replaced by $(x, y)$:

[Image: PwMean21.gif]

Figure 1. ${}_{1|1}S = 0.25$, ${}_{1|1}T = 0.75$; ${}_{1|2}S = 0.218$, ${}_{1|2}T = 0.782$; ${}_{1|4}S = 0.189$, ${}_{1|4}T = 0.811$.

[Image: PwMean22.gif]

Figure 2. ${}_{1|1}S = 0.541$, ${}_{1|1}T = 0.459$; ${}_{1|2}S = 0.507$, ${}_{1|2}T = 0.493$; ${}_{1|4}S = 0.487$, ${}_{1|4}T = 0.513$.

Nota bene: For linear approximation, the results of distance quadrat theories (DQT) and general theories of moments of inertia (GTMI) [4, 5] coincide. For $\sum_{j=1}^n |y_j|^p = \sum_{j=1}^n |x_j|^p$ (and the best linear approximation $y = \pm x + C$), the same also holds for PMT; here $y = x + 2$ (Figures 1, 2). For $\sum_{j=1}^n |y_j|^p \neq \sum_{j=1}^n |x_j|^p$, PMT give results other than those of DQT and GTMI. But PMT are valid under another invariance type than DQT and GTMI. The data symmetry straight line $y = x + 2$ is the best linear approximation in both of the above tests. The LSM gives $y = 0.909x + 2.364$ (Figure 1) and even $y = 0.591x + 3.636$ (Figure 2) with the same data center $(4, 6)$, underestimating the modulus (absolute value) of the inclination to the x-axis (which is typical) because it considers y-coordinate differences instead of distances and ignores the inclination of the approximation straight line to the x-axis.
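To illustrate the typical LSM underestimation of the slope on mirror-symmetric data, a small sketch with hypothetical points symmetric about y = x + 2 (these are not the data behind Figures 1 and 2):

```python
import numpy as np

# Hypothetical sample, mirror-symmetric about y = x + 2, with data center (4, 6).
xp = np.array([2.0, 1.0, 6.0, 7.0, 4.0])
yp = np.array([3.0, 4.0, 9.0, 8.0, 6.0])

x, y = xp - xp.mean(), yp - yp.mean()
a_lsm = np.sum(x * y) / np.sum(x**2)                                    # about 0.923 < 1
a_qmt = np.sqrt(np.sum(y**2) / np.sum(x**2)) * np.sign(np.sum(x * y))   # exactly 1.0

print("LSM:", a_lsm, "intercept:", yp.mean() - a_lsm * xp.mean())       # y = 0.923x + 2.308
print("QMT:", a_qmt, "intercept:", yp.mean() - a_qmt * xp.mean())       # recovers y = x + 2
```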

Universalized quadratic and other power mean theories are very efficient in data estimation, approximation, and processing, and are reliable even for great data scatter.

Acknowledgements to Anatolij Gelimson for our constructive discussions on coordinate system transformation invariances and his very useful remarks.

References

[1] Encyclopaedia of Mathematics. Ed. M. Hazewinkel. Volumes 1 to 10. Kluwer Academic Publ., Dordrecht, 1988-1994

[2] Lev Gelimson. Providing Helicopter Fatigue Strength: Flight Conditions. In: Structural Integrity of Advanced Aircraft and Life Extension for Current Fleets – Lessons Learned in 50 Years After the Comet Accidents, Proceedings of the 23rd ICAF Symposium, Dalle Donne, C. (Ed.), 2005, Hamburg, Vol. II, 405-416

[3] Lev Gelimson. Overmathematics: Fundamental Principles, Theories, Methods, and Laws of Science. The "Collegium" All World Academy of Sciences Publishers, Munich, 2010

[4] Lev Gelimson. Fundamental Science of Estimation. The "Collegium" All World Academy of Sciences Publishers, Munich, 2010

[5] Lev Gelimson. Fundamental Science of Approximation. The "Collegium" All World Academy of Sciences Publishers, Munich, 2010

[6] Lev Gelimson. Fundamental Science of Data Modeling. The "Collegium" All World Academy of Sciences Publishers, Munich, 2010

[7] Lev Gelimson. Fundamental Science of Data Processing. The "Collegium" All World Academy of Sciences Publishers, Munich, 2010

[8] Lev Gelimson. Corrections and Generalizations of the Least Square Method. In: Review of Aeronautical Fatigue Investigations in Germany during the Period May 2007 to April 2009, Ed. Dr. Claudio Dalle Donne, Pascal Vermeer, CTO/IW/MS-2009-076 Technical Report, International Committee on Aeronautical Fatigue, ICAF 2009, EADS Innovation Works Germany, 2009, 59-60