Bounds Mean Theories in Fundamental Sciences of Estimation, Approximation, Data Modeling and Processing
by
© Ph. D. & Dr. Sc. Lev Gelimson
Academic Institute for Creating Fundamental Sciences (Munich, Germany)
Mathematical Journal
of the "Collegium" All World Academy of Sciences
Munich (Germany)
11 (2011), 4
In data modeling, processing, estimation, and approximation [1], data scatter is relatively great in many cases and often does not allow discriminating between different types or forms of analytic approximation expressions, e.g. linear, piecewise linear, parabolic, hyperbolic, circumferential, elliptic, sinusoidal, etc. for two-dimensional data, or linear, piecewise linear, paraboloidal, hyperboloidal, spherical, ellipsoidal, etc. for three-dimensional data. In such a situation, a purely analytic approach alone is blind and often leads to false results. Without graphically interpreting the given data, it is almost impossible to discover the relations between them and the laws they obey, and hence to provide adequate data processing. To approximate the given data analytically in a reasonable way, it is necessary and very useful to create conditions for efficiently applying analytic methods.
As ever, the fundamental principle of tolerable simplicity [2-7] plays a key role.
In overmathematics [2-7] and the fundamental sciences of estimation [8-13], approximation [14, 15], data modeling [16], and data processing [17], a clear graphical interpretation of given three-dimensional data is greatly helped by modeling them two-dimensionally via a suitable data transformation, where this is possible. For example, this is the case for strength data, owing to the fundamental science of strength data unification, modeling, analysis, processing, approximation, and estimation [18, 19].
Bounds mean theories in fundamental sciences of estimation, approximation, data modeling and processing are applicable to practically arbitrary given data for which there exists a certain probable approximation law. It is also possible to combine these theories with other theories transforming the given data and to use not only their end data but also their intermediate data. In particular, one can preliminarily apply graph-analytic theories [20], principal graph types theories [21], invariance theories [22], and groupwise centralization theories [23] in fundamental sciences of estimation, approximation, data modeling and processing to the given data.
Bounds mean theories consider only the two extreme levels of the given data points and ignore all the intermediate levels of the given data points and hence all the remaining data points. All these levels are broken lines, i.e. graphs of piecewise linear functions.
The ideas and essence of bounds mean theories are as follows:
1. Determining the least closed area containing all the data points.
2. Determining, very roughly and purely graphically, a certain probable approximation law with the corresponding probable approximation line (straight line or curve).
3. Dividing the area boundary graph into two subgraphs, namely upper and lower subgraphs (if the graph of this law is not closed) or outer and inner subgraphs (if the graph of this law is closed), thereby building the two extreme levels of the given data points.
4. Possibly continualizing either these two extreme levels or this whole area, i.e. replacing the set of the given data points either with this area boundary graph or with this whole area, each of them being considered continual.
5. Determining a graph which can be considered a mean (middle) graph between the two area boundary subgraphs in a certain reasonable sense, taking the valid types of data invariance into account. In the case of rotation invariance, use general theories of moments of inertia [24] in fundamental sciences of estimation, approximation, data modeling and processing.
6. Directly taking this mean (middle) graph as the graph of the desired approximation law.
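In the simplest two-dimensional case with a nonclosed law graph, ideas 1 to 6 can be illustrated by the following minimal Python sketch. It rests on assumptions not made in the text: the least closed area is taken to be the convex hull of the data points, the upper and lower subgraphs are the upper and lower hull chains, and the mean graph is the arithmetic mean of their ordinates at equal abscissas (in the spirit of parallel projections along the ordinate axis). The names hull_chains and bounds_mean are hypothetical.

```python
import numpy as np

def hull_chains(points):
    """Return the (lower, upper) convex hull chains, each ordered by increasing x.

    Assumption: the leftmost and the rightmost abscissa each occur only once,
    so that both chains can be interpolated as functions of x.
    """
    pts = sorted(set(map(tuple, points)))

    def cross(o, a, b):
        # z-component of the cross product (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def chain(turn):
        # turn = +1 keeps counterclockwise turns (lower chain),
        # turn = -1 keeps clockwise turns (upper chain)
        out = []
        for p in pts:
            while len(out) >= 2 and turn * cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return np.array(out, dtype=float)

    return chain(+1), chain(-1)

def bounds_mean(points, x):
    """Ordinate of the mean (middle) graph between the two hull chains at x."""
    lower, upper = hull_chains(points)
    y_lower = np.interp(x, lower[:, 0], lower[:, 1])
    y_upper = np.interp(x, upper[:, 0], upper[:, 1])
    return 0.5 * (y_lower + y_upper)

# Illustrative data scattered around the line y = x
data = [(0, 0), (1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
mean_at_2 = bounds_mean(data, 2.0)
mean_at_4 = bounds_mean(data, 4.0)
```

Only the boundary points influence the result here; the interior points of the hull are ignored, as the bounds mean idea prescribes. A different notion of the least closed area, or a different invariance type, would change both the chains and the averaging direction.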
The bounds mean theories type includes subtypes which reasonably vary its ideas.
Piecewise linear bounds mean theories directly consider the area boundary graph, which is usually itself piecewise linear.
Piecewise linear bounds weighted mean theories directly consider the typically piecewise linear area boundary graph and use a weighted mean. Namely, the total numbers of the given data points on each of the two area boundary subgraphs are taken as the weights of these subgraphs for determining a weighted mean (middle) graph, which is considered to be the graph of the desired approximation law.
Piecewise linear bounds locally weighted mean theories directly consider the typically piecewise linear area boundary graph and use a locally weighted mean. Namely, the whole data points area is divided (along the probable approximation line) into local parts, taking the valid types of data invariance into account. The local numbers of the given data points in the corresponding local parts of each of the two area boundary subgraphs are taken as the weights of these subgraph parts for separately determining the corresponding local parts of a weighted mean (middle) graph, which is considered to be the graph of the desired approximation law.
Curvilinear bounds mean theories provide, as an intermediate step, an appropriate curvilinear approximation to the area boundary graph.
Curvilinear bounds weighted mean theories provide, as an intermediate step, an appropriate curvilinear approximation to the area boundary graph and use a weighted mean. Namely, the total numbers of the given data points on each of the two area boundary subgraphs are taken as the weights of the curvilinear approximations to these subgraphs for determining a weighted mean (middle) graph, which is considered to be the graph of the desired approximation law.
Curvilinear bounds locally weighted mean theories provide, as an intermediate step, an appropriate curvilinear approximation to the area boundary graph and use a locally weighted mean. Namely, the whole data points area is divided (along the probable approximation line) into local parts, taking the valid types of data invariance into account. The local numbers of the given data points in the corresponding local parts of each of the two area boundary subgraphs are taken as the weights of the curvilinear approximations to these subgraph parts for separately determining the corresponding local parts of a weighted mean (middle) graph, which is considered to be the graph of the desired approximation law.
Bounds mean theories consider only the boundary data points among all the given data and thus provide a simple approximation of all the given data.
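The weighted and locally weighted variants can be sketched in Python as follows. This is an illustration under assumptions: both boundary subgraphs are sampled at the same abscissas, the mean is an arithmetic weighted mean of their ordinates, and the function name weighted_bounds_mean is hypothetical. Scalar weights give the (globally) weighted mean; one weight per sample gives a locally weighted mean.

```python
import numpy as np

def weighted_bounds_mean(y_upper, y_lower, n_upper, n_lower):
    """Weighted mean of two boundary subgraphs sampled at the same abscissas.

    n_upper, n_lower: numbers of data points on the upper and lower subgraph,
    either scalars (total numbers) or arrays (local numbers per sample).
    """
    y_upper = np.asarray(y_upper, dtype=float)
    y_lower = np.asarray(y_lower, dtype=float)
    n_upper = np.asarray(n_upper, dtype=float)
    n_lower = np.asarray(n_lower, dtype=float)
    return (n_upper * y_upper + n_lower * y_lower) / (n_upper + n_lower)

# Upper and lower boundary ordinates at two abscissas (illustrative values)
upper = [3.0, 4.5]
lower = [1.0, 3.0]

# Total weights: 3 points on the upper subgraph, 1 on the lower one
global_mean = weighted_bounds_mean(upper, lower, 3, 1)

# Local weights: point counts differ from one local part to the next
local_mean = weighted_bounds_mean(upper, lower, [1, 3], [1, 1])
```

With equal weights the function reduces to the plain mean (middle) graph of the unweighted theories.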
To improve data modeling, processing, estimation, and approximation, it is also possible to preliminarily represent each local group of data points by its center, whose quantity equals the number of the points in this group, and then to apply both graphical and analytic approaches to the already groupwise centralized data, namely to a quantiset [2-7] of their local groupwise centers.
Consider a group of data point quantielements, generally a quantiset of them. Denote the ith coordinate by $x_i$ ($i = 1, 2, \ldots, m$, $m \in N_+ = \{1, 2, \ldots\}$; e.g. for $m = 2$, $x_1$ is the x-coordinate and $x_2$ is the y-coordinate), the ith coordinate of the jth data point quantielement [2-7] ($j = 1, 2, \ldots, n$, $n \in N_+ = \{1, 2, \ldots\}$) of the group by $x_{ij}$, and the quantity of the element in this quantielement by $q_j$. Replace this group with its weighted central data point quantielement. The ith coordinate of the element of this quantielement is
$$x_i = \frac{\sum_{j=1}^{n} q_j x_{ij}}{\sum_{j=1}^{n} q_j},$$
and the quantity of the element in this quantielement is
$$q = \sum_{j=1}^{n} q_j.$$
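The two formulae above translate directly into a short Python function. The sketch below assumes that the group is given as an array of n points with m coordinates each and an array of n positive quantities; the name weighted_center is hypothetical.

```python
import numpy as np

def weighted_center(coords, quantities):
    """Weighted central data point of a group and its total quantity.

    coords: array-like of shape (n, m), one row per data point
    quantities: array-like of shape (n,), the quantity q_j of each point
    """
    x = np.asarray(coords, dtype=float)
    q = np.asarray(quantities, dtype=float)
    total = q.sum()                                 # q = sum of q_j
    center = (q[:, None] * x).sum(axis=0) / total   # x_i = sum(q_j x_ij) / q
    return center, total

# Three points in the plane; the third one counts twice
center, total = weighted_center([(0, 0), (2, 0), (2, 4)], [1, 1, 2])
```

Because the result carries its own quantity, it can be fed back into the same function together with other quantielements, which is how repeated centralization of groups can be chained.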
The variety of bounds mean theories and their variability make their algorithms flexible.
These algorithms share some typical steps (stages), shown first in the simplest two-dimensional case:
1. Graphically represent the initial or suitably transformed given data in a two-dimensional Cartesian coordinate system.
2. Determine and separate clear outlier candidates (whose number should not exceed entier(n/10) + 1, i.e. about 10 % of the total number n of the data points) by applying intuitive graphical and then (if necessary and useful) analytic criteria to the (initial or suitably transformed) given data. For example, a data point could be considered an outlier candidate if the radius of the greatest circular neighborhood of this point containing none of the remaining given data points is at least three times greater than the arithmetic mean of such radii for the entier(9n/10), i.e. about 90 %, of the given data points with the least radii. Another possible criterion: for every data point, determine the sum of its distances from the entier(n^(1/2)) other data points nearest to it, compare these sums for all the data points, and separate at most entier(n/10) + 1 clear outlier candidates with the greatest sums, ideally all the data points for which such sums are much greater than such sums for all the remaining (so-called clearly correct) data points.
3. Determine the least closed area containing all the clearly correct data points.
4. Consider all the outlier candidates, and determine and separate the clear outliers. Nota bene: An outlier candidate is not an outlier if and only if its distance from the least closed area (see step 3) containing all the clearly correct data points either vanishes (when this candidate lies inside this area) or is sufficiently small, e.g. not greater than the mean distance of the clearly correct data point with the greatest sum from the entier(n^(1/2)) other data points nearest to it.
5. Add all the outlier candidates which are not outliers to the clearly correct data points and thus obtain all the already correct data points.
6. Determine the least closed area containing all the already correct data points.
7. Consider all the outliers, build appropriate combinations of them such that the elements of their weighted central data point quantielements belong to this area, and select the best set of such combinations, namely the one which uses the outliers most completely and without repetitions. In this way, it is usually possible to use some part of the outliers. Then add their weighted central data point quantielements to the already correct data points and obtain the so-called correct data points.
8. Using the fundamental principle of tolerable simplicity [2-7], intuitively graphically determine and represent a probable approximation line (straight line or curve) best fitting all the correct data points. Nota bene: This line can now be determined and represented very approximately, even only roughly, and should be considered fuzzy, namely as one of the possible lines in a band of a certain width, which can vary along this line.
9. Intuitively graphically divide all the correct data points into three parts:
9.1. So-called positive data points lying clearly above the probable approximation line if it is not closed or outside the probable approximation line if it is closed.
9.2. So-called negative data points lying clearly below the probable approximation line if it is not closed or inside the probable approximation line if it is closed.
9.3. So-called already suitable data points for which it is not clear whether they lie on the probable approximation line, above or below it (if it is not closed), or outside or inside it (if it is closed).
10. Separate all the already suitable data points and leave them without any further transformations.
11. Consider all the positive and negative data points.
12. On the probable approximation line, select some initial point (beginning) and end point. The end point coincides with the initial point if this line is closed. In this case only, also select a direction of moving from the initial point (beginning) to the end point along this line. Otherwise, this direction is unique.
13. For each positive or negative data point, define and determine the set of all the points of the probable approximation line which can be considered the projections of this data point onto this line, taking data invariance [22] into account.
13.1. In the case of data rotation invariance, for each positive or negative data point, define and determine the set of all the points of the probable approximation line which are the nearest to this data point (the generalized normal projections of this data point onto this line).
13.1.1. If this set contains one element only, then consider the corresponding line point as the projection of this data point onto this line. Further consider this data point and its projection onto this line to be in mutual one-to-one correspondence.
13.1.2. If this set consists of separate line points, then determine the uniquantity [2-7] q of this set, which is equal to the number of its elements if the set is finite, further divide 1 (the data point quantity) by this uniquantity q, and consider this data point as the quantiset [2-7] of q quantielements, each of them coinciding with this data point and having quantity 1/q . Then for each projection point on this line, select precisely one data point quantielement and consider it and this projection to be in mutual one-to-one correspondence.
13.1.3. If this set is continual or, generally, consists of both separate line points and continual line parts, then determine the uniquantity q of this set, further divide 1 (the data point quantity) by this uniquantity q, and consider this data point as the quantiset of q quantielements, each of them coinciding with this data point and having quantity 1/q . Then for each projection point on this line, select precisely one data point quantielement and consider it and this projection to be in mutual one-to-one correspondence. A simplification is also possible for each of these continual line parts. Namely, consider the midpoint of its length as the unique projection point in this part. Denote the uniquantity of this part by p . Now consider this projection point and precisely one data point quantielement with quantity p/q to be in mutual one-to-one correspondence. Nota bene: Continuality takes place in the case of arcs of circles whose center coincides with this data point.
13.2. In the case of data tension-compression invariance, for each positive or negative data point, define and determine the set of all the points of the probable approximation line (the generalized parallel projections of this data point onto this line) which have the same value as this data point of the coordinate (usually the x-coordinate, or abscissa) which can be considered the independent variable (argument) of the equation of this line.
13.2.1. If this set contains one element only, then consider the corresponding line point as the projection of this data point onto this line. Further consider this data point and its projection onto this line to be in mutual one-to-one correspondence.
13.2.2. Otherwise (if the uniquantity [2-7] of this set is greater than 1), among its elements, consider the set of the line points which are the nearest to this data point.
13.2.3. If this set clearly contains one element only, then consider the corresponding line point as the projection of this data point onto this line. Further consider this data point and its projection onto this line to be in mutual one-to-one correspondence.
13.2.4. Otherwise (if this set is fuzzy, namely there are two line points for which it is unclear which of them is nearer to this data point), consider the set of these two line points, both regarded as the nearest to this data point. The uniquantity of this set is q = 2. Divide 1 (the data point quantity) by this uniquantity q = 2 and consider this data point as the quantiset [2-7] of q = 2 quantielements, each of them coinciding with this data point and having quantity 1/q = 1/2 . Then for each projection point on this line, select precisely one data point quantielement and consider it and this projection to be in mutual one-to-one correspondence.
14. Now move along the probable approximation line from its initial point (beginning) to its end point. If this line is closed, then take the already selected direction. Find the projections of all the positive and negative data points in the order of this movement and note both the "signs" and the quantities of the corresponding data point quantielements, as well as the places where these "signs" change. If, for certain data point quantielements, their projection points on this line are so near to one another that the order of these projection points on this line is unclear, then gather all these data point quantielements in one group and replace this group with its weighted central data point quantielement.
15. Consider the first projection and note both the "sign" (as the first "sign") and the quantity of the corresponding data point quantielement, move further up to the first place where the "sign" changes, and gather all the corresponding data point quantielements of the first "sign" into the first subgroup of the first group. Then move from the first to the second place where the "sign" changes and gather all the corresponding data point quantielements of the second "sign" (opposite to the first "sign") into the second subgroup of the same first group. Now the first group, containing data point quantielements of both the first and the second "sign", is complete.
16. Replace this group with its weighted central data point quantielement.
17. Now move from the second to the third place where the "sign" changes and gather all the corresponding data point quantielements of the first "sign" into the first subgroup of the second group. Then move from the third to the fourth place where the "sign" changes and gather all the corresponding data point quantielements of the second "sign" (opposite to the first "sign") into the second subgroup of the same second group. Now the second group, containing data point quantielements of both the first and the second "sign", is complete. Replace this group with its weighted central data point quantielement by substituting all the coordinates and the quantities of the elements in the quantielements of this second group into the same general formulae.
18. Continue this process up to the end point of the probable approximation line, form all the further groups of data point quantielements of both the first and the second "sign", and replace all these groups with their weighted central data point quantielements by substituting all the coordinates and the quantities of the elements in the quantielements of these groups into the same general formulae.
19. Now consider all the already suitable data points and all these weighted central data point quantielements together and determine the least closed area containing all these so-called suitable data points.
20. Apply graph-analytic theories [20] and principal graph types theories [21] in fundamental sciences of estimation, approximation, data modeling and processing to the suitable data points, whose scatter is usually much less than the initial data scatter. This makes it possible to clearly determine the most suitable types or forms of analytic approximation expressions, e.g. linear, piecewise linear, parabolic, hyperbolic, circumferential, elliptic, sinusoidal, etc. for two-dimensional data, or linear, piecewise linear, paraboloidal, hyperboloidal, spherical, ellipsoidal, etc. for three-dimensional data.
21. Apply bounds mean theories of different types to the given data points, already correct data points, correct data points, already suitable data points, as well as suitable data points, and compare the results with one another.
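The second outlier criterion of step 2 can be sketched in Python as follows. The limits entier(n^(1/2)) and entier(n/10) + 1 are taken from the text. The text does not quantify when a sum is "much greater" than the others, so the sketch assumes a hypothetical threshold: a sum must exceed factor times the median of all the sums, with factor = 3 chosen only for illustration. The function name outlier_candidates is likewise hypothetical.

```python
import math
import numpy as np

def outlier_candidates(points, factor=3.0):
    """Indices of outlier candidates by sums of nearest-neighbor distances.

    factor is an assumed threshold (not given in the text): a point is a
    candidate only if its sum exceeds factor * median of all the sums.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    k = math.isqrt(n)        # entier(n^(1/2)) nearest other points
    limit = n // 10 + 1      # at most entier(n/10) + 1 candidates

    # Pairwise Euclidean distances, shape (n, n)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

    # After sorting each row, column 0 is the zero self-distance; skip it
    sums = np.sort(dist, axis=1)[:, 1:k + 1].sum(axis=1)

    ranked = np.argsort(sums)[::-1][:limit]   # greatest sums first
    threshold = factor * np.median(sums)
    return [int(i) for i in ranked if sums[i] > threshold]

# A 3 x 3 grid of points plus one distant point (index 9)
grid = [(i, j) for i in range(3) for j in range(3)]
candidates = outlier_candidates(grid + [(10, 10)])
no_candidates = outlier_candidates(grid)
```

The candidates found in this way are only candidates: steps 3 to 5 still decide, via the distance from the least closed area, which of them are actual outliers.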
These theories are very efficient in estimation, approximation, and data processing.
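Steps 14 to 18 of the algorithm can be illustrated by the following Python sketch. It assumes that the data point quantielements are already ordered by the positions of their projections along the probable approximation line (steps 12 to 14), and that a trailing incomplete group, which the text does not treat separately, simply forms a group of its own. The function names are hypothetical.

```python
import numpy as np

def group_by_sign_changes(signs):
    """Split indices into groups: a run of one "sign" plus the next run
    of the opposite "sign" (the group is closed at every second change)."""
    groups, current, changes = [], [], 0
    for i, s in enumerate(signs):
        if current and s != signs[i - 1]:
            changes += 1
            if changes == 2:          # second change closes the group
                groups.append(current)
                current, changes = [], 0
        current.append(i)
    if current:                        # assumed: leftover points form a group
        groups.append(current)
    return groups

def centralize_groups(points, quantities, signs):
    """Replace every group with its weighted central data point quantielement."""
    pts = np.asarray(points, dtype=float)
    q = np.asarray(quantities, dtype=float)
    result = []
    for g in group_by_sign_changes(signs):
        total = q[g].sum()
        center = (q[g][:, None] * pts[g]).sum(axis=0) / total
        result.append((center, total))
    return result

# Points ordered along the line y = x; +1 means above it, -1 below it
pts = [(0, 1), (1, 2), (2, 1), (3, 4), (4, 3), (5, 4), (6, 7)]
signs = [+1, +1, -1, +1, -1, -1, +1]
groups = group_by_sign_changes(signs)
centers = centralize_groups(pts, [1] * len(pts), signs)
```

Each resulting center combines points from both sides of the line, which is why the scatter of the suitable data points is usually much smaller than the initial one.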
Acknowledgements to Anatolij Gelimson for our constructive discussions on coordinate system transformation invariances and his very useful remarks.
References
[1] Encyclopaedia of Mathematics. Ed. M. Hazewinkel. Volumes 1 to 10. Kluwer Academic Publ., Dordrecht, 1988-1994
[2] Lev Gelimson. Basic New Mathematics. Monograph. Drukar Publishers, Sumy, 1995
[3] Lev Gelimson. General Analytic Methods. Abhandlungen der WIGB (Wissenschaftlichen Gesellschaft zu Berlin), 3 (2003), Berlin
[4] Lev Gelimson. Elastic Mathematics. Abhandlungen der WIGB (Wissenschaftlichen Gesellschaft zu Berlin), 3 (2003), Berlin
[5] Lev Gelimson. Elastic Mathematics. General Strength Theory. Mathematical, Mechanical, Strength, Physical, and Engineering Monograph. The “Collegium” All World Academy of Sciences Publishers, Munich (Germany), 2004
[6] Lev Gelimson. Providing Helicopter Fatigue Strength: Flight Conditions. In: Structural Integrity of Advanced Aircraft and Life Extension for Current Fleets – Lessons Learned in 50 Years After the Comet Accidents, Proceedings of the 23rd ICAF Symposium, Dalle Donne, C. (Ed.), 2005, Hamburg, Vol. II, 405-416
[7] Lev Gelimson. Overmathematics: Fundamental Principles, Theories, Methods, and Laws of Science. Mathematical Monograph. The “Collegium” All World Academy of Sciences Publishers, Munich (Germany), 2009
[8] Lev Gelimson. General estimation theory. Transactions of the Ukraine Glass Institute, 1 (1994), 214-221
[9] Lev Gelimson. General Estimation Theory. Mathematical Monograph. The “Collegium” All World Academy of Sciences Publishers, Munich (Germany), 2001
[10] Lev Gelimson. General Estimation Theory Fundamentals. Mathematical Journal of the “Collegium” All World Academy of Sciences, Munich (Germany), 1 (2001), 3
[11] Lev Gelimson. General Estimation Theory Fundamentals (along with its line by line translation into Japanese). Mathematical Journal of the “Collegium” All World Academy of Sciences, Munich (Germany), 9 (2009), 1
[12] Lev Gelimson. General Estimation Theory (along with its line by line translation into Japanese). Mathematical Monograph. The “Collegium” All World Academy of Sciences Publishers, Munich (Germany), 2011
[13] Lev Gelimson. Fundamental Science of Estimation. Mathematical Monograph. The “Collegium” All World Academy of Sciences Publishers, Munich (Germany), 2011
[14] Lev Gelimson. General Problem Theory. Abhandlungen der WIGB (Wissenschaftlichen Gesellschaft zu Berlin), 3 (2003), Berlin
[15] Lev Gelimson. Fundamental Science of Approximation. Mathematical Monograph. The “Collegium” All World Academy of Sciences Publishers, Munich (Germany), 2011
[16] Lev Gelimson. Fundamental Science of Data Modeling. Mathematical Monograph. The “Collegium” All World Academy of Sciences Publishers, Munich (Germany), 2011
[17] Lev Gelimson. Fundamental Science of Data Processing. Mathematical Monograph. The “Collegium” All World Academy of Sciences Publishers, Munich (Germany), 2011
[18] Lev Gelimson. Fundamental Science of Strength Data Unification, Modeling, Analysis, Processing, Approximation, and Estimation (Essential). Strength and Engineering Journal of the “Collegium” All World Academy of Sciences, Munich (Germany), 10 (2010), 3
[19] Lev Gelimson. Fundamental Science of Strength Data Unification, Modeling, Analysis, Processing, Approximation, and Estimation (Fundamentals). Strength Monograph. The “Collegium” All World Academy of Sciences Publishers, Munich (Germany), 2010
[20] Lev Gelimson. Graph-Analytic Theories in Fundamental Sciences of Estimation, Approximation, Data Modeling and Processing (Essential). Mathematical Journal of the “Collegium” All World Academy of Sciences, Munich (Germany), 11 (2011), 2
[21] Lev Gelimson. Principal Graph Types Theories in Fundamental Sciences of Estimation, Approximation, Data Modeling and Processing (Essential). Mathematical Journal of the “Collegium” All World Academy of Sciences, Munich (Germany), 11 (2011), 3
[22] Lev Gelimson. Data, Problem, Method, and Result Invariance Theories in Fundamental Sciences of Estimation, Approximation, Data Modeling and Processing, and Solving General Problems (Essential). Mathematical Journal of the “Collegium” All World Academy of Sciences, Munich (Germany), 11 (2011), 1
[23] Lev Gelimson. Groupwise Centralization Theories in Fundamental Sciences of Estimation, Approximation, Data Modeling and Processing (Essential). Mechanical and Physical Journal of the “Collegium” All World Academy of Sciences, Munich (Germany), 11 (2011), 1
[24] Lev Gelimson. General Theories of Moments of Inertia in Fundamental Sciences of Estimation, Approximation, Data Modeling and Processing (Essential). Mechanical and Physical Journal of the “Collegium” All World Academy of Sciences, Munich (Germany), 10 (2010), 3