-
Notifications
You must be signed in to change notification settings - Fork 2
Add various averaging routines to csdb #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Median: middle value after sorting - if even number of elements then take mean of middle two => Median of integer array has to return float since mean of two integers could be fractional. Mode: Result can be nothing if the is no value that has a greater number of instances then anything else; single value or multiple values if there is more than one value with same number of instances. Therefore Mode must return an array. Note that Mode can apply to any type, not just numbers. |
UPDATE: This comment has been extracted into its own issue: #30Note that Delphi's function Mean(const Data: array of Single): Single;
function Mean(const Data: array of Double): Double;
function Mean(const Data: array of Extended): Extended; So we could define Mean for integers that returns a function MeanInt(const A: array of Integer): Double;
begin
Result := 0.0;
for var I := Low(A) to High(A) do
Result := Result + A[I] / Length(A);
end;
function MeanInt2(const A: array of Integer): Double;
begin
var Sum: Int64 := 0;
for var Elem in A do
Inc(Sum, Elem);
Result := Sum / Length(A);
end; |
UPDATE: This comment has been extracted into its own issue: #31Also, there is geometric mean:
Geometric mean only works if all numbers are > 0. Two formula (source Wikipedia):
Note the case 1 will overflow easily, so case 2 seems best algorithm. E.g.: function GeoMean(const A: array of Double): Double;
begin
Assert(Length(A) > 0);
var SumOfLogs: Double := 0.0;
for var D in A do
begin
if Sign(D) <> PositiveValue then
raise EArgumentOutOfRangeException.Create('Non positive value passed to GeoMean');
SumOfLogs := SumOfLogs + Ln(A);
end;
Result := Exp(SumOfLogs / Length(A));
end; |
UPDATE: this comment has been extracted into its own issue: #32And there's the logarithmic mean of two +ve numbers (not same as log-mean), which is defined as: Where x = y:
Where x ≠ y:
So: function LogarithmicMean(const X, Y: Double): Double;
begin
Assert((X > 0) and (Y > 0));
if SameValue(X, Y) then
Result := X
else
Result := (Y - X) / (Ln(Y) - Ln(X));
end; |
UPDATE: this comment has been extracted into its own issue: #33Generalised Mean / Power Mean: A power mean is a mean of the form where all x are >= 0. Source: Statistics How To & Wolfram MathWorld function PowerMean(const A: array of Extended; Lambda: Extended): Extended;
begin
Assert(not IsZero(Lambda));
Assert(Length(A) > 0);
var Sum: Extended := 0.0;
for var X in A do
begin
Assert(Sign(X) <> NegativeValue);
Sum := Sum + Power(X, Lambda);
end;
Result := Power(Sum / Length(A), 1 / Lambda);
end; |
UPDATE: this comment has been added to issue #33Weighted Generalised Mean / Power Mean When all weights w sum to 1 then denominator is 1 and can be ignored. Soure: Statistics How To and Wikipedia |
UPDATE: this comment has been extracted into its own issue: #40Useful helper functions for checking arrays for zero & negative entries: // Check if all elements of a non-empty array are zero
function ArrayIsZero(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if not IsZero(Elem) then
Exit;
Result := True;
end;
// Check if all elements of a non-empty array are <> 0
function ArrayIsNonZero(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if IsZero(Elem) then
Exit;
Result := True;
end;
// Check if all elements of a non-empty array are > 0
function ArrayIsPositive(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if Sign(Elem) <> PositiveValue then
Exit;
Result := True;
end;
// Check if all elements of a non-empty array are < 0
function ArrayIsNegative(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if Sign(Elem) <> NegativeValue then
Exit;
Result := True;
end;
// Check if all elements of a non-empty array are <= 0
function ArrayIsNonPositive(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if Sign(Elem) = PositiveValue then
Exit;
Result := True;
end;
// Check if all elements of a non-empty array are >= 0
function ArrayIsNonNegative(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if Sign(Elem) = NegativeValue then
Exit;
Result := True;
end; |
UPDATE: This comment has been extracted into its own issue: #34A related (helper) function is LogSumExp or LSE for a sequence x1,...,xn is is defined as the logarithm of the sum of the exponentials of the arguments:
Source: Wikipedia function LSE(const A: array of Extended): Extended;
begin
var Sum: Extended := 0.0;
for var X in A do
Sum := Sum + Exp(X):
Result := Ln(Sum);
end; |
UPDATE: This comment has been included in issue #34Strictly convex LogSumExp Source: Wikipedia Since exp(0) = e^0 = 1, we have
So, in Pascal we can have: function LSEPlus0(const A: array of Extended): Extended;
begin
var Sum: Extended := 1.0; // for additional Exp(0) term
for var X in A do
Sum := Sum + Exp(X):
Result := Ln(Sum);
end; |
UPDATE: Decided not to implement this.Quasi-arithmetic mean Source: Wikipedia Note that certain functions f may be restricted to certain intervals of the real numbers, so if would be wise to pass a closure to the Pascal function that ensures that the passed array has elements that fall within the interval, e.g. function QAMean(
const A: array of Extended;
F, FInv: TFunc<Extended,Extended>;
Constraint: TPredicate<Extended>
): Extended; We could let |
UPDATE: This comment has been extracted into its own issue: #41Median Two overloads suggested, one real & one integer: function Median(const A: array of Extended): Extended; overload;
begin
Assert(Length(A) > 0);
// optimisations for array lengths 1 & 2 to avoid sorting
if Length(A) = 1 then
Exit(A[0]);
if Length(A) = 2 then
Exit((A[0] + A[1]) / 2.0);
Generics.Collections.TArray.Sort<Extended>(A); // using standard comparer
var MiddleLo := Length(A) div 2;
if Odd(Length(A)) then
Result := A[MiddleLo + 1]
else
Result := (A[MiddleLo] + A[MiddleLo + 1]) / 2.0;
end;
function Median(const A: array of Integer): Extended; overload;
begin
Assert(Length(A) > 0);
// optimisations for array lengths 1 & 2 to avoid sorting
if Length(A) = 1 then
Exit(A[0]);
if Length(A) = 2 then
Exit((A[0] + A[1]) / 2);
Generics.Collections.TArray.Sort<Integer>(A); // using standard comparer
var MiddleLo := Length(A) div 2;
if Odd(Length(A)) then
Result := A[MiddleLo + 1]
else
Result := (A[MiddleLo] + A[MiddleLo + 1]) / 2;
end; |
UPDATE: This comment has been extracted into its own issue: #42Mode for countable sets: Obeys rule that if all items in a sequence of values are unique then there is no mode. The following function returns an empty array in that case. If sequence in mono-modal an array of one element containing the mode is returned, while if sequence is multi-modal, an array containing all the modes is returned. This function illustrates the case for a sequence of function ModeN(const A: array of Integer): TArray<Integer>;
begin
if Length(A) = 0 then
raise EArgumentException.Create('Array is empty');
var MaxCount: Cardinal := 0;
var Counts := TDictionary<Integer,Cardinal>.Create;
try
for var X in A do
begin
var C: Cardinal;
if not Counts.TryGetValue(X, C) then
C := 0;
Inc(C);
Counts.AddOrSetValue(X, C);
if MaxCount < C then
MaxCount := C;
end;
// equivalent test?: if MaxCount = 1 then
if Length(A) = Counts.Count then
begin
// All numbers are unique => no mode
SetLength(Result, 0);
Exit;
end;
var Results := TList<Integer>.Create;
try
for KV in Counts do
begin
if KV.Value = MaxCount then
Results.Add(KV.Key);
end;
Result := Results.ToArray;
finally
Results.Free;
end;
finally
Counts.Free;
end;
end; |
UPDATE: Decided not to implement thisI think the text and code below the ruling is probably wrong From what I've read recently, real numbers should be partitioned into equal size blocks, as in a histogram. Real numbers falling within the same partition should be considered equal for counting purposes. The following code considers any number within ε of another number to be counted as the same. But, if the numbers are partitioned, it is possible two numbers within ε of each other could fall into different partitions. This means that the following code is too simplistic. Consequently, we need to use ε as the partition length and treat numbers that fall within ε of the nearest partition lower bound to be in the partition.
Partitions should be symmetrical about zero. Mode for real numbers. Numbers are considered equal if they are within function ModeR(const A: array of Extended; const Epsilon: Extended = 0.0): TArray<Extended>;
begin
Assert(Length(A) > 0);
var MaxCount: Cardinal := 0;
var Counts := TDictionary<Extended,Cardinal>.Create(
TDelegatedEqualityComparer.Create(
function (const L, R: Extended): Boolean
begin
Result := Math.SameValue(L, R, Epsilon);
end,
function (const V: Extended): Integer
begin
Result := TDelegatedEqualityComparer.Default.GetHashCode(V);
end
);
);
try
for var X in A do
begin
var C: Cardinal;
if not Counts.TryGetValue(X, C) then
C := 0;
Inc(C);
Counts.AddOrSetValue(X, C);
if MaxCount < C then
MaxCount := C;
end;
// equivalent test?: if MaxCount = 1 then
if Length(A) = Counts.Count then
begin
// All numbers are unique => no mode
SetLength(Result, 0);
Exit;
end;
var Results := TList<Extended>.Create;
try
for KV in Counts do
begin
if KV.Value = MaxCount then
Results.Add(KV.Key);
end;
Result := Results.ToArray;
finally
Results.Free;
end;
finally
Counts.Free;
end;
end; |
UPDATE: Decided not to implement thisMode with probability function p(x) that gives weighted values for, say, integer x. We define probability function as a closure: P: TFunc<Integer,Extended>; |
UPDATE: Decided not to implement this.It may be useful to define a Mode function that returns the index/indices of one of the array elements that contain(s) the mode(s). In the following we will use an array of Int64 and return the indices in an array ov Integerm just to make it easy to distinguish between the array type and the index type. *** function ModeNByIndex(const A: array of Int64): TArray<Integer>;
begin
Assert(Length(A) > 0);
var MaxCount: Cardinal := 0;
var Counts := TDictionary<Int64,TPair<{Counter}Cardinal,{Index}Integer>>.Create;
try
for var Idx: Integer := Low(A) to High(A) do
begin
var V: TPair<Cardinal,Integer>;
if not Counts.TryGetValue(A[Idx], V) then
V := TPair<Cardinal,Integer>.Create(0, Idx);
Inc(V.Key);
Counts.AddOrSetValue(A[Idx], V);
if MaxCount < V.Key then
MaxCount := V.Key;
end;
if Length(A) = Counts.Count then
begin
// All numbers are unique => no mode
SetLength(Result, 0);
Exit;
end;
var Results := TList<Integer>.Create;
try
for KV in Counts do
if KV.Value.Key = MaxCount then
Results.Add(KV.Value.Value);
Result := Results.ToArray;
finally
Results.Free;
end;
finally
Counts.Free;
end;
end; |
Need to split these suggested functions into several issues. There are too many to implement in one go. |
All comments with coding suggestions have either been rejected or extracted into new issues. This issue is now considered to be a duplicate and is being closed. |
Add Mean, Median and Mode averages in various overloads of Int and Float.
All should operate on dynamic arrays.
The text was updated successfully, but these errors were encountered: