Skip to content

Add various averaging routines to csdb #16

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
delphidabbler opened this issue Jul 18, 2023 · 20 comments
Closed

Add various averaging routines to csdb #16

delphidabbler opened this issue Jul 18, 2023 · 20 comments
Assignees
Labels
duplicate This issue or pull request already exists

Comments

@delphidabbler
Copy link
Owner

Add Mean, Median and Mode averages in various overloads of Int and Float.

All should operate on dynamic arrays.

@delphidabbler delphidabbler self-assigned this Jul 18, 2023
@delphidabbler delphidabbler added enhancement New feature or request considering Issue is currently under consideration labels Jul 18, 2023
@github-project-automation github-project-automation bot moved this to Considering in Code Snippets Jul 18, 2023
@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 18, 2023

Median: middle value after sorting - if even number of elements then take mean of middle two => Median of integer array has to return float since mean of two integers could be fractional.

Mode: Result can be nothing if the is no value that has a greater number of instances then anything else; single value or multiple values if there is more than one value with same number of instances. Therefore Mode must return an array. Note that Mode can apply to any type, not just numbers.

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 18, 2023

UPDATE: This comment has been extracted into its own issue: #30


Note that Delphi's Math unit has a Mean function, but only for Float arrays:

function Mean(const Data: array of Single): Single;
function Mean(const Data: array of Double): Double;
function Mean(const Data: array of Extended): Extended;

So we could define Mean for integers that returns a Double. Two possible implementations:

function MeanInt(const A: array of Integer): Double;
begin
  Result := 0.0;
  for var I := Low(A) to High(A) do
    Result := Result + A[I] / Length(A);
end;

function MeanInt2(const A: array of Integer): Double;
begin
  var Sum: Int64 := 0;
  for var Elem in A do
    Inc(Sum, Elem);
  Result := Sum / Length(A);
end;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 18, 2023

UPDATE: This comment has been extracted into its own issue: #31


Also, there is geometric mean:

The geometric mean, sometimes also called geometric average, is an average calculated by multiplying a set of positive values and taking the nth root, where n is the number of values.

The geometric mean is used to minimize the effects of extreme values; for instance, when averaging growth rates.

--- Source: Eurostat

Geometric mean only works if all numbers are > 0.

Two formula (source Wikipedia):

  1. For N +ve numbers a1, a2, ... aN, GeoMean = Nth root of (a1 × a2 × ... aN).
  2. For N +ve numbers a1, a2, ... aN, GeoMean = exp(sum(ln(a1)), ln(a2), ... ĺn(aN)) / N).

Note the case 1 will overflow easily, so case 2 seems best algorithm.

E.g.:

function GeoMean(const A: array of Double): Double;
begin
  Assert(Length(A) > 0);
  var SumOfLogs: Double := 0.0;
  for var D in A do
  begin
    if Sign(D) <> PositiveValue then
      raise EArgumentOutOfRangeException.Create('Non positive value passed to GeoMean');
    SumOfLogs := SumOfLogs + Ln(A);
  end;
  Result := Exp(SumOfLogs / Length(A));
end;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 18, 2023

UPDATE: this comment has been extracted into its own issue: #32


And there's the logarithmic mean of two +ve numbers (not same as log-mean), which is defined as:

Where x = y:

Mlm(x, y) = x

Where x ≠ y:

Mlm(x, y) = (y - x) / (ln(y) - ln(x))

So:

function LogarithmicMean(const X, Y: Double): Double;
begin
  Assert((X > 0) and (Y > 0));
  if SameValue(X, Y) then
    Result := X
  else
    Result := (Y - X) / (Ln(Y) - Ln(X));
end;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 19, 2023

UPDATE: this comment has been extracted into its own issue: #33


Generalised Mean / Power Mean:

A power mean is a mean of the form

image

where all x are >= 0.

Source: Statistics How To & Wolfram MathWorld

function PowerMean(const A: array of Extended; Lambda: Extended): Extended;
begin
  Assert(not IsZero(Lambda));
  Assert(Length(A) > 0);
  var Sum: Extended := 0.0;
  for var X in A do
  begin
    Assert(Sign(X) <> NegativeValue);
    Sum := Sum + Power(X, Lambda);
  end;
  Result := Power(Sum / Length(A), 1 / Lambda);
end;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 19, 2023

UPDATE: this comment has been added to issue #33


Weighted Generalised Mean / Power Mean

Screenshot_20230719-204835_Chrome~2

When all weights w sum to 1 then denominator is 1 and can be ignored.

Soure: Statistics How To and Wikipedia

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 19, 2023

UPDATE: This comment has been included in issue #31


Weighted geometric mean

Screenshot_20230719-205701_Chrome

Source: Wikipedia

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 19, 2023

UPDATE: This comment has been included in issue: #30


Weighted arithmetic mean

Screenshot_20230719-210238_Chrome

Source: Wikipedia

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 20, 2023

UPDATE: this comment has been extracted into its own issue: #40


Useful helper functions for checking arrays for zero & negative entries:

// Check if all elements of a non-empty array are zero 
function ArrayIsZero(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if not IsZero(Elem) then
      Exit;
  Result := True;
end;

// Check if all elements of a non-empty array are <> 0 
function ArrayIsNonZero(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if IsZero(Elem) then
      Exit;
  Result := True;
end;

// Check if all elements of a non-empty array are > 0 
function ArrayIsPositive(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if Sign(Elem) <> PositiveValue then
      Exit;
  Result := True;
end;

// Check if all elements of a non-empty array are < 0 
function ArrayIsNegative(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if Sign(Elem) <> NegativeValue then
      Exit;
  Result := True;
end;

// Check if all elements of a non-empty array are <= 0 
function ArrayIsNonPositive(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if Sign(Elem) = PositiveValue then
      Exit;
  Result := True;
end;

// Check if all elements of a non-empty array are >= 0 
function ArrayIsNonNegative(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if Sign(Elem) = NegativeValue then
      Exit;
  Result := True;
end;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 20, 2023

UPDATE: This comment has been extracted into its own issue: #34


A related (helper) function is LogSumExp or LSE for a sequence x1,...,xn is is defined as the logarithm of the sum of the exponentials of the arguments:

LSE(x1,...,xn) = log(exp(x1) + ... + exp(xn))

Source: Wikipedia

function LSE(const A: array of Extended): Extended;
begin
  var Sum: Extended := 0.0;
  for var X in A do
    Sum := Sum + Exp(X):
  Result := Ln(Sum);
end;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 20, 2023

UPDATE: This comment has been included in issue #34


Strictly convex LogSumExp

Screenshot_20230720-040354_Chrome

Source: Wikipedia

Since exp(0) = e^0 = 1, we have

LSE(0, x1,...,xn) = log(exp(0) + exp(x1) + ... + exp(xn))
                  = log(1 + exp(x1) + ... + exp(xn)),

So, in Pascal we can have:

function LSEPlus0(const A: array of Extended): Extended;
begin
  var Sum: Extended := 1.0; // for additional Exp(0) term
  for var X in A do
    Sum := Sum + Exp(X):
  Result := Ln(Sum);
end;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 20, 2023

UPDATE: Decided not to implement this.


Quasi-arithmetic mean

Screenshot_20230720-042100_Chrome

Source: Wikipedia

Note that certain functions f may be restricted to certain intervals of the real numbers, so if would be wise to pass a closure to the Pascal function that ensures that the passed array has elements that fall within the interval, e.g.

function QAMean(
  const A: array of Extended; 
  F, FInv: TFunc<Extended,Extended>; 
  Constraint: TPredicate<Extended>
): Extended;

We could let Constraint be optional and default to nil in which case no checking would be done (I.e. any real number would be valid).

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 20, 2023

UPDATE: This comment has been extracted into its own issue: #35

Harmonic mean

Screenshot_20230720-042913_Chrome

Source: Wikipedia

@delphidabbler delphidabbler added accepted Issue will be actioned and removed considering Issue is currently under consideration labels Jul 21, 2023
@delphidabbler delphidabbler moved this from Considering to Accepted in Code Snippets Jul 21, 2023
@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 22, 2023

UPDATE: This comment has been extracted into its own issue: #41


Median

Two overloads suggested, one real & one integer:

function Median(const A: array of Extended): Extended; overload;
begin
  Assert(Length(A) > 0);
  // optimisations for array lengths 1 & 2 to avoid sorting
  if Length(A) = 1 then
    Exit(A[0]);
  if Length(A) = 2 then
    Exit((A[0] + A[1]) / 2.0);
  Generics.Collections.TArray.Sort<Extended>(A); // using standard comparer
  var MiddleLo := Length(A) div 2;
  if Odd(Length(A)) then
    Result := A[MiddleLo + 1]
  else
    Result := (A[MiddleLo] + A[MiddleLo + 1]) / 2.0;
end;

function Median(const A: array of Integer): Extended; overload;
begin
  Assert(Length(A) > 0);
  // optimisations for array lengths 1 & 2 to avoid sorting
  if Length(A) = 1 then
    Exit(A[0]);
  if Length(A) = 2 then
    Exit((A[0] + A[1]) / 2);
  Generics.Collections.TArray.Sort<Integer>(A); // using standard comparer
  var MiddleLo := Length(A) div 2;
  if Odd(Length(A)) then
    Result := A[MiddleLo + 1]
  else
    Result := (A[MiddleLo] + A[MiddleLo + 1]) / 2;
end;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 22, 2023

UPDATE: This comment has been extracted into its own issue: #42


Mode for countable sets: ModeN

Obeys rule that if all items in a sequence of values are unique then there is no mode. The following function returns an empty array in that case. If sequence in mono-modal an array of one element containing the mode is returned, while if sequence is multi-modal, an array containing all the modes is returned.

This function illustrates the case for a sequence of Integer values. Overloads for other ordinal types are obvious.

function ModeN(const A: array of Integer): TArray<Integer>;
begin
  if Length(A) = 0 then
    raise EArgumentException.Create('Array is empty');
  var MaxCount: Cardinal := 0;
  var Counts := TDictionary<Integer,Cardinal>.Create;
  try
    for var X in A do
    begin
      var C: Cardinal;
      if not Counts.TryGetValue(X, C) then
        C := 0;
      Inc(C);
      Counts.AddOrSetValue(X, C);
      if MaxCount < C then
        MaxCount := C;
    end;
    // equivalent test?: if MaxCount = 1 then
    if Length(A) = Counts.Count then 
    begin
      // All numbers are unique => no mode
      SetLength(Result, 0);
      Exit;
    end;
    var Results := TList<Integer>.Create;
    try
      for KV in Counts do
      begin
        if KV.Value = MaxCount then
          Results.Add(KV.Key);
      end;
      Result := Results.ToArray;
    finally
      Results.Free;
    end;
  finally
    Counts.Free;
  end;
end;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 22, 2023

UPDATE: Decided not to implement this

I think the text and code below the ruling is probably wrong

From what I've read recently, real numbers should be partitioned into equal size blocks, as in a histogram. Real numbers falling within the same partition should be considered equal for counting purposes. The following code considers any number within ε of another number to be counted as the same. But, if the numbers are partitioned, it is possible two numbers within ε of each other could fall into different partitions. This means that the following code is too simplistic.

Consequently, we need to use ε as the partition length and treat numbers that fall within ε of the nearest partition lower bound to be in the partition.

Epsilon would be better renamed as something like PartitionLength.

Partitions should be symmetrical about zero.


Mode for real numbers. Numbers are considered equal if they are within Epsilon of each other. If Epsilon is zero then Delphi chooses a suitable default.

function ModeR(const A: array of Extended; const Epsilon: Extended = 0.0): TArray<Extended>;
begin
  Assert(Length(A) > 0);
  var MaxCount: Cardinal := 0;
  var Counts := TDictionary<Extended,Cardinal>.Create(
    TDelegatedEqualityComparer.Create(
      function (const L, R: Extended): Boolean
      begin
        Result := Math.SameValue(L, R, Epsilon);
      end,
      function (const V: Extended): Integer
      begin
        Result := TDelegatedEqualityComparer.Default.GetHashCode(V);
      end
    );
  );
  try
    for var X in A do
    begin
      var C: Cardinal;
      if not Counts.TryGetValue(X, C) then
        C := 0;
      Inc(C);
      Counts.AddOrSetValue(X, C);
      if MaxCount < C then
        MaxCount := C;
    end;
    // equivalent test?: if MaxCount = 1 then
    if Length(A) = Counts.Count then 
    begin
      // All numbers are unique => no mode
      SetLength(Result, 0);
      Exit;
    end;
    var Results := TList<Extended>.Create;
    try
      for KV in Counts do
      begin
        if KV.Value = MaxCount then
          Results.Add(KV.Key);
      end;
      Result := Results.ToArray;
    finally
      Results.Free;
    end;
  finally
    Counts.Free;
  end;
end;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 22, 2023

UPDATE: Decided not to implement this


Mode with probability function p(x) that gives weighted values for, say, integer x.

We define probability function as a closure:

P: TFunc<Integer,Extended>;

@delphidabbler
Copy link
Owner Author

delphidabbler commented Jul 22, 2023

UPDATE: Decided not to implement this.


It may be useful to define a Mode function that returns the index/indices of one of the array elements that contain(s) the mode(s).

In the following we will use an array of Int64 and return the indices in an array ov Integerm just to make it easy to distinguish between the array type and the index type.

*** ⚠️ check function below TODO: reverse order of purpose of key/value in TPair used to store counter & index below - seems more logical if index, which is invariant, is labelled as Key & counter, which changes, is Value.

function ModeNByIndex(const A: array of Int64): TArray<Integer>;
begin
  Assert(Length(A) > 0);
  var MaxCount: Cardinal := 0;
  var Counts := TDictionary<Int64,TPair<{Counter}Cardinal,{Index}Integer>>.Create;
  try
    for var Idx: Integer := Low(A) to High(A) do
    begin
      var V: TPair<Cardinal,Integer>;
      if not Counts.TryGetValue(A[Idx], V) then
        V := TPair<Cardinal,Integer>.Create(0, Idx);
      Inc(V.Key);
      Counts.AddOrSetValue(A[Idx], V);
      if MaxCount < V.Key then
        MaxCount := V.Key;
    end;
    if Length(A) = Counts.Count then 
    begin
      // All numbers are unique => no mode
      SetLength(Result, 0);
      Exit;
    end;
    var Results := TList<Integer>.Create;
    try
      for KV in Counts do
        if KV.Value.Key = MaxCount then
          Results.Add(KV.Value.Value);
      Result := Results.ToArray;
    finally
      Results.Free;
    end;
  finally
    Counts.Free;
  end;
end;

@delphidabbler
Copy link
Owner Author

Need to split these suggested functions into several issues. There are too many to implement in one go.

@delphidabbler
Copy link
Owner Author

Need to split these suggested functions into several issues. There are too many to implement in one go.

All comments with coding suggestions have either been rejected or extracted into new issues.

This issue is now considered to be a duplicate and is being closed.

@delphidabbler delphidabbler added duplicate This issue or pull request already exists and removed enhancement New feature or request accepted Issue will be actioned labels Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists
Projects
None yet
Development

No branches or pull requests

1 participant