[FLINK-30571] Estimate scalability coefficient from past scaling history using linear regression #966

pchoudhury22 · 2025-04-02T12:17:52Z

What is the purpose of the change

Currently, target parallelism computation assumes perfect linear scaling. However, real-time workloads often exhibit nonlinear scalability due to factors like network overhead and coordination costs.

This change introduces an observed scalability coefficient, estimated using weighted linear regression on past (parallelism, processing rate) data, to improve the accuracy of scaling decisions.

Brief change log

Implemented a dynamic scaling coefficient to compute target parallelism based on observed scalability. The system estimates the scalability coefficient using a weighted least squares linear regression approach, leveraging historical (parallelism, processing rate) data.
The regression model minimizes the weighted sum of squared errors, where weights are assigned based on both parallelism and recency to prioritize recent observations. The baseline processing rate is computed using the smallest observed parallelism in the history. Model details:

The Linear Model

We define a linear relationship between parallelism (P) and processing rate (R):

$$R_i = β * P_i * α$$

where:

R_i = actual processing rate for the i-th data point
P_i = parallelism for the i-th data point
β = base factor (constant scale factor)
α = scaling coefficient to optimize
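
As a rough illustration of the model, the sketch below predicts a processing rate from a parallelism value. The function names are hypothetical, and deriving β as the per-subtask rate at the smallest observed parallelism is an assumption based on the change log's description of the baseline. It is not code from this PR.

```python
from typing import List, Tuple


def baseline_factor(history: List[Tuple[int, float]]) -> float:
    """Hypothetical β: rate per subtask at the smallest observed parallelism."""
    if not history:
        raise ValueError("history must not be empty")
    # Pick the (parallelism, rate) point with the smallest parallelism.
    parallelism, rate = min(history, key=lambda point: point[0])
    return rate / parallelism


def predicted_rate(parallelism: int, beta: float, alpha: float) -> float:
    """Linear model from above: predicted rate = β * α * P."""
    return beta * alpha * parallelism


# Illustrative (made-up) history of (parallelism, processing rate) points.
history = [(2, 200.0), (4, 360.0), (8, 640.0)]
beta = baseline_factor(history)  # 200.0 / 2 = 100.0
print(predicted_rate(8, beta, alpha=0.8))
```

With α = 1 the model reduces to perfect linear scaling. Values below 1 describe a job that gains less than β per added subtask.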

Weighted Squared Error

The loss function to minimize is the weighted sum of squared errors (SSE):

$$Loss = Σ w_i * (R_i - R̂_i)^2$$

Substituting R̂_i = β α P_i:

$$Loss = Σ w_i * (R_i - β α P_i)^2$$

where w_i is the weight for each data point.
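
The loss can be evaluated directly for a candidate α, which is handy for sanity-checking the regression. This is a minimal sketch: `weighted_sse` is a hypothetical helper, and the weights in the example are made up, since the exact weighting scheme (parallelism and recency) is not spelled out here.

```python
from typing import Sequence


def weighted_sse(
    parallelisms: Sequence[float],
    rates: Sequence[float],
    weights: Sequence[float],
    beta: float,
    alpha: float,
) -> float:
    """Weighted sum of squared errors: Σ w_i * (R_i - β α P_i)^2."""
    if not (len(parallelisms) == len(rates) == len(weights)):
        raise ValueError("all inputs must have the same length")
    return sum(
        w * (r - beta * alpha * p) ** 2
        for p, r, w in zip(parallelisms, rates, weights)
    )


# Illustrative data; later points get larger (assumed) weights.
parallelisms = [2, 4, 8]
rates = [200.0, 360.0, 640.0]
weights = [0.5, 0.75, 1.0]
print(weighted_sse(parallelisms, rates, weights, beta=100.0, alpha=0.8))
```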

Minimizing the Error

Expanding (R_i - β α P_i)^2:

$$(R_i - β α P_i)^2 = R_i^2 - 2β α P_i R_i + (β α P_i)^2$$

Multiplying by w_i and summing over all data points:

$$Loss = Σ w_i * (R_i^2 - 2β α P_i R_i + β^2 α^2 P_i^2)$$

Solving for α

To minimize the loss with respect to α, set its derivative to zero:

$$dLoss/dα = -2β Σ w_i P_i R_i + 2β^2 α Σ w_i P_i^2 = 0$$

Solving for α gives:

$$α = (Σ w_i P_i R_i) / (β * Σ w_i P_i^2)$$
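
The closed-form solution translates into a few lines of code. The sketch below assumes β is already known and that the weights are supplied by the caller. `estimate_alpha` is a hypothetical name, not the method added in this PR.

```python
from typing import Sequence


def estimate_alpha(
    parallelisms: Sequence[float],
    rates: Sequence[float],
    weights: Sequence[float],
    beta: float,
) -> float:
    """Closed-form weighted least squares: α = Σ w P R / (β * Σ w P^2)."""
    if not (len(parallelisms) == len(rates) == len(weights)):
        raise ValueError("all inputs must have the same length")
    numerator = sum(w * p * r for p, r, w in zip(parallelisms, rates, weights))
    denominator = beta * sum(w * p * p for p, w in zip(parallelisms, weights))
    if denominator == 0:
        raise ValueError("cannot estimate alpha: zero denominator")
    return numerator / denominator


# Illustrative data with sub-linear scaling and equal weights.
alpha = estimate_alpha([2, 4, 8], [200.0, 360.0, 640.0], [1.0, 1.0, 1.0], beta=100.0)
print(alpha)
```

For perfectly linear data, where every R_i equals β * P_i, the estimate is exactly 1. Sub-linear histories such as the one above yield a value below 1.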

Verifying this change

New unit tests were added to cover this change.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., are there any changes to the CustomResourceDescriptors: no
Core observer or reconciler logic that is regularly executed: no

pchoudhury22 · 2025-04-02T14:29:09Z

Hi @gyfora, please help review the PR. Thanks!

[FLINK-30571] Estimate scalability coefficient from past scaling hist…

6f01454

…ory using linear regression
