Skip to content

Improved Support for INDIRECT, ROW, and COLUMN Functions #1995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from
Closed

Improved Support for INDIRECT, ROW, and COLUMN Functions #1995

wants to merge 9 commits into from

Conversation

oleibman
Copy link
Collaborator

@oleibman oleibman commented Apr 8, 2021

This should address issues #1913 and #1993. INDIRECT had heretofore not supported an optional parameter intended to support addresses in R1C1 format which was introduced with Excel 2010. It also had not supported the use of defined names as an argument.

The unit tests for INDIRECT had used mocking, and were sorely lacking (tested only error conditions). They have been replaced with normal, and hopefully adequate, tests. This includes testing globally defined names, as well as locally defined names, both in and out of scope.

The test case in 1913 was too complicated for me to add as a unit test. The main impediments to it are now removed, and its complex situation, will I hope, be corrected with this fix.

This is:

- [x] a bugfix
- [ ] a new feature

Checklist:

Why this change is needed?

oleibman added 4 commits April 8, 2021 16:14
This should address issues #1913 and #1993. INDIRECT had heretofore not supported an optional parameter intended to support addresses in R1C1 format which was introduced with Excel 2010. It also had not supported the use of defined names as an argument.

The unit tests for INDIRECT had used mocking, and were sorely lacking (tested only error conditions). They have been replaced with normal, and hopefully adequate, tests. This includes testing globally defined names, as well as locally defined names, both in and out of scope.

The test case in 1913 was too complicated for me to add as a unit test. The main impediments to it are now removed, and its complex situation, will I hope, be corrected with this fix.
One doc block, one in deprecated function.
2 corrections.
They need it too, just like INDIRECT.
@oleibman oleibman changed the title Improved Support for INDIRECT Function Improved Support for INDIRECT, ROW, and COLUMN Functions Apr 14, 2021
@oleibman
Copy link
Collaborator Author

The ROW and COLUMN functions also should support defined names. I have added that, and test cases, with the latest push. ROWS and COLUMNS already supported it correctly, but there were no test cases. Because ROW and COLUMN can return arrays, and PhpSpreadsheet does not support dynamic arrays, I left the existing direct-call tests unchanged to demonstrate those capabilities.

Looks like requirements have tightened since my prior push.
Still getting used to it.
Third times a charm?
@oleibman
Copy link
Collaborator Author

@PowerKiKi, I am getting an error I do not understand from Phpstan.

Error: Ignored error pattern #^Cannot cast array\|string to string\.$# in path /home/runner/work/PhpSpreadsheet/PhpSpreadsheet/src/PhpSpreadsheet/Calculation/LookupRef/RowColumnInformation.php was not matched in reported errors.

If I'm reading it correctly, it says it's expecting to receive an error, and failing because it doesn't receive one. Is that true? Whether it is true or not, do you know how I would go about fixing it?

@MarkBaker
Copy link
Member

I have a small test script that I use to debug the calculation engine, dumping the token stack, and that enables the calculation engine log to show the various steps taken to evaluate a formula:

$spreadSheet = new Spreadsheet();
$workSheet = $spreadSheet->getActiveSheet();
$workSheet->setCellValue('A1', 'EUR');
$workSheet->setCellValue('A2', 'USD');
$workSheet->setCellValue('B1', 360);
$workSheet->setCellValue('B2', 300);

$spreadSheet->addNamedRange(new \PhpOffice\PhpSpreadsheet\NamedRange('EUR', $workSheet, '=$B$1'));
$spreadSheet->addNamedRange(new \PhpOffice\PhpSpreadsheet\NamedRange('USD', $workSheet, '=$B$2'));

$cell = $workSheet->getCell('E1');

$cell->setValue('=INDIRECT("USD")');

$spreadSheet->setActiveSheetIndex(0);

$formulaCells = [
    'E1',
];

foreach ($formulaCells as $cellAddress) {
    $result = evaluate($spreadSheet, $spreadSheet->getActiveSheet(), $cellAddress);
}

function evaluate(Spreadsheet $spreadSheet, Worksheet $workSheet, string $cell)
{
    // Initialise the calculation engine for debug logging
    $calculationEngine = Calculation::getInstance($spreadSheet);
    $debugLog = $calculationEngine->getDebugLog();

    $calculationEngine->flushInstance();
    $debugLog->setWriteDebugLog(true);

    $formulaValue = $workSheet->getCell($cell)->getValue();

    echo PHP_EOL, 'Formula value for evaluation is ', $formulaValue, PHP_EOL;

    $canExecuteCalculation = false;

    try {
        $tokens = $calculationEngine->parseFormula($formulaValue, $workSheet->getCell($cell));
        echo 'Parser Stack :-', PHP_EOL;
        print_r($tokens);
        $canExecuteCalculation = true;
    } catch (Exception $e) {
        echo 'PARSER ERROR: ', $e->getMessage(), PHP_EOL;

        echo 'Parser Stack :-';
        print_r($tokens);
    }

    $callStartTime = microtime(true);

    if ($canExecuteCalculation) {
        //  calculate
        try {
            $cellValue = $workSheet->getCell($cell)->getCalculatedValue();

            echo 'Result is ', $cellValue, PHP_EOL;

            echo 'Evaluation Log:', PHP_EOL;
            print_r($debugLog->getLog());
        } catch (Exception $e) {
            echo 'CALCULATION ENGINE ERROR: ', $e->getMessage(), PHP_EOL;

            echo 'Evaluation Log:', PHP_EOL;
            print_r($debugLog->getLog());
        }
    }

    $callEndTime = microtime(true);
    $callTime = $callEndTime - $callStartTime;

    echo PHP_EOL;
    echo 'Call time to evaluate formula was ', sprintf('%.4f', $callTime), ' seconds', PHP_EOL;

    return $cellValue;
}

// Echo memory usage
echo ' Current memory usage: ' , (memory_get_usage(true) / 1024) , ' KB' , PHP_EOL;
echo '    Peak memory usage: ' , (memory_get_peak_usage(true) / 1024) , ' KB' , PHP_EOL;

In this case, I've set it up to evaluate INDIRECT() with a defined name as per Issue #1993. But when I run it against your branch, it's output shows that it still isn't evaluating the formula correctly, but just treating the "USD" defined name as a string literal.

Formula value for evaluation is =INDIRECT("USD")
Parser Stack :-
Array
(
    [0] => Array
        (
            [type] => Value
            [value] => "USD"
            [reference] =>
        )

    [1] => Array
        (
            [type] => Operand Count for Function INDIRECT()
            [value] => 1
            [reference] =>
        )

    [2] => Array
        (
            [type] => Function
            [value] => INDIRECT(
            [reference] =>
        )

)
Result is #REF!
Evaluation Log:
Array
(
    [0] => Testing cache value for cell Worksheet!E1
    [1] => Evaluating formula for cell Worksheet!E1
    [2] => Formula for cell Worksheet!E1 is INDIRECT("USD")
    [3] => Worksheet!E1 => Evaluating Function INDIRECT() with 1 argument
    [4] => Worksheet!E1 => Evaluating INDIRECT( "USD" )
    [5] => Worksheet!E1 => Evaluation Result for INDIRECT() function call is a #REF! error
)

Call time to evaluate formula was 0.0170 seconds
 Current memory usage: 6144 KB
    Peak memory usage: 6144 KB

Running the same script against my early partial PR to resolve #1993 gives the following correct result:

Formula value for evaluation is =INDIRECT("USD")
Parser Stack :-
Array
(
    [0] => Array
        (
            [type] => Value
            [value] => "USD"
            [reference] =>
        )

    [1] => Array
        (
            [type] => Operand Count for Function INDIRECT()
            [value] => 1
            [reference] =>
        )

    [2] => Array
        (
            [type] => Function
            [value] => INDIRECT(
            [reference] =>
        )

)
Result is 300
Evaluation Log:
Array
(
    [0] => Testing cache value for cell Worksheet!E1
    [1] => Evaluating formula for cell Worksheet!E1
    [2] => Formula for cell Worksheet!E1 is INDIRECT("USD")
    [3] => Worksheet!E1 => Evaluating Function INDIRECT() with 1 argument
    [4] => Worksheet!E1 => Evaluating INDIRECT( "USD" )
    [5] => Worksheet!E1 => Defined Name is a Range with a value of =$B$2
    [6] => Worksheet!E1 => Value adjusted for relative references is =$B$2
    [7] => Worksheet!E1 => Worksheet!E1 => Evaluating Cell B2 in current worksheet
    [8] => Worksheet!E1 => Worksheet!E1 => Evaluation Result for cell Worksheet!B2 is an integer number with a value of 300
    [9] => Worksheet!E1 => Evaluation Result for Named Range USD is an integer number with a value of 300
    [10] => Worksheet!E1 => Evaluation Result for INDIRECT() function call is an integer number with a value of 300
)

Call time to evaluate formula was 0.0135 seconds
 Current memory usage: 6144 KB
    Peak memory usage: 6144 KB

So I'll need to take a closer look at how your INDIRECT() is handling the defined name

@MarkBaker
Copy link
Member

The PHPStan Error is the same as I was getting over the weekend: an error that PHPStan had previously been told to ignore now no longer exists because it has been fixed by your changes, so PHPStan chooses to tell you that the lack of an error is now an error.
You can resolve it by editing the phpstan-baseline.neon as per @PowerKiKi's instructions to me on gitter

@oleibman oleibman marked this pull request as draft April 14, 2021 14:20
@oleibman
Copy link
Collaborator Author

My fork was created before phpstan, so I don't think I can make the phpstan-baseline.neon change you recommend in this PR. So, I am converting this PR to a draft, and will re-create my fork with phpstan (which I was planning to do soon anyhow), and then create a new PR. It could take a couple of days. I will make sure to add your test case.

@oleibman
Copy link
Collaborator Author

@MarkBaker, I added the following test, and got the expected result:

    public function testINDIRECTEurUsd(): void
    {
        $sheet = $this->sheet;
        $sheet->getCell('A1')->setValue('EUR');
        $sheet->getCell('A2')->setValue('USD');
        $sheet->getCell('B1')->setValue(360);
        $sheet->getCell('B2')->setValue(300);

        $this->spreadsheet->addNamedRange(new NamedRange('EUR', $sheet, '$B$1'));
        $this->spreadsheet->addNamedRange(new NamedRange('USD', $sheet, '$B$2'));

        $this->setCell('E1', '=INDIRECT("USD")');

        $result = $sheet->getCell('E1')->getCalculatedValue();
        self::assertSame(300, $result);
    }

So, I'm not sure why your test script thinks the result is wrong.

There was an impression that code would fail the test in this commit.
@oleibman
Copy link
Collaborator Author

I agree that I duplicate your test script's results when my code is not part of the formal test suite. At the moment, I have no idea why.

@oleibman
Copy link
Collaborator Author

The defined name value property has a leading equals sign which needs to be deal with. New push shortly. I still don't understand the mismatching behavior between phpunit test and your test.

Failure pointed out by @MarkBaker. Prior code mysteriously succeeded as part of unit test, but failed as standalone code.
@MarkBaker
Copy link
Member

Yes, I'd just spotted the = difference myself;

@MarkBaker
Copy link
Member

Also hitting discrepancies if I load from a saved file containing that basic defined name and indirect in a formula, although this is an issue with the spreadsheet name; this time because the sheet name is being prepended on the named range value, which already contains the sheet name.

Indirect_with_Defined_Name.xlsx

@MarkBaker
Copy link
Member

Similarly with this slightly more quirky variant that uses a dropdown selector in cell D1 to decide which Currency should be selected
Indirect_with_Defined_Formula_Selection.xlsx

I feel like I'm being really evil here; but if I'm not evil now, then we'll have a user raising an issue about it in months time when the code is no longer fresh in our minds

@oleibman
Copy link
Collaborator Author

No problem about "evil". I will make sure your new test cases are covered when I re-create this PR.

@PowerKiKi
Copy link
Member

error I do not understand from Phpstan

Mark's answer was correct. For details, see #1990 (comment)

will re-create my fork with phpstan

It sounds like you might not be familiar with git rebase-ing a branch on top of master. If that's the case you might want to have a look, as it would help you keep you branch up-to-date very easily (unless massive changes conflict). See a tutorial, or the official docs.

@oleibman
Copy link
Collaborator Author

Thank you for the suggestion about rebase. I did not see it until too late, but I will attempt it the next time I want to resync. I have gotten Phpstan to accept the changes in my new push.

@oleibman
Copy link
Collaborator Author

Closing, replaced by #2004.

@oleibman oleibman closed this Apr 18, 2021
MarkBaker pushed a commit that referenced this pull request Apr 20, 2021
* Improved Support for INDIRECT, ROW, and COLUMN Functions

This should address issues #1913 and #1993. INDIRECT had heretofore not supported an optional parameter intended to support addresses in R1C1 format which was introduced with Excel 2010. It also had not supported the use of defined names as an argument. This PR is a replacement for #1995, which is currently in draft status and which I will close in a day or two.

The ROW and COLUMN functions also should support defined names. I have added that, and test cases, with the latest push. ROWS and COLUMNS already supported it correctly, but there had been no test cases. Because ROW and COLUMN can return arrays, and PhpSpreadsheet does not support dynamic arrays, I left the existing direct-call tests unchanged to demonstrate those capabilities.

The unit tests for INDIRECT had used mocking, and were sorely lacking (tested only error conditions). They have been replaced with normal, and hopefully adequate, tests. This includes testing globally defined names, as well as locally defined names, both in and out of scope.

The test case in 1913 was too complicated for me to add as a unit test. The main impediments to it are now removed, and its complex situation will, I hope, be corrected with this fix.

INDIRECT can also support a reference of the form Sheetname!localName when localName on its own would be out of scope. That functionality is added. It is also added, in theory, for ROW and COLUMN, however such a construction is rejected by the Calculation engine before passing control to ROW or COLUMN. It might be possible to change the engine to allow this, and I may want to look into that later, but it seems much too risky, and not nearly useful enough, to attempt to address that as part of this change.

Several unusual test cases (leading equals sign, not-quite-as-expected name definition in file, complex indirection involving concatenation and a dropdown list) were suggested by @MarkBaker and are included in this request.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants