Skip to content

exif_read_data() cannot read smaller stream wrapper chunk sizes #10834

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dotpointer opened this issue Mar 12, 2023 · 2 comments · Fixed by ThePHPF/thephp.foundation#90
Closed

Comments

@dotpointer
Copy link

dotpointer commented Mar 12, 2023

Description

exif_read_data() outputs the warning below when it reads from a stream wrapper source and the stream chunk size that is set by stream_set_chunk_size() is too small. What is considered a too small chunk size seems to depend on the file size and maybe the EXIF data size inside it. On a 271 KB JPEG it was 113, but on a 1,1M JPEG was 7397. Sometimes even the default chunk size is too small and it needs to be adjusted to work.

The example stream wrapper class is a modified version of the VariableStream example from the manual:
https://www.php.net/manual/en/stream.streamwrapper.example-1.php

I have tested this on Debian Linux 11 with PHP 7.4.33 and on the same system in a Docker environment with PHP 8.1.16, the bug is in both versions.

I found JPEG files with EXIF data on the Lorem Picsum page, but I have tried other sources too with the same error:
https://picsum.photos/1920/1080

If there is something wrong with the example code please tell me how to correct it so I can learn something please. :blush I have tried different variants of this code like writing to the stream before reading it using cURL calls which makes no difference and also passing a fopen() file handle that reads image file directly which makes the bug disappear.

The PHP code:

<?php

error_reporting(E_ALL);
ini_set('display_errors', 1);

# set the path to a (JPEG) image file with EXIF data
# a big file like +1 MB is recommended
define('IMAGE_FILE', './example.jpg');

class VariableStream {
  var $data;
  var $position;

  function stream_close() {
    # echo "Stream close\n";
    return true;
  }

  function stream_eof() {
    # echo 'Stream eof - '.($this->position >= strlen($this->data ? 'Yes' : 'No'))."\n";
    return $this->position >= strlen($this->data);
  }

  function stream_open($path, $mode, $options, &$opened_path) {
    # echo "Stream opened\n";
    $this->position = 0;
    $this->data = file_get_contents(IMAGE_FILE);
    return true;
  }

  function stream_seek($offset, $whence) {
    # echo "Stream seek $offset $whence\n";
    switch ($whence) {
      case SEEK_SET:
        if ($offset < strlen($this->data) && $offset >= 0) {
              $this->position = $offset;
              return true;
        } else {
              return false;
        }
        break;
      case SEEK_CUR:
        if ($offset >= 0) {
              $this->position += $offset;
              return true;
        } else {
              return false;
        }
        break;
      case SEEK_END:
        if (strlen($this->data) + $offset >= 0) {
              $this->position = strlen($this->data) + $offset;
              return true;
        } else {
              return false;
        }
        break;
      default:
        return false;
    }
  }

  function stream_read($count) {
    # echo "Stream read\n";
    $ret = substr($this->data, $this->position, $count);
    $this->position += strlen($ret);
    return $ret;
  }

  function stream_tell() {
    # echo 'Stream tell: '.$this->position."\n";
    return $this->position;
  }
}

stream_wrapper_register('var', 'VariableStream')
  or die('Failed to register protocol');

$fp = fopen('var://myvar', 'rb');

# try different chunk sizes here to find the threshold
# where the warnings appear and disappear
stream_set_chunk_size($fp, 8000);

# error appear after this line
$headers = exif_read_data($fp);

var_dump($headers);

fclose($fp);

?>

Resulted in this output:

Warning: exif_read_data(): Error reading from file: got=x004C(=76) != itemlen-2=x2838(=10296) in example.php on line 86

Warning: exif_read_data(): Invalid JPEG file in example.php on line 86
bool(false)

But I expected this output instead:

No warnings but the EXIF header data outputted as an array by var_dump.

PHP Version

PHP 8.1.16

Operating System

Debian 11

@nielsdos
Copy link
Member

nielsdos commented Mar 12, 2023

This code is the problem:

/* just break anyway, to avoid greedy read for file://, php://memory, and php://temp */
if ((stream->wrapper != &php_plain_files_wrapper) &&
(stream->ops != &php_stream_memory_ops) &&
(stream->ops != &php_stream_temp_ops)) {
break;
}

Since you have your own defined stream, the if triggers and breaks early, which results in not reading all the necessary data. I think we might need to check for user-defined stream here too. If that user-defined stream would do reads from a stream as well, then this condition will still work as expected. In your case it would mean that the condition is not hit because you're using the file stream type indirectly because of the file_get_contents.

@nielsdos
Copy link
Member

nielsdos commented Mar 12, 2023

Changing the if to include userspace streams results in a failure of the test ext/standard/tests/streams/stream_set_chunk_size.phpt.
This was changed in 0a45e8f. From the commit description it sounds like the problem was with php://fd/* streams. However, as a side effect of the commit the use-case of OP was affected. Adding an exception for local userspace streams would revert the changes made to that test file. I don't know for sure if that is what we want, but I couldn't find anything in the documentation stating that the userspace streams are non-greedy, so I think it can be changed.

By adding such an exception we essentially move the greedy/non-greedy problem to whatever underlying code the userspace stream is using. If the userspace stream has all the data available already (like in OP's case), I don't think there's a need to be non-greedy. If the userspace stream does not yet have all data available, then itself probably uses a stream and then the decision for greedy/non-greedy will fall on that stream. So I think it would be okay to change this.

nielsdos added a commit to nielsdos/php-src that referenced this issue Mar 12, 2023
…chunk sizes

Exif uses php_stream_read() to read data from the stream and process it.
In this case, it's a userspace stream. Userspace streams can either by
local or url streams.

In 5060fc2 and subsequently 0a45e8f, the behaviour of the
check was changed twice in total. In the latter commit the description
talks about exceptions for streams that are local to the process. It
looks like the exception for local userspace streams was forgotten. This
patch updates the check such that local userspace streams also read
greedily, while keeping url userspace streams unchanged.

This also updates the existing test to test both local and url userspace
streams.
nielsdos added a commit to nielsdos/php-src that referenced this issue Mar 13, 2023
…chunk sizes

Exif uses php_stream_read() to read data from the stream and process it.
In this case, it's a userspace stream. Userspace streams can either by
local or url streams.

In 5060fc2 and subsequently 0a45e8f, the behaviour of the
check was changed twice in total. In the latter commit the description
talks about exceptions for streams that are local to the process. It
looks like the exception for local userspace streams was forgotten. This
patch updates the check such that local userspace streams also read
greedily, while keeping url userspace streams unchanged.

This also updates the existing test to test both local and url userspace
streams.
nielsdos added a commit to nielsdos/php-src that referenced this issue Mar 24, 2023
…chunk sizes

php_stream_read() may return less than the requested amount of bytes by
design. This patch introduces a static function for exif which reads
from the stream in a loop until all the requested bytes are read.

Test was
Co-authored-by: dotpointer
nielsdos added a commit to nielsdos/php-src that referenced this issue Mar 24, 2023
…chunk sizes

php_stream_read() may return less than the requested amount of bytes by
design. This patch introduces a static function for exif which reads
from the stream in a loop until all the requested bytes are read.

Test was
Co-authored-by: dotpointer
nielsdos added a commit that referenced this issue May 12, 2023
* PHP-8.1:
  Fix GH-10834: exif_read_data() cannot read smaller stream wrapper chunk sizes
nielsdos added a commit that referenced this issue May 12, 2023
* PHP-8.2:
  Fix GH-10834: exif_read_data() cannot read smaller stream wrapper chunk sizes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment