Commit 6ec3e3e

fix: correct WhitespaceSplit Pretokenizer handling of invisible space chars
1 parent e8a8a9a commit 6ec3e3e

1 file changed, 3 insertions(+), 1 deletion(-)

src/PreTokenizers/WhitespaceSplit.php

@@ -14,6 +14,8 @@ public function __construct(protected array $config)
 
     public function preTokenizeText(string|array $text, array $options): array
     {
-        return explode(' ', $text);
+        preg_match_all('/\S+/', $text, $matches);
+
+        return $matches[0] ?? [];
     }
 }
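
The behavioral difference between the removed and added lines can be sketched with a small standalone script. The function names `splitWithExplode` and `splitWithRegex` are hypothetical helpers written for this illustration and are not part of the library; the sample string is likewise made up. Only ASCII whitespace (spaces, tabs, newlines) is exercised here. How non-ASCII space characters are treated depends on the pattern's modifiers and is not covered by this sketch.

```php
<?php
// Hypothetical helper mirroring the removed line: split on a single space only.
function splitWithExplode(string $text): array
{
    return explode(' ', $text);
}

// Hypothetical helper mirroring the added lines: collect runs of non-whitespace.
function splitWithRegex(string $text): array
{
    preg_match_all('/\S+/', $text, $matches);

    return $matches[0] ?? [];
}

// Made-up input: a double space, a tab, a newline, and a trailing space.
$sample = "hello  world\tfoo\nbar ";

// explode() yields empty strings for the doubled and trailing spaces,
// and leaves the tab and newline inside a single token.
var_export(splitWithExplode($sample));
echo PHP_EOL;

// The regex version returns only the four words.
var_export(splitWithRegex($sample));
echo PHP_EOL;
```

With this input, the `explode` variant returns `['hello', '', "world\tfoo\nbar", '']`, while the regex variant returns `['hello', 'world', 'foo', 'bar']`. For an empty string, `explode` returns `['']` whereas the regex variant returns `[]`.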
