
Commit 35eb14e

Fix langinfo(ALT_DIGITS)
This has never worked properly before in Perl. The code returns the result of the libc function nl_langinfo(). The documentation for it that I (and presumably my predecessors) have found is very unclear. But what actually happens (as observed using gdb) is that the return is very C unfriendly. Instead of returning a NUL-terminated string, it returns 100 (perhaps fewer) NUL-terminated strings in a row. When there are fewer (given the few examples I've seen), the final one ends with two NULs in a row. (I can't think of a way for it to work and be otherwise.) The 100th one doesn't necessarily have two terminating NULs.

Prior to this commit, only the string for the zeroth digit was returned; now the entire ALT_DIGITS string sequence is returned, forcing a double NUL at the end of the final one.

This information is accessible in several ways. Via XS, one can use any of several functions, including the newly introduced sv_langinfo(), returning an SV, which allows for easier handling of embedded NULs. (Otherwise in XS, using the functions that return a char*, one has to look for the double-NUL.) From Perl-space, the access is via I18N::Langinfo, which behind the scenes also uses an SV.

The documentation added in this commit gives advice for how to turn the return into an @array for more convenient access.
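To make the layout described above concrete, here is a minimal sketch of walking a run of NUL-terminated strings that ends with a double NUL. It is written in Perl for illustration only and is not the code from this commit: the buffer is simulated rather than obtained from nl_langinfo(), and the helper name split_nul_sequence and the sample entries are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a buffer holding consecutive NUL-terminated
# strings into a list. It stops at the first empty string (which is what a
# double NUL looks like after splitting) or after 100 entries, whichever
# comes first.
sub split_nul_sequence {
    my ($buffer) = @_;
    my @items;
    for my $item (split /\0/, $buffer, -1) {
        last if $item eq '' || @items == 100;
        push @items, $item;
    }
    return @items;
}

# Simulated buffer (assumption): three entries followed by the double NUL.
my $buffer = "zero\0one\0two\0\0";
my @digits = split_nul_sequence($buffer);

# Join with ';' to get the single-string form that is convenient to split.
print scalar(@digits), " entries: ", join(';', @digits), "\n";
```

The cap of 100 entries matters because, per the message above, a full list is not guaranteed to end in two NULs.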
1 parent 9e0fc2d commit 35eb14e

4 files changed, +321 -39 lines changed


ext/I18N-Langinfo/Langinfo.pm

+60 -9
@@ -77,6 +77,8 @@ XSLoader::load();
 1;
 __END__
 
+=encoding utf8
+
 =head1 NAME
 
 I18N::Langinfo - query locale information
@@ -186,13 +188,52 @@ C<9$95>.
 
 =item *
 
-For an alternate representation of digits, for the
-radix character used between the integer and the fractional part
-of decimal numbers, the group separator string for large-ish floating point
-numbers (yes, the final two are redundant with
+For the radix character used between the integer and the fractional part of
+decimal numbers, and the group separator string for large-ish floating point
+numbers (yes, these are redundant with
 L<POSIX::localeconv()|POSIX/localeconv>):
 
-    ALT_DIGITS RADIXCHAR THOUSEP
+    RADIXCHAR THOUSEP
+
+=item *
+
+For any alternate digits used in this locale besides the standard C<0..9>:
+
+    ALT_DIGITS
+
+This returns a sequence of alternate numeric representations for the numbers
+C<0> ... up to C<99>. The representations are returned in a single string,
+with a semi-colon C<;> used to separate the individual ones.
+
+Most locales don't have alternate digits, so the string will be empty.
+
+To access this data conveniently, you could do something like
+
+    use I18N::Langinfo qw(langinfo ALT_DIGITS);
+    my @alt_digits = split ';', langinfo(ALT_DIGITS);
+
+The array C<@alt_digits> will contain 0 elements if the current locale doesn't
+have alternate digits specified for it. Otherwise, it will have as many
+elements as the locale defines, with C<[0]> containing the alternate digit for
+zero; C<[1]> for one; and so forth, up to potentially C<[99]> for the
+alternate representation of ninety-nine.
+
+Be aware that the alternate representation in some locales for the numbers
+0..9 will have a leading alternate-zero, so would look like the equivalent of
+00..09.
+
+Running this program
+
+    use I18N::Langinfo qw(langinfo ALT_DIGITS);
+    my @alt_digits = split ';', langinfo(ALT_DIGITS);
+    splice @alt_digits, 15;
+    print join " ", @alt_digits, "\n";
+
+on a Japanese locale yields
+
+S<C<〇 一 二 三 四 五 六 七 八 九 十 十一 十二 十三 十四>>
+
+on some platforms.
 
 =item *
 
@@ -245,6 +286,16 @@ Only the values for English are returned. C<YESSTR> and C<NOSTR> have been
 removed from POSIX 2008, and are retained here for backwards compatibility.
 Your platform's C<nl_langinfo> may not support them.
 
+=item C<ALT_DIGITS>
+
+On systems with a C<L<strftime(3)>> that recognizes the POSIX-defined C<%O>
+format modifier (not Windows), perl tries hard to return these. The result
+likely will go as high as what C<nl_langinfo()> would return, but not
+necessarily; and the numbers from C<0..9> will always be stripped of leading
+zeros.
+
+Without C<%O>, an empty string is always returned.
+
 =item C<D_FMT>
 
 Always evaluates to C<%x>, the locale's appropriate date representation.
@@ -258,11 +309,11 @@ Always evaluates to C<%X>, the locale's appropriate time representation.
 Always evaluates to C<%c>, the locale's appropriate date and time
 representation.
 
-=item C<ALT_DIGITS>
+=item C<CRNCYSTR>
 
-Currently this gives the same results as Linux does. If you have examples of
-it needing to work differently, please file a report at
-L<https://github.com/Perl/perl5/issues>.
+The return may be incorrect for those rare locales where the currency symbol
+replaces the radix character. If you have examples of it needing to work
+differently, please file a report at L<https://github.com/Perl/perl5/issues>.
 
 =item C<ERA_D_FMT>
 
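The ALT_DIGITS documentation added to Langinfo.pm suggests splitting the return into an array. As a minimal sketch of how such an array might then be used, the following looks up a number and falls back to the ordinary digits when the locale defines no entry for it. The helper name alt_digit_for is hypothetical, and the sample string is copied from the Japanese example output in the documentation rather than read from a live locale, so that the sketch runs anywhere.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: return the alternate representation of $n, or $n
# itself when the list has no entry for it (including an empty list).
sub alt_digit_for {
    my ($n, @alt_digits) = @_;
    return $n if $n < 0 || $n > $#alt_digits;
    return $alt_digits[$n];
}

# Sample data (assumption). A real program would instead use:
#   use I18N::Langinfo qw(langinfo ALT_DIGITS);
#   my @alt_digits = split ';', langinfo(ALT_DIGITS);
my $sample = '〇;一;二;三;四;五;六;七;八;九;十;十一;十二;十三;十四';
my @alt_digits = split ';', $sample;

print alt_digit_for(12, @alt_digits), "\n";   # entry 12 of the sample list
print alt_digit_for(42, @alt_digits), "\n";   # no entry, falls back to 42
print alt_digit_for(7, ()), "\n";             # empty list, falls back to 7
```

Falling back to the plain number keeps callers working in the common case where the locale has no alternate digits and the array is empty.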

lib/locale.t

+45
@@ -2495,6 +2495,51 @@ foreach my $Locale (@Locale) {
             print "# failed $locales_test_number locale '$Locale' numbers @f\n"
         }
     }
+
+    {
+        my @f = ();
+        ++$locales_test_number;
+        $test_names{$locales_test_number} =
+                    'Verify ALT_DIGITS returns nothing, or else non-ASCII and'
+                  . ' the single char digits evaluate to consecutive integers'
+                  . ' starting at 0';
+
+        my $alts = langinfo(ALT_DIGITS);
+        if ($alts) {
+            my @alts = split ';', $alts;
+            my $prev = -1;
+            foreach my $num (@alts) {
+                if ($num =~ /[[:ascii:]]/) {
+                    push @f, disp_str($num);
+                    last;
+                }
+
+                # We only look at single character strings; likely locales
+                # that have alternate digits have a different mechanism for
+                # representing larger numbers. Japanese for example, has a
+                # single character for the number 10, which is prefixed to the
+                # '1' symbol for '11', etc. And 21 is represented by 3
+                # characters, the '2' symbol, followed by the '10' symbol,
+                # then the '1' symbol. (There is nothing to say that a locale
+                # even has to use base 10.)
+                last if length $num > 1;
+
+                use Unicode::UCD 'num';
+                my $value = num($num);
+                if ($value != $prev + 1) {
+                    push @f, disp_str($num);
+                    last;
+                }
+
+                $prev = $value;
+            }
+        }
+
+        report_result($Locale, $locales_test_number, @f == 0);
+        if (@f) {
+            print "# failed $locales_test_number locale '$Locale' numbers @f\n"
+        }
+    }
 }
 
 my $final_locales_test_number = $locales_test_number;
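The check added to lib/locale.t can be tried outside the test harness. The following is a minimal standalone sketch of the same idea, not the test itself: the helper name first_bad_alt_digit and the sample lists are hypothetical, and unlike the test it also treats an entry with no numeric value as a failure.

```perl
use strict;
use warnings;
use utf8;
use Unicode::UCD 'num';
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper mirroring the check in the test: returns the first
# offending entry, or undef if the list passes.
sub first_bad_alt_digit {
    my (@alts) = @_;
    my $prev = -1;
    for my $entry (@alts) {
        return $entry if $entry =~ /[[:ascii:]]/;   # entries must be non-ASCII
        last if length $entry > 1;                  # only single characters are checked
        my $value = num($entry);
        # Each single-character entry must be the next consecutive integer.
        return $entry unless defined $value && $value == $prev + 1;
        $prev = $value;
    }
    return;
}

# Sample lists (assumptions), not read from a live locale.
my @good = split ';', '〇;一;二;三;十一';
my @bad  = ('〇', '二');

for my $list (\@good, \@bad) {
    my $bad = first_bad_alt_digit(@$list);
    print defined $bad ? "failed at '$bad'\n" : "passed\n";
}
```

Stopping at the first multi-character entry follows the reasoning in the test's comment: larger numbers may be composed by a locale-specific scheme, so only the single-character entries are compared against consecutive integers.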
