
Commit ba0b9e5

Fix langinfo(ALT_DIGITS)
This has never worked properly before in Perl. The code is returning the result of the libc function nl_langinfo(). The documentation for it that I have found (and presumably that my predecessors found) is very unclear. But what actually happens (from using gdb) is that the return is very C unfriendly.

Instead of returning a NUL-terminated string, it returns 100 (perhaps fewer) NUL-terminated strings in a row. When it is fewer (given the few examples I've seen), the final one ends with two NULs in a row. (I can't think of a way for it to work and be otherwise.) The 100th one doesn't necessarily have two terminating NULs.

Prior to this commit, only the string for the zeroth digit was returned; now the entire ALT_DIGITS string sequence is returned, forcing a double NUL at the end of the final one.

This information is accessible in several ways. Via XS, one can use any of several functions, including the newly introduced sv_langinfo(), which returns an SV and so allows for easier handling of embedded NULs. (Otherwise in XS, using the functions that return a char*, one has to look for the double NUL.)

From Perl-space, the access is via I18N::Langinfo, which behind the scenes also uses an SV. The documentation added in this commit gives advice for how to turn the return into an @array for more convenient access.
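The buffer layout described above can be illustrated with a small Perl sketch. It is not code from this commit: the sample buffer `$raw` is built by hand from ASCII placeholder words rather than real alternate digits, and `split_alt_digits` is a hypothetical helper name. It walks the buffer one NUL-terminated string at a time and stops at the first empty string (the double NUL) or after 100 entries.

```perl
use strict;
use warnings;

# Hypothetical sample buffer laid out the way the commit message describes:
# consecutive NUL-terminated strings, with a double NUL after the last one.
my $raw = join("", map { "$_\0" } qw(zero one two three)) . "\0";

# Hypothetical helper: split such a buffer into its individual strings.
sub split_alt_digits {
    my ($buf) = @_;
    my @digits;
    my $pos = 0;
    while (@digits < 100 && $pos < length $buf) {
        my $end = index($buf, "\0", $pos);
        $end = length $buf if $end < 0;   # tolerate a missing final NUL
        last if $end == $pos;             # empty string: end of the list
        push @digits, substr($buf, $pos, $end - $pos);
        $pos = $end + 1;
    }
    return @digits;
}

my @alt_digits = split_alt_digits($raw);
print scalar(@alt_digits), " entries: @alt_digits\n";   # 4 entries: zero one two three
```

The cap of 100 matters because, per the message above, a full list of 100 entries is not guaranteed to end in a double NUL.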
1 parent 7a42e09 commit ba0b9e5

4 files changed, +325 -42 lines changed
ext/I18N-Langinfo/Langinfo.pm

+56 -12
@@ -77,6 +77,8 @@ XSLoader::load();
 1;
 __END__
 
+=encoding utf8
+
 =head1 NAME
 
 I18N::Langinfo - query locale information
@@ -155,13 +157,52 @@ For the character code set being used (such as "ISO8859-1", "cp850",
 
 =item *
 
-For an alternate representation of digits, for the
-radix character used between the integer and the fractional part
-of decimal numbers, the group separator string for large-ish floating point
-numbers (yes, the final two are redundant with
+For the radix character used between the integer and the fractional part of
+decimal numbers, and the group separator string for large-ish floating point
+numbers (yes, these are redundant with
 L<POSIX::localeconv()|POSIX/localeconv>):
 
-    ALT_DIGITS RADIXCHAR THOUSEP
+    RADIXCHAR THOUSEP
+
+=item *
+
+For any alternate digits used in this locale besides the standard C<0..9>:
+
+    ALT_DIGITS
+
+This returns a sequence of alternate numeric representations for the numbers
+C<0> ... up to C<99>.  The representations are returned in a single string,
+with a semi-colon C<;> used to separate the individual ones.
+
+Most locales don't have alternate digits, so the string will be empty.
+
+To access this data conveniently, you could do something like
+
+    use I18N::Langinfo qw(langinfo ALT_DIGITS);
+    my @alt_digits = split ';', langinfo(ALT_DIGITS);
+
+The array C<@alt_digits> will contain 0 elements if the current locale doesn't
+have alternate digits specified for it.  Otherwise, it will have as many
+elements as the locale defines, with C<[0]> containing the alternate digit for
+zero; C<[1]> for one; and so forth, up to potentially C<[99]> for the
+alternate representation of ninety-nine.
+
+Be aware that the alternate representation in some locales for the numbers
+0..9 will have a leading alternate-zero, so would look like the equivalent of
+00..09.
+
+Running this program
+
+    use I18N::Langinfo qw(langinfo ALT_DIGITS);
+    my @alt_digits = split ';', langinfo(ALT_DIGITS);
+    splice @alt_digits, 15;
+    print join " ", @alt_digits, "\n";
+
+on a Japanese locale yields
+
+S<C<〇 一 二 三 四 五 六 七 八 九 十 十一 十二 十三 十四>>
+
+on some platforms.
 
 =item *
 
@@ -214,6 +255,16 @@ Only the values for English are returned. C<YESSTR> and C<NOSTR> have been
 removed from POSIX 2008, and are retained here for backwards compatibility.
 Your platform's C<nl_langinfo> may not support them.
 
+=item C<ALT_DIGITS>
+
+On systems with a C<L<strftime(3)>> that recognizes the POSIX-defined C<%O>
+format modifier (not Windows), perl tries hard to return these.  The result
+likely will go as high as what C<nl_langinfo()> would return, but not
+necessarily; and the numbers from C<0..9> will always be stripped of leading
+zeros.
+
+Without C<%O>, an empty string is always returned.
+
 =item C<D_FMT>
 
 Always evaluates to C<%x>, the locale's appropriate date representation.
@@ -232,13 +283,6 @@ representation.
 The return may be incorrect for those rare locales where the currency symbol
 replaces the radix character.  If you have examples of it needing to work
 differently, please file a report at L<https://github.com/Perl/perl5/issues>.
-
-=item C<ALT_DIGITS>
-
-Currently this gives the same results as Linux does.  If you have examples of
-it needing to work differently, please file a report at
-L<https://github.com/Perl/perl5/issues>.
-
 =item C<ERA_D_FMT>
 
 =item C<ERA_T_FMT>
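
As a sketch of how the array form described in the new documentation might be used, the following maps a number to its alternate representation and falls back to the ordinary digits when the locale supplies none. The helper name `to_alt_digits` is hypothetical. To run the same in any locale, the sample string is hard-coded from the Japanese output quoted in the documentation above instead of coming from `langinfo(ALT_DIGITS)`.

```perl
use strict;
use warnings;
use utf8;                                # the sample string below is UTF-8 source
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: look up $number in a semicolon-separated ALT_DIGITS
# string.  Falls back to $number itself when the input is not a plain
# non-negative integer or the locale defines no entry for it.
sub to_alt_digits {
    my ($number, $alts) = @_;
    my @alt_digits = split /;/, $alts;
    return $number
        if $number !~ /\A[0-9]+\z/ || $number > $#alt_digits;
    return $alt_digits[$number];
}

# Sample data: the first 15 entries shown in the documentation for a
# Japanese locale.  A real program would call langinfo(ALT_DIGITS) instead.
my $sample = '〇;一;二;三;四;五;六;七;八;九;十;十一;十二;十三;十四';

print to_alt_digits(12, $sample), "\n";  # 十二
print to_alt_digits(42, $sample), "\n";  # 42 (no entry in the sample)
print to_alt_digits(7,  ''),      "\n";  # 7  (locale without alternate digits)
```

Indexing by the number itself works because entry `[0]` is the representation of zero, `[1]` of one, and so on.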

lib/locale.t

+45
@@ -2495,6 +2495,51 @@ foreach my $Locale (@Locale) {
             print "# failed $locales_test_number locale '$Locale' numbers @f\n"
         }
     }
+
+    {
+        my @f = ();
+        ++$locales_test_number;
+        $test_names{$locales_test_number} =
+                  'Verify ALT_DIGITS returns nothing, or else non-ASCII and'
+                . ' the single char digits evaluate to consecutive integers'
+                . ' starting at 0';
+
+        my $alts = langinfo(ALT_DIGITS);
+        if ($alts) {
+            my @alts = split ';', $alts;
+            my $prev = -1;
+            foreach my $num (@alts) {
+                if ($num =~ /[[:ascii:]]/) {
+                    push @f, disp_str($num);
+                    last;
+                }
+
+                # We only look at single character strings; likely locales
+                # that have alternate digits have a different mechanism for
+                # representing larger numbers.  Japanese for example, has a
+                # single character for the number 10, which is prefixed to the
+                # '1' symbol for '11', etc.  And 21 is represented by 3
+                # characters, the '2' symbol, followed by the '10' symbol,
+                # then the '1' symbol.  (There is nothing to say that a locale
+                # even has to use base 10.)
+                last if length $num > 1;
+
+                use Unicode::UCD 'num';
+                my $value = num($num);
+                if ($value != $prev + 1) {
+                    push @f, disp_str($num);
+                    last;
+                }
+
+                $prev = $value;
+            }
+        }
+
+        report_result($Locale, $locales_test_number, @f == 0);
+        if (@f) {
+            print "# failed $locales_test_number locale '$Locale' numbers @f\n"
+        }
+    }
 }
 
 my $final_locales_test_number = $locales_test_number;
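
The rule this test enforces can also be tried outside the `locale.t` harness. The sketch below is a simplified, standalone version: `first_bad_alt_digit` is a hypothetical helper, the sample strings are hand-written (Arabic-Indic digits written as escapes) rather than taken from a real locale, and, unlike the test above, it treats an undefined result from `num()` as a failure.

```perl
use strict;
use warnings;
use Unicode::UCD 'num';

# Hypothetical helper: return the first entry that breaks the rule checked by
# the test (no ASCII, and single-character entries count up from 0), or
# nothing if every examined entry passes.
sub first_bad_alt_digit {
    my ($alts) = @_;
    my $prev = -1;
    foreach my $entry (split /;/, $alts) {
        return $entry if $entry =~ /[[:ascii:]]/;
        last if length $entry > 1;       # multi-character entries are not checked
        my $value = num($entry);
        return $entry if !defined $value || $value != $prev + 1;
        $prev = $value;
    }
    return;
}

# Hand-written samples; U+0660..U+0662 are ARABIC-INDIC DIGIT ZERO..TWO.
my %samples = (
    consecutive => "\x{660};\x{661};\x{662}",
    gap         => "\x{660};\x{662}",
    ascii       => "0;1;2",
);

foreach my $name (sort keys %samples) {
    my $bad = first_bad_alt_digit($samples{$name});
    printf "%-12s %s\n", $name,
        defined $bad ? sprintf("fails at U+%04X", ord $bad) : "passes";
}
```

Printing the offending entry as a code point avoids depending on the terminal's encoding.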
