Discussion:
[perl #54448] unicode and macosx
Stephane Payrard
2008-05-19 17:29:29 UTC
# New Ticket Created by Stephane Payrard
# Please include the string: [perl #54448]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=54448 >


On a macintel running 10.5 I have a problem with Unicode: Unicode
characters are not recognized as such. See the rakudo test below.

The configure phase reports:

Determining whether ICU is installed...................................yes.

The compile phase finishes with an error, but it apparently causes no
problems except that I can't run 'make test', which depends on a
successful compilation.

ar: blib/lib/libparrot.a is a fat file (use libtool(1) or lipo(1) and
ar(1) on it)
ar: blib/lib/libparrot.a: Inappropriate file type or format
make: *** [blib/lib/libparrot.a] Error 1

rakudo is built without problems.

But the following test fails. I pasted into the string literal a
character that emacs reports as #x8a0:
my $s = " "; say $s.chars # $s == "\x8a0"
2


I expected one.
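
One possible reading of the 2 above, sketched in Python rather than rakudo. It assumes the count is a count of UTF-8 bytes, and the helper name utf8_len is made up for illustration. Under that assumption, a result of 2 fits a character in the range U+0080..U+07FF, such as U+00A0 (no-break space). A real U+08A0 would take three bytes.

```python
def utf8_len(code_point: int) -> int:
    """Number of bytes the given code point occupies in UTF-8."""
    return len(chr(code_point).encode("utf-8"))

# Assumption: the pasted character is either U+00A0 or U+08A0.
print(utf8_len(0x00A0))  # 2
print(utf8_len(0x08A0))  # 3
```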
--
cognominal stef
Patrick R. Michaud
2008-05-19 18:53:42 UTC
Post by Stephane Payrard
But the following test fails. I pasted the content of the literal
string with a character that emacs says to be #x8a0
my $s = " "; say $s.chars # $s == "\x8a0"
2
I expected one.
Because Parrot's primary support for unicode is utf-8 encoding,
and because utf-8 greatly slows down parsing of long strings
(such as program source code), we've elected for the time being
to have rakudo use "fixed8" for its default input encoding. When
Parrot becomes faster at processing unicode strings, we'll likely
switch the default to utf8.(*)
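
To illustrate why a variable-width encoding costs more than a one-byte-per-character one, here is a minimal sketch in Python. It is not Parrot's code, and utf8_char_offset is a hypothetical helper. With fixed8 the n-th character is simply the n-th byte. With UTF-8, finding it means scanning from the start and skipping continuation bytes.

```python
def utf8_char_offset(data: bytes, index: int) -> int:
    """Return the byte offset of the index-th code point in UTF-8 data.

    The scan is linear: every byte before the target has to be inspected.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts a code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError(index)

data = "a\u20acb".encode("utf-8")   # 'a', euro sign (3 bytes), 'b'
print(utf8_char_offset(data, 2))    # 4
```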

This doesn't mean that unicode can't be used in rakudo programs,
though. One can always encode the character explicitly:

$ ./parrot perl6.pbc
> my $s = "€"; say $s.chars; # doesn't work
3
> my $s = "\x20ac"; say $s.chars; # works
1

Also, rakudo understands the --encoding=utf8 option to specify that
the source code is coming in as UTF-8:

$ ./parrot perl6.pbc --encoding=utf8
> my $s = "€"; say $s.chars; # works
1
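
The 3-versus-1 difference can be reproduced outside rakudo. The sketch below is in Python and uses Latin-1 as a stand-in for a one-byte-per-character reading like fixed8. That is an assumption for illustration, not a statement about how Parrot implements fixed8.

```python
euro = "\u20ac"                      # the euro sign, one code point
raw = euro.encode("utf-8")           # three bytes: E2 82 AC

as_fixed8 = raw.decode("latin-1")    # every byte becomes its own character
as_utf8 = raw.decode("utf-8")        # the three bytes form one character

print(len(as_fixed8))  # 3
print(len(as_utf8))    # 1
```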

For now I'll mark this ticket as "stalled", awaiting faster Parrot
unicode support or a decision that we're going to live with
slower parsing of source code.

Thanks!

Pm

(*) Another option we might have could be to default to utf8 and
transcode to ucs2 on platforms that have ICU present (which can be
faster), but stay at a fixed8 default for systems without ICU.
But at this stage I think consistency and explicit options are
better; otherwise people will be confused as to why a particular
program works on some systems but not others.
Stephane Payrard
2008-12-16 10:43:27 UTC
# New Ticket Created by Stephane Payrard
# Please include the string: [perl #61394]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=61394 >


my $s = " "; say $s.chars # now returns 1

Note: the bug was originally reported on a 32-bit macintel, which has
since died. I am now testing on a 64-bit macintel.
I don't know whether that affects the test.
--
cognominal stef