[Dxspider-support] Character encoding?
Dirk Koopman
djk at tobit.co.uk
Wed Oct 24 14:41:57 CEST 2007
Bela Markus wrote:
> Hi,
>
> what about character encoding within a SPIDER network and in DXCLUSTER
> in general? While in practice I do not see special characters in spots
> and announcements, are there any restrictions, conventions or rules?
This is a huuuuuuuuge can of worms. In essence, there are no standards,
conventions or rules. Unlike you, I see lots of special characters in
spots and announces. It causes a number of headaches when I am trying to
de-duplicate them and I don't always get it right.
Then there are the problems of perl versions. There is a partition wall
between 5.8.1 and earlier versions of perl. Anything before 5.8.1 will
not handle utf8 (my preferred solution) properly and treats any
characters as single byte, locale based, values. So whatever éáőúűüö is
in your locale, that's what gets used. If someone else's (receiving)
locale treats that byte value as something else - well, that's just tough!
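
To make the locale problem concrete, here is a small sketch in Python (not DXSpider code). It assumes the sender is on ISO-8859-2 (Latin-2) and the receiver on ISO-8859-1 (Latin-1); those two charsets are chosen for illustration only.

```python
# One character, written as an escape so the example does not depend on
# the encoding of this file: U+0151 is LATIN SMALL LETTER O WITH DOUBLE ACUTE.
char = "\u0151"

# Under a Latin-2 locale the character goes onto the wire as a single byte.
raw = char.encode("iso-8859-2")        # b'\xf5'

# A Latin-2 receiver gets the original character back.
as_latin2 = raw.decode("iso-8859-2")   # U+0151 again

# A Latin-1 receiver reads the very same byte as a different letter.
as_latin1 = raw.decode("iso-8859-1")   # U+00F5, o with tilde

# In utf8 the character is unambiguous, but takes two bytes.
as_utf8 = char.encode("utf-8")         # b'\xc5\x91'
```

Nothing in the byte 0xF5 itself says which charset it came from, which is why the receiver can only guess.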
And then there are the authors out there that will insist on trying to
"clean up" text. My rule has been, for a very long time, that provided
the PC sentence that comes in is valid, it goes out *exactly* as it
comes in, with only the hop count being modified. The problem is that
there is an implementation out there (and I don't know whose it is) that
converts locale stuff to utf8 or vice versa (I haven't worked out which
way either). This really does not help!
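
A rough sketch of the "out exactly as it came in" rule, in Python rather than the real perl. Everything specific here is an assumption made for illustration: the function name, the caret-delimited hop field of the form ^H98^, the choice to drop a sentence once its hops run out, and the sample sentence with its made-up callsigns. The point is only that the line is handled as raw bytes and never decoded.

```python
import re

# Assumed shape of the hop field: "^H" + digits + "^".
HOP_FIELD = re.compile(rb"\^H(\d+)\^")

def forward_sentence(line: bytes):
    """Return the sentence with its hop count reduced by one.

    Returns None when the hops are used up (an assumed policy).
    Every other byte is passed through unchanged.
    """
    match = HOP_FIELD.search(line)
    if match is None:
        return line                      # no hop field: leave untouched
    hops = int(match.group(1)) - 1
    if hops <= 0:
        return None                      # do not forward any further
    # Splice in the new digits only; no decoding, no "cleaning up".
    return line[:match.start(1)] + str(hops).encode("ascii") + line[match.end(1):]

# Hypothetical sentence with two Latin-2 bytes (0xF5, 0xF6) in the comment.
sample = b"PC11^14025.0^HA5XX^24-Oct-2007^1441Z^k\xf5r\xf6s^HA5YY^HG5ABC^H98^~"
forwarded = forward_sentence(sample)
```

Because the comment bytes are never interpreted, the next node sees the same byte values the originator sent, whatever charset they were meant to be in.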
>
> My system is running on CentOS 5 with UTF-8 and Hungarian locales. A
> quick try in local announcements shows that special characters like
> éáőúűüö etc. are OK. In the log file they are converted to single byte
> codes. No more examination done.
I would like to go to utf8. But I can't see this happening for quite a
while yet. And if I *do* go down that road it will mean that everyone
will *have* to upgrade to a modern perl. Which will cause many people
that are (still) running on things like Redhat 6.2 more pain than the
occasional dupe spot or announce.
When you say single byte codes, I am presuming you mean things like %E3?
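
For reference, this is the sort of thing I mean by %E3. The Python sketch below uses the standard urllib percent-escaping purely as an illustration; whether the log file uses exactly this scheme is an assumption, and the sample text is made up.

```python
from urllib.parse import quote, unquote_to_bytes

# A single non-ASCII byte written out as a printable escape.
one_byte = b"\xe3"
escaped = quote(one_byte, safe="")        # '%E3'

# A made-up line with one Latin-1/Latin-2 byte (0xE9) among plain ASCII;
# spaces are kept readable by listing them as safe.
line = b"caf\xe9 73"
escaped_line = quote(line, safe=" ")      # 'caf%E9 73'

# The escape is reversible: the original bytes come back unchanged.
restored = unquote_to_bytes(escaped_line)
```

An escape like this records the byte value only. It still does not say which locale the byte was typed in.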
Dirk