[sugj-tech:7415] On fixing Samba's internal character encoding (Re: CH_DISPLAY and gettext)
TAKAHASHI Motonobu
monyo @ monyo.com
Fri Jun 24 02:51:42 JST 2011
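This is Motonobu Takahashi.
As I also noted briefly in my diary, a discussion has finally started on the
upstream Samba side about whether to stop specifying Samba's internal character
encoding with the unix charset parameter and use a fixed encoding instead.
* [Samba] Rekindled after 7 years!? - The debate over Samba's internal character encoding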
<http://damedame.monyo.com/?date=20110624#p01>
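Whether it should be UTF-8 or UTF-16 is still open, but either way I believe
that fixing the internal character encoding would be a major benefit.
That said, the discussion has only just been raised, so I think it is still
far too early to say where it will settle.
So I would very much appreciate everyone's support.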
From: Michael Adam <obnox @ samba.org>
Subject: Re: CH_DISPLAY and gettext
Date: Thu, 23 Jun 2011 15:04:27 +0200
> I have some points of criticism with CH_UNIX used as charset to
> internally store strings (file names, user names, etc) in memory
> as well as in databases. I am sure that there have been very good
> reasons for introducing CH_UNIX as internal encoding in the past,
> but I am questioning this anyways:
>
> 1) This yields information too early!
> The mapping Unicode --> CH_UNIX is potentially lossy.
> E.g. if I use ASCII or some latin/iso charset, then some characters
> will not be displayable. Maybe even unmarshalling will fail
> so users will not be available, depending on the value of CH_UNIX.
>
> 2) Storing our internal databases (s3 eg: group mapping, passdb)
> in CH_UNIX is a very bad thing: This encoding might be changed
> by the administrators and the databases are not converted
> automatically. Neither is the file system but there is convmv
> for this. But for the internal DBs there is not even a
> conversion tool. I have to look which other databases are
> stored in which encoding, especially samba4.
>
> I have been in quite cumbersome manual db repair due to this
> problem more than once already. This was really bad!
>
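To make points 1) and 2) concrete, here is a minimal sketch in Python. It is
not Samba code: Python's codecs merely stand in for Samba's charset
conversion, and the helper name to_unix_charset and the sample strings are
assumptions made for illustration.

```python
# Illustration only: Python codecs stand in for Samba's charset conversion.

def to_unix_charset(text: str, charset: str) -> bytes:
    """Hypothetical helper: convert Unicode text to the configured charset.

    Characters the charset cannot represent are replaced with "?",
    so the result may be lossy.
    """
    return text.encode(charset, errors="replace")

# Point 1: a Japanese user name cannot survive an ISO-8859-1 unix charset.
user = "たかはし"
lossy = to_unix_charset(user, "iso-8859-1")
print(lossy)  # b'????'

# Point 2: a record written under one charset is misread under another.
record = "€uro".encode("iso-8859-15")     # written while the charset was ISO-8859-15
as_latin1 = record.decode("iso-8859-1")   # read back after the charset was changed
as_latin9 = record.decode("iso-8859-15")  # read back with the original charset
print(as_latin1, as_latin9)  # ¤uro €uro
```

Both decodes succeed without error, so the stored bytes alone do not reveal
which charset was in effect when they were written. That is consistent with
the remark further down that the encoding cannot be reliably autodetected.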
> In order to fix #2, there are two options:
>
> a) Change the dbs (individually) to convert from internal
> representation to UTF8 (or UTF16 maybe), before storing.
>
> b) change samba to internally store everything in UTF8
> and then write out the DBs unchanged.
> For every target that needs a special encoding (like
> the file system needing CH_UNIX), we'd then need to convert
> before accessing the target (like I detailed in my
> previous emails).
>
> In either case we also need an encoding conversion tool for each
> such database, since afaik we can not reliably autodetect
> the encoding of the stored data.
>
> In order to fix #1 though, option (b) is the only possible way.
>
>
> So my wish would be to convert all of samba to use UTF8
> internally (I'd be ready to discuss a different unicode
> charset like UTF16), and convert to CH_UNIX for the necessary
> communication interfaces with the outside.
>
>
> I hope this makes my argument a little clearer.
>
> Cheers - Michael
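
As a rough sketch of option (b) from the quoted mail: strings are kept in
UTF-8 internally, written to the databases unchanged, and converted to the
unix charset only at the file system boundary. This is written in Python for
brevity and is not Samba's implementation. The function names, the sample
file name, and the euc_jp setting are all assumptions made for illustration.

```python
# Sketch of option (b): one fixed internal encoding, conversion at the edges.
# All names below are hypothetical and do not correspond to Samba functions.

UNIX_CHARSET = "euc_jp"  # assumed value of the unix charset parameter

def from_wire(raw: bytes) -> bytes:
    """Convert a UTF-16LE string from the client to the internal UTF-8 form."""
    return raw.decode("utf-16-le").encode("utf-8")

def to_database(internal: bytes) -> bytes:
    """Databases store the internal UTF-8 form unchanged."""
    return internal

def to_filesystem(internal: bytes, unix_charset: str = UNIX_CHARSET) -> bytes:
    """Convert to the unix charset only when touching the file system."""
    return internal.decode("utf-8").encode(unix_charset)

wire_name = "資料.txt".encode("utf-16-le")
internal = from_wire(wire_name)

db_record = to_database(internal)
fs_name = to_filesystem(internal)

# Changing the unix charset affects only the file system view;
# the database record stays valid UTF-8 and needs no conversion.
fs_name_sjis = to_filesystem(internal, "shift_jis")
print(db_record.decode("utf-8"), fs_name != fs_name_sjis)
```

With this layout, an administrator who changes the unix charset would still
have to convert file names on disk (for example with convmv), but the
databases would not depend on that setting.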
---
TAKAHASHI Motonobu <monyo @ monyo.com> / @damemonyo
http://damedame.monyo.com/ / http://facebook.com/monyot