Tuesday, October 14, 2008

How to read Data between different Characterset Databases

We have two 24x7 production databases with different charactersets. Yes, both charactersets are related to Arabic language. Database A’s characterset is AR8MSWIN1256 while database B’s characterset is AR8ISO8859P6.

We had a requirement wherein database A will read Arabic data from database B. If you create a database link and try to query data from database B it will appear to be garbage, as shown below:

whereas when you run the same query on database B, this is how the output looks:

Even you are unaware of Arabic language, you may visually compare and see the output is a garbage when queried from database A.

DUMP function comes handy under these situations. It returns internal representation of an expression. When used to against the “name” failed, I get this output.

SQL> select dump(name) from test_iso@testdb;

Typ=1 Len=13: 227,205,227,207,32,194,213,221,32,227,196,227,228

where, “Typ” refers to datatype, “Len” means the length of the string, and everything after “:” is comma separated ASCII values of all the characters in the “name” field.

I wrote a small procedure where convert all the comma separated ASCII values to character string.

create or replace function read_name (p_id in number) return varchar2 is
  t_str varchar2(2000);
  l_name varchar2(2000);
  l_output varchar2(2000);
  select 'select '||'chr('||replace(substr(dump(name, 1010), instr(dump(name, 1010), ':')+2), ',', ')|| chr(')||') from dual' 
    into t_str from test_iso@testdb
   where id = p_id;
--  dbms_output.put_line(t_str);
  execute immediate t_str into l_output;  
  return l_output;

Now upon using this function, I am able to query remote table in a readable format.

Here’s the output:


prodlife said...

Good one :)

Although I wonder why there isn't a better way to do charset conversions across a DB link.

I assume the performance of this work-around can't be good, so it'll really work only if the tables are rather small.

Asif Momen said...

Yes, I certainly agree with you that this method is not performance friendly. But, as of now its serving the purpose.

We are still looking for a better option.

Hazem Ameen said...

We have your same environment and reading/writing over db link is fine. Oracle does character-set conversion between the 2 Arabic character-set as expected.

Client NLS_LANG: America_America.ar8mswin1256
Server1: ar8mswin1256
Server 2: ar8iso8859p6
I can only suspect your client NLS_LANG

Our version is on Linux. Client is MS Windows Vista

Asif Momen said...

Dear Hazem,

As I am unable to upload image files in the comment section, I am creating a new post.

Kindly check it out.