Bug #141

Incorrect character decoding/encoding in 2.0

Added by Miloslav Rauš 377 days ago. Updated 377 days ago.

Status: Feedback
Start: 02/27/2009
Priority: Normal
Due date:
Assigned to: -
% Done: 0%
Category: -
Target version: -
Resolution:


Description

All accented characters are translated to the character with value 65533 (U+FFFD, the Unicode replacement character).

Partial output from the test program linked against HDBC-ODBC 2.1.0:
"\65533ern\65533 Eva Mgr."
"\65533uda Robert Mgr."
"ANTO\65533OV\65533 Helena Ing."
"B\65533lkov\65533 Kristina"
"BU\65533INOV\65533 Helena"
...

The same rows look like this when the program is linked against HDBC-ODBC <= 2.0:
"\200ern\225 Eva Mgr."
"\200uda Robert Mgr."
"ANTO\138OV\193 Helena Ing."
"B\236lkov\225 Kristina"
"BU\200INOV\193 Helena"
...
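Both outputs are consistent with 2.1.0 decoding the driver's CP1250 bytes as UTF-8, while the older builds mapped each byte straight to a Char. (The maintainer's later reply about UTF-8 points the same way.) In CP1250, 0xC8 is "Č" and 0xE1 is "á". Neither forms a valid UTF-8 sequence with the ASCII byte that follows it, so a lenient UTF-8 decoder replaces each one with U+FFFD.

Below is a minimal sketch in plain Haskell (base only) that reproduces both outputs. `bytesAsChars` and `decodeUtf8Lenient` are hypothetical helpers, not HDBC functions. The decoder is simplified, so the library's exact replacement policy may differ.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Mimics the HDBC-ODBC <= 2.0 output: each byte becomes the Char with the
-- same numeric value, so the CP1250 byte 0xC8 ("Č") shows up as '\200'.
bytesAsChars :: [Word8] -> String
bytesAsChars = map (chr . fromIntegral)

-- Simplified lenient UTF-8 decoder (hypothetical helper): every malformed
-- sequence yields U+FFFD (65533) and decoding resumes at the next byte.
decodeUtf8Lenient :: [Word8] -> String
decodeUtf8Lenient [] = []
decodeUtf8Lenient (b:bs)
  | b < 0x80               = chr (fromIntegral b) : decodeUtf8Lenient bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F)
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F)
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07)
  | otherwise              = '\xFFFD' : decodeUtf8Lenient bs  -- stray or invalid lead byte
  where
    -- A lead byte that announces n continuation bytes of the form 10xxxxxx.
    multi :: Int -> Int -> String
    multi n lead =
      let (conts, rest) = splitAt n bs
          cp = foldl step lead conts
      in if length conts == n && all isCont conts && valid n cp
           then chr cp : decodeUtf8Lenient rest
           else '\xFFFD' : decodeUtf8Lenient bs
    isCont c = c .&. 0xC0 == 0x80
    step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
    -- Reject overlong forms, surrogates and values beyond U+10FFFF.
    valid n cp = cp >= minFor n && cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF)
    minFor :: Int -> Int
    minFor 1 = 0x80
    minFor 2 = 0x800
    minFor _ = 0x10000
```

In GHCi, `decodeUtf8Lenient [0xC8, 0x65, 0x72, 0x6E, 0xE1]` gives `"\65533ern\65533"`. On the same bytes, `bytesAsChars` gives `"\200ern\225"`, which matches the two listings above.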

Test script:

import Database.HDBC
import Database.HDBC.ODBC

main :: IO ()
main = do
    con <- connectODBC $ "Driver={Microsoft dBASE Driver (*.dbf)};DriverID=277;Dbq=" ++ path
    dt <- quickQuery' con "SELECT testfield FROM test_db" []
    mapM_ print (conv dt)
    disconnect con
  where
    path = "C:\\TESTDATA"
    -- Every row has exactly one column (testfield); convert it to a String.
    conv :: [[SqlValue]] -> [String]
    conv = map (\[f] -> fromSql f)

History

Updated by Miloslav Rauš 377 days ago

I forgot to mention that error messages are garbled in the same way:

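theResult.exe: SqlError {seState = "[\"42S02\"]", seNativeError = -1, seErrorMsg = "execute execute: [\"-1305: [Microsoft][Ovlada\\65533 ODBC pro dBase] Datab\\65533zov\\65533 stroj Microsoft Jet nem\\65533\\65533e naj\\65533t objekt test_bb. Zkontrolujte, zda objekt existuje, a zda jste spr\\65533vn\\65533 zadali jeho n\\65533zev a cestu.\"]"}

(In English, the Czech driver message reads: "[Microsoft][ODBC driver for dBase] The Microsoft Jet database engine cannot find the object test_bb. Check that the object exists and that you entered its name and path correctly.")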

Updated by John Goerzen 377 days ago

  • Status changed from New to Feedback

Are you saying that the data is corrupted on the way to the database or the way back from it?

In other words, does the INSERT cause the problem or does SELECT?

Can you duplicate it using HDBC-ODBC against a free database such as PostgreSQL so I can test it locally as well?

Updated by Miloslav Rauš 377 days ago

> Are you saying that the data is corrupted on the way to the database or the way back from it?

I haven't tried inserts so far; this is a read-only presentation of some legacy data. I don't know whether the problem could be caused by the fact that the data is stored in DBF files encoded in CP852 and then automatically converted to CP1250 by the ODBC internals.

But alas, even the ODBC error messages are corrupted.

> In other words, does the INSERT cause the problem or does SELECT?

> Can you duplicate it using HDBC-ODBC against a free database such as PostgreSQL so I can test it locally as well?

I'll try and see.

Updated by John Goerzen 377 days ago

It sounds like you may need to ask your database to communicate with the library using UTF-8. How to do this varies by database.

Updated by Miloslav Rauš 377 days ago

> It sounds like you may need to ask your database to communicate with the library using UTF-8. How to do this varies by database.

My "database" is just the MS ODBC driver for DBF, which is probably responsible for (correctly) recoding the data from the source encoding (CP852 in my case) into the OS default (CP1250 in my case).

I tried searching, but the only relevant results were along the lines of "force this specific DB engine to output UTF-8". There was nothing like "this tells the ODBC layer to recode the data into a specific charset".

So does HDBC always expect all incoming data (and error messages) to be UTF-8?

Also, when CP1250 text is displayed as UTF-8, each specific character gets mangled into another specific character, not all of them into the same one. That is what made me think this was a bug.
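That difference has a practical consequence. The <= 2.0 strings keep one Char per original byte, so they can still be repaired by re-decoding them as CP1250. The 2.1.0 strings have already collapsed every accented letter into U+FFFD, so nothing can be recovered from them.

Below is a sketch that assumes, as described above, that the driver delivers CP1250. `cp1250Czech`, `decodeByte` and `repairLegacy` are hypothetical helpers. The table covers only the Czech letters, not all of CP1250.

```haskell
import Data.Char (chr, ord)
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- Hypothetical, partial CP1250 table: only the Czech letters above 0x7F.
-- A real program would need the complete code page.
cp1250Czech :: [(Word8, Char)]
cp1250Czech =
  [ (0x8A, 'Š'), (0x8D, 'Ť'), (0x8E, 'Ž'), (0x9A, 'š'), (0x9D, 'ť'), (0x9E, 'ž')
  , (0xC1, 'Á'), (0xC8, 'Č'), (0xC9, 'É'), (0xCC, 'Ě'), (0xCD, 'Í'), (0xCF, 'Ď')
  , (0xD2, 'Ň'), (0xD3, 'Ó'), (0xD8, 'Ř'), (0xD9, 'Ů'), (0xDA, 'Ú'), (0xDD, 'Ý')
  , (0xE1, 'á'), (0xE8, 'č'), (0xE9, 'é'), (0xEC, 'ě'), (0xED, 'í'), (0xEF, 'ď')
  , (0xF2, 'ň'), (0xF3, 'ó'), (0xF8, 'ř'), (0xF9, 'ů'), (0xFA, 'ú'), (0xFD, 'ý')
  ]

-- ASCII passes through and known Czech letters are looked up. Any other
-- byte becomes U+FFFD because the table above is incomplete.
decodeByte :: Word8 -> Char
decodeByte b
  | b < 0x80  = chr (fromIntegral b)
  | otherwise = fromMaybe '\xFFFD' (lookup b cp1250Czech)

-- HDBC-ODBC <= 2.0 output holds one Char per byte ('\200' is byte 0xC8),
-- so ord recovers the byte, which is then re-decoded as CP1250.
-- Chars above 0xFF (such as U+FFFD from 2.1.0) are not bytes. They are
-- kept as-is because the original byte is no longer known.
repairLegacy :: String -> String
repairLegacy = map repair
  where
    repair c
      | ord c <= 0xFF = decodeByte (fromIntegral (ord c))
      | otherwise     = c
```

For example, `repairLegacy "\200ern\225 Eva Mgr."` yields "Černá Eva Mgr.", while `repairLegacy "\65533ern\65533 Eva Mgr."` returns its input unchanged. A similar approach could be applied with `map decodeByte . Data.ByteString.unpack` if a column could be fetched as raw bytes, for instance with `fromSql` at type `Data.ByteString.ByteString`. That assumes the driver's original bytes survive unconverted on that path, which is not confirmed here.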

PS: I had no luck finding out how to force ODBC to re-encode the data into another code page. Have you ever heard of that being possible?
