Thin mode should support DB_NCHARSET 'UTF8' #16
Comments
The national character set (871 - UTF8) is an older implementation of the current standard UTF-8. It is known today by the name CESU-8, and Python does not have built-in support for it. It is no longer recommended for use but (clearly!) some databases were built that way and are still in use. Adding support for CESU-8 would require writing our own encoder/decoder for that character set. One possibility, however, might be to simply defer raising the error until the first attempt to actually use the national character set. That sounds like it might resolve your situation, since you aren't using any NCHAR, NVARCHAR2 or NCLOB columns? For now the only option you have is to use thick mode.
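To illustrate how the two encodings differ, here is a minimal sketch of a CESU-8 encoder in plain Python. The function name encode_cesu8 is hypothetical and is not part of python-oracledb; the sketch only assumes the standard definition of CESU-8, in which characters outside the Basic Multilingual Plane are written as a UTF-16 surrogate pair with each surrogate encoded as three bytes.

```python
def encode_cesu8(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8.

    Characters up to U+FFFF are encoded exactly as in UTF-8.
    Characters above U+FFFF are split into a UTF-16 surrogate pair,
    and each surrogate is encoded as a 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # Same bytes as standard UTF-8 (lone surrogates are rejected).
            out += ch.encode("utf-8")
        else:
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            for surrogate in (high, low):
                # "surrogatepass" lets the utf-8 codec emit the 3-byte form.
                out += chr(surrogate).encode("utf-8", "surrogatepass")
    return bytes(out)


if __name__ == "__main__":
    emoji = "\U0001F600"
    print(emoji.encode("utf-8").hex(" "))   # 4 bytes in UTF-8
    print(encode_cesu8(emoji).hex(" "))     # 6 bytes in CESU-8
```

Text that stays within the Basic Multilingual Plane produces identical bytes in both encodings, so only supplementary characters are affected.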
Thanks a lot for the fast, precise and comprehensive clarification.
@damarvin I'll leave this closed but I am tracking the general problem so we know how to prioritize our efforts. I believe @anthony-tuininga's suggested enhancement will be good for many people.
Since UTF-8 and CESU-8 differ only in how surrogates are encoded, it might be possible to implement CESU-8 decoding with Python's standard utf-8 codec and a codec error handler. See PEP 293 for details: https://peps.python.org/pep-0293/
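A rough sketch of that idea follows. The handler name "cesu8-surrogates" and the function decode_cesu8 are hypothetical names chosen for illustration, not anything python-oracledb provides. The sketch assumes the standard utf-8 codec reports an error at the lead byte 0xED of an encoded surrogate, at which point the handler can inspect the next six bytes and substitute the combined character.

```python
import codecs


def _cesu8_surrogates(exc):
    """Error handler: combine a 6-byte CESU-8 surrogate pair into one character."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    chunk = bytes(exc.object[exc.start:exc.start + 6])
    is_pair = (
        len(chunk) == 6
        and chunk[0] == 0xED and 0xA0 <= chunk[1] <= 0xAF   # high surrogate
        and chunk[3] == 0xED and 0xB0 <= chunk[4] <= 0xBF   # low surrogate
        and 0x80 <= chunk[2] <= 0xBF and 0x80 <= chunk[5] <= 0xBF
    )
    if not is_pair:
        raise exc  # anything else is still an error
    high = 0xD000 | ((chunk[1] & 0x3F) << 6) | (chunk[2] & 0x3F)
    low = 0xD000 | ((chunk[4] & 0x3F) << 6) | (chunk[5] & 0x3F)
    code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    # Return the replacement text and the position at which to resume decoding.
    return chr(code_point), exc.start + 6


codecs.register_error("cesu8-surrogates", _cesu8_surrogates)


def decode_cesu8(data: bytes) -> str:
    """Hypothetical helper: decode CESU-8 via the utf-8 codec plus the handler."""
    return data.decode("utf-8", errors="cesu8-surrogates")


if __name__ == "__main__":
    print(decode_cesu8(b"a\xed\xa0\xbd\xed\xb8\x80b"))
```

One consequence of this approach is that input containing genuine 4-byte UTF-8 sequences is also accepted, so it is more lenient than a strict CESU-8 decoder would be. Encoding would still need separate handling.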
supported; an error is now raised only when the first attempt to use NCHAR, NVARCHAR2 or NCLOB data is made (#16).
I've just pushed code that allows you to connect to a database using national character set UTF8 and raises the exception only upon attempting to use NCHAR, NVARCHAR2 or NCLOB data.
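The deferred check described here could look roughly like the sketch below. It is purely illustrative and is not the code that was pushed: the class names, the method names, and the set of supported charset ids are assumptions made for the example. Only the id 871 and the message text come from the error quoted in this issue.

```python
# Assumption for illustration: only AL16UTF16 (id 2000) is treated as supported.
SUPPORTED_NCHARSET_IDS = {2000}


class NationalCharsetNotSupported(Exception):
    """Hypothetical stand-in for the DPY-3012 error."""


class ThinConnectionSketch:
    """Hypothetical connection object that defers the charset check."""

    def __init__(self, ncharset_id: int):
        # Record what the server reported; do not validate at connect time.
        self.ncharset_id = ncharset_id

    def _check_ncharset(self) -> None:
        if self.ncharset_id not in SUPPORTED_NCHARSET_IDS:
            raise NationalCharsetNotSupported(
                f"national character set id {self.ncharset_id} is not "
                "supported by python-oracledb in thin mode"
            )

    def decode_value(self, raw: bytes, is_national: bool) -> str:
        """Decode a column value; only national (N*) data triggers the check."""
        if is_national:
            self._check_ncharset()
            return raw.decode("utf-16-be")
        return raw.decode("utf-8")


if __name__ == "__main__":
    conn = ThinConnectionSketch(ncharset_id=871)   # connecting succeeds
    print(conn.decode_value(b"hello", is_national=False))
```

With this shape, applications that never touch NCHAR, NVARCHAR2 or NCLOB data never reach the check at all.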
Connecting with encoding='UTF8' (as DB_NCHARSET is set so), and likewise with 'utf-8', I get 'DPY-3012: national character set id 871 is not supported by python-oracledb in thin mode'. I do not want a national character set, but good standard 'utf-8'.
What is the "Thin" mode for, when it does not support the basics, or do I misinterpret things? Is there a work-around or missing extra parameter?
So I propose the enhancement: Thin mode should support DB_NCHARSET 'UTF8'.