Data Management

Oracle Tip: Use the CSSCAN utility before character set migration

Oracle DBAs may try to import a full database export--only to get warnings about character set incompatibilities. See how you can avoid these problems by using the Character Set Scanner (CSSCAN) utility.

This article originally appeared in the Oracle e-newsletter. Click here to subscribe automatically.

Oracle DBAs may try to import a full database export—only to get warnings about character set incompatibilities. This problem is compounded because the default database character set is ASCII, yet the recommended character set for international databases is one of the Unicode character sets, such as AL32UTF8.

It's easy to build a database and start development before realizing that the database character set doesn't support non-ASCII data. Developers can avoid these problems by using the Character Set Scanner (CSSCAN) utility, which is automatically installed in Oracle 9i. If you're using other database versions, you can download CSSCAN from Oracle's Technology Network.

Before running this utility, you need to install a set of tables and procedures the CSMIG user owns. The script, csminst.sql, is either in the rdbms/admin directory under Oracle Home or in the downloadable distribution. You must run this script using a DBA account (logged in with SYSDBA on Oracle9i).

Run the CSSCAN utility against the database using a DBA account by entering CSSCAN <username>/<password> at the command prompt. The program is similar to EXP in the way it prompts for parameters. You can enter more parameters on the command line such as:

CSSCAN SYSTEM/MANAGER FULL=y TOCHAR=AL32UTF8 ARRAY=102400 PROCESS=3

To see the full list of parameters, enter CSSCAN HELP=Y. (Make sure you have the identifier for the character set to which you'll be migrating, such as AL32UTF8 or UTF8 for Oracle 8.) After running the utility, there should be three files created in your current directory: scan.err, scan.out, and scan.txt. These files contain the scan's results.

For example, I inserted chr(169) into a text column in a UTF-8 database. This character isn't a valid UTF-8 sequence; it's a common character that a Web user might insert. The output from CSSCAN in scan.err says:

User  : SCOTT
Table : TMPTABLE
Column: TEXT
Type  : VARCHAR2(2000)
Number of Exceptions         : 1
Max Post Conversion Data Size: 1

ROWID                     Exception Type               Size Cell Data(first 30 bytes)
—————————————————————      ———————————                  ———————————————
AAAHboAABAAAMYyAAA     lossy conversion                        ?
—————————————————————      ———————————                  ———————————————

Once you correct these kinds of problems reported by the utility, you can migrate the data to a new database, with the target character set using migration utilities such as dbassist.

Scott Stephens worked for Oracle for more than 13 years in technical support, e-commerce, marketing, and software development.

Editor's Picks

Free Newsletters, In your Inbox