Character String Syntax-Datatypes

The syntax for the four basic string types is straightforward, as described in Table 12-1.
Table 12-1.  Four Basic String Types

Bytes or Characters

The VARCHAR2 and CHAR types support two methods of specifying lengths:

•\ In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multibyte character set. Remember that bytes are not the same as characters in a multibyte character set!

•\ In characters: VARCHAR2(10 char). This will support up to ten characters of data, which could be as much as 40 bytes of information. Furthermore, VARCHAR2(4000 CHAR) would theoretically support up to 4000 characters of data, but since a character string datatype in Oracle is limited to 4000 bytes, you might not be able to store that many characters. See the following for an example.

When using a multibyte character set such as UTF8, you would be well advised to use the CHAR modifier in the VARCHAR2/CHAR definition—that is, use VARCHAR2(80 CHAR), not VARCHAR2(80), since your intention is likely to define a column that can in fact store 80 characters of data. You may also use the session or system parameter NLS_LENGTH_ SEMANTICS to change the default behavior from BYTE to CHAR. I do not recommend changing this setting at the system level; rather, use it as part of an ALTER SESSION setting in your database schema installation scripts.

Any application that requires a database to have a specific set of NLS settings makes for an unfriendly application. Such applications generally cannot be installed into a database with other applications that do not desire these settings, but rely on the defaults to be in place.

One other important thing to remember is that the upper bound of the number of bytes stored in a VARCHAR2 is 4000. However, even if you specify VARCHAR2(4000 CHAR), you may not be able to fit 4000 characters into that field. In fact, you may be able to fit as few as 1000 characters in that field if all of the characters take 4 bytes to be represented in your chosen character set! Regarding the 4000-byte limit, starting with 12c, a VARCHAR2 can be configured to store up to 32,767 bytes of information.

The following small example demonstrates the differences between BYTE and CHAR and how the upper bounds come into play. We’ll create a table with three columns, the first two of which will be 1 byte and 1 character, respectively, with the last column being 4000 characters. Notice that we’re performing this test on a multibyte character set database using the character set AL32UTF8, which supports the latest version of the Unicode standard and encodes characters in a variable-length fashion using from 1 to 4 bytes for each character:
$ sqlplus eoda/foo@PDB1
SQL> select * from nls_database_parameters where parameter = ‘NLS_ CHARACTERSET’;
PARAMETER VALUENLS_CHARACTERSET AL32UTF8
SQL> create table t( a varchar2(1),b varchar2(1 char),c varchar2(4000 char));Table created.

Now, if we try to insert into our table a single character that is 2 bytes long in UTF, we observe the following:
SQL> insert into t (a) values (unistr(‘\00d6’)); insert into t (a) values (unistr(‘\00d6’))

ERROR at line 1:
ORA-12899: value too large for column “EODA”.”T”.”A” (actual: 2, maximum: 1)
This example demonstrates two things:

•\ VARCHAR2(1) is in bytes, not characters. We have a single Unicode character, but it won’t fit into a single byte.

•\ As you migrate an application from a single-byte fixed-width character set to a multibyte character set, you might find that the text that once fit into your fields no longer does.

The reason for the second point is that a 20-character string in a single-byte character set is 20 bytes long and will absolutely fit into a VARCHAR2(20). However, a 20-­character field could be as long as 80 bytes in a multibyte character set, and 20 Unicode characters may well not fit in 20 bytes. You might consider modifying your DDL to be VARCHAR2(20 CHAR) or using the NLS_LENGTH_SEMANTICS session parameter mentioned previously when running your DDL to create your tables.

If we insert that single character into a field set up to hold a single character, we will observe the following:
SQL> insert into t (b) values (unistr(‘\00d6’)); 1 row created.
SQL> select length(b), lengthb(b), dump(b) dump from t; LENGTH(B) LENGTHB(B) DUMP

That INSERT succeeded, and we can see that the LENGTH of the inserted data is one character—all of the character string functions work character-wise. So the length of the field is one character, but the LENGTHB (length in bytes) function shows it takes 2 bytes of storage, and the DUMP function shows us exactly what those bytes are. So, that example demonstrates one very common issue people encounter when using multibyte character sets, namely, that a VARCHAR2(N) doesn’t necessarily hold N characters, but rather N bytes.

Note The next example works on databases that do not have extended datatypes enabled.
The next issue people confront frequently is that the maximum length in bytes of a
VARCHAR2 is 4000 and in a CHAR is 2000:
SQL> declare
l_data varchar2(4000 char);
l_ch varchar2(1 char) := unistr( ‘\00d6’ );begin

ERROR at line 1:
ORA-01461: can bind a LONG value only for insert into a LONG column
ORA-06512: at line 6

This shows that a 4000-character string that is really 8000 bytes long cannot be stored permanently in a VARCHAR2(4000 CHAR) field. It fits in the PL/SQL variable because there a VARCHAR2 is allowed to be up to 32KB in size. However, when it is stored in a table, the hard limit is 4000 bytes. We can store 2000 of these characters successfully:

SQL> declare
l_data varchar2(4000 char);
l_ch varchar2(1 char) := unistr( ‘\00d6’ );begin

And as you can see, they consume 4000 bytes of storage.

Leave a Reply

Your email address will not be published. Required fields are marked *.

*
*

BACK TO TOP