What are CLOBs (Character Large Objects)?
Normally, databases save data as follows: Individual table columns are compiled as rows, which are then stacked as data blocks. In each data block, a certain piece of information is then assigned to a row. Saving data blocks in this way requires less storage space in the database.
But particularly large data objects are treated differently by databases. These Large Objects (LOBs) are much larger than conventional database entries, as well as unstructured, and are therefore usually stored in a separate storage location. The database then simply creates a reference to the storage location of the object at the corresponding position.
Two types of LOBs exist: BLOBs and CLOBs. A BLOB is a data type that saves large binary objects, while the CLOB file format stores long character strings. The abbreviation stands for Character Large Objects and the term was coined by the developers of the Oracle database. In other database systems, there are also other designations for large objects, such as TEXT in MySQL and PostgreSQL.
How Do CLOBs Work and What Are Their Applications?
Character Large Objects encompass all database objects with strings – i.e. all objects that contain files consisting of characters. Character strings are sequences of letters, digits, special characters, and control characters; CLOB entries therefore typically reference text documents (particularly XML). What’s special about storing these large data types is that they are not contained directly in the database – with few exceptions – but in a separate place. The database itself only contains a reference mechanism that points to the respective CLOB. For instance, the entire character code of a website is not stored in a database cell, but instead a reference number that links to the actual storage location of the underlying HTML document.
Pros and Cons of CLOB Storage
One of the biggest advantages of the CLOB data type is the ability to read and edit individual files. For example, database administrators can use application programmes to compare, edit or merge the content of input Character Large Objects. A brief overview of the editing options is shown below:
- SUBSTR allows users to extract strings from a CLOB
- INSTR inserts cut strings into another position or other CLOBs
- COMPARE compares the values of two CLOBs
- APPEND connects two CLOBs with each other
Since CLOBs store the documents with text or characters in full, it’s possible to examine their content in detail. Administrators can determine exactly whether content is duplicated or text parts overlap. If this is the case, the datasets can be combined to free up unnecessarily occupied storage space.
In addition, storing letters, digits, special characters, and control characters as CLOBs is absolutely loss-free: When saving files, it often depends on whether all information within the document actually has to be saved – that isn’t an issue with this data format. For instance, you can also save the meta information of a document together with the actual content as a CLOB. Most databases therefore even support the presentation of Character Large Objects in a tree structure as well as navigation and searching for specific content.
The disadvantages of this data type for storing enormously large numbers of characters include the fact that some databases do not allow editing with familiar SQL functions. CLOBs store extremely large quantities of text, which means the execution of standard functions would take a very long time. However, at least the application commands mentioned above – SUBSTR, INSTR, COMPARE, and APPEND – are possible alternatives for replicating unavailable SQL operations. Another disadvantage of CLOBs is that relatively small data elements waste valuable storage space, since a defined, segmented storage space is reserved outside the database.