How do Document Stores work?

Databases are essential for organising information in a practical way, but they can be structured in different ways. In electronic data processing, relational databases are particularly widespread. Alongside them, there are document-based databases. These are based on a simple structure that pairs keys with documents for storing information. How do these databases work, and what are their advantages?

What is a Document Store?

Document-oriented databases, also known as document stores, are used to manage semi-structured data. This data does not adhere to a fixed schema; instead, it carries its own structure. Markers within the data, such as tags or field names, can be used to order the information. Because no structure is defined in advance, this data is a poor fit for relational databases, whose tables expect every entry to have the same columns.

A document database creates a simple pair: a key is assigned to a specific document. The actual information is located within this document, which may be formatted as XML, JSON or YAML. Since documents do not require a specific schema, different types of documents can be stored together in one document store. Changes to a document’s structure do not have to be declared to the database beforehand.
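The key-document pairing can be illustrated with a minimal sketch. It assumes a plain Python dictionary standing in for the database; the keys, field names and values are hypothetical and only serve to show that documents of different shapes can sit side by side.

```python
import json

# Hypothetical store: a plain dict mapping keys to JSON document strings.
store = {}

# Two documents with entirely different fields share the same store.
store["customer:1"] = json.dumps({"name": "Ada", "email": "ada@example.com"})
store["product:7"] = json.dumps(
    {"title": "Keyboard", "price": 49.9, "tags": ["usb", "black"]}
)

# Look up a document by its key, then parse it to reach the information.
product = json.loads(store["product:7"])
print(product["tags"])
```

Neither document had to be announced to the store in advance; the structure lives entirely inside each document.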

Note

Document-based databases are closely related to other database models: they are considered a subcategory of NoSQL databases and are close relatives of key-value databases, since both pair a key with a value (here, a document). Because each document is stored as a whole, they also stand in contrast to column-oriented databases.

How Do Document Databases Work?

In theory, data in all sorts of formats, even without a consistent schema, can be stored in a document-based database. In practice, however, a single file format is typically used for the documents and the information is ordered in a certain structure. This makes it easier to work with the information and the database; for example, search queries can be processed more efficiently when the documents share a structure. You can generally perform the same actions in a document-based database as in a relational system: information can be added, changed, deleted, and queried.

To allow these actions to take place, each document is given a unique ID. How this identifier is formed is not particularly important: either a simple number sequence or the complete path can be used to address the document. When searching for information, the documents themselves are checked. In other words, the data is pulled directly from the documents rather than from the columns of a table.
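The four actions and the role of the unique ID can be sketched as follows. This is a toy in-memory illustration, not how any particular product is implemented; the class name `DocumentStore` and its methods are hypothetical, and the IDs are assumed to be a simple number sequence.

```python
import copy
import itertools


class DocumentStore:
    """Toy in-memory document store (illustration only)."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)  # simple number sequence as IDs

    def insert(self, doc):
        doc_id = next(self._ids)
        self._docs[doc_id] = copy.deepcopy(doc)
        return doc_id

    def get(self, doc_id):
        return copy.deepcopy(self._docs[doc_id])

    def update(self, doc_id, changes):
        self._docs[doc_id].update(copy.deepcopy(changes))

    def delete(self, doc_id):
        del self._docs[doc_id]

    def find(self, **criteria):
        """Return IDs of documents whose fields match all criteria."""
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if all(k in doc and doc[k] == v for k, v in criteria.items())
        ]


db = DocumentStore()
first = db.insert({"type": "book", "title": "Dune", "year": 1965})
second = db.insert({"type": "film", "title": "Dune", "director": "Villeneuve"})

db.update(first, {"year": 1966})   # change
matches = db.find(title="Dune")    # query: inspects each document
films = db.find(type="film")
db.delete(second)                  # delete
```

Note that `find` looks inside every document and simply skips those that lack the requested field. Real systems usually add indexes so that such queries do not have to scan everything.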

What Are the Pros and Cons of Document Stores?

In conventional relational databases, a field has to exist for each piece of information in every entry. If the information is not available, the cell stays empty, but it must still exist. Document-oriented databases are much more flexible: the structure of individual documents does not have to be consistent. Even large volumes of loosely structured data can be accommodated in the database.

Plus, it’s easier to integrate new information. In a relational database, a new attribute must be added to all datasets; in a document store, it only needs to be included in the documents it applies to. The additional content can be added to further documents later, but it’s not required.
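The difference can be made concrete with a small comparison. The sketch below uses SQLite from Python's standard library for the relational side and a plain dictionary as a stand-in for a document store; the table, field names and values are hypothetical.

```python
import sqlite3

# Relational side: the new column exists for every row, filled or not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [("Ada",), ("Linus",)])
conn.execute("ALTER TABLE people ADD COLUMN handle TEXT")
conn.execute("UPDATE people SET handle = ? WHERE name = ?", ("@ada", "Ada"))
rows = conn.execute("SELECT name, handle FROM people ORDER BY id").fetchall()
conn.close()

# Document side: only the document that has a value carries the field.
docs = {
    1: {"name": "Ada"},
    2: {"name": "Linus"},
}
docs[1]["handle"] = "@ada"
with_handle = [doc["name"] for doc in docs.values() if "handle" in doc]

print(rows)         # the second row holds an empty (NULL) cell
print(with_handle)  # the second document has no such field at all
```

The trade-off is that code reading the documents has to cope with the field being absent, which the relational schema would otherwise guarantee.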

Moreover, with document stores the information is not distributed over multiple linked tables. Everything is contained in a single location, and this can result in better performance. However, this speed advantage only holds as long as you don’t attempt to use relational elements: references between documents don’t really suit the concept of document stores. If you do try to interlink the documents, the system can become complex and cumbersome. A relational database system is therefore more advisable for highly interconnected data.
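A minimal sketch of the two approaches, using plain Python dictionaries as stand-ins for collections of documents, shows where the extra work comes from. All keys and field names here are hypothetical.

```python
# Embedded: one lookup returns everything about the order.
orders_embedded = {
    "order:1": {
        "customer": {"name": "Ada", "city": "London"},
        "items": [{"sku": "kb-1", "qty": 2}],
    }
}
city_embedded = orders_embedded["order:1"]["customer"]["city"]

# Referenced: the order only stores a key, so the application has to
# resolve it with a second lookup (in effect, a join done by hand).
customers = {"customer:1": {"name": "Ada", "city": "London"}}
orders_referenced = {
    "order:1": {
        "customer_id": "customer:1",
        "items": [{"sku": "kb-1", "qty": 2}],
    }
}
order = orders_referenced["order:1"]
city_referenced = customers[order["customer_id"]]["city"]
```

With a single reference the overhead is small, but every additional link adds another lookup that the application must perform and keep consistent itself.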

The Most Well-Known Document Databases

Document databases are especially important in the development of web apps. Driven by this demand, numerous database management systems (DBMSs) have been released. Some of the most well-known examples are outlined below:

  • BaseX: This open-source project uses Java and XML. BaseX is supplied with a graphical user interface.
  • CouchDB: This open-source software is developed under the Apache Software Foundation. The database management system is written in Erlang, uses JavaScript, and has been used in Ubuntu and Facebook applications, among others.
  • Elasticsearch: This search engine is built on a document-oriented data store and works with JSON documents.
  • eXist: The open-source DBMS runs on a Java virtual machine and can therefore be used regardless of the operating system. XML documents are primarily used.
  • MongoDB: MongoDB is one of the most widespread NoSQL databases. The software is written in C++ and uses JSON-like documents.
  • SimpleDB: With SimpleDB (written in Erlang), Amazon developed its own DBMS for the company’s cloud services. The provider charges a fee for use.