Protobuf: Structured Code with Protocol Buffers
The structuring of data plays an important role in the development of programs and websites. If project data is well structured, it can be read easily and precisely by other software. On the Internet, this is especially important for text-based search engines such as Google, Bing or Yahoo, which can capture the content of a website thanks to the corresponding structured markup.
The use of structured data in software development is generally worthwhile - whether for Internet or desktop applications - wherever programs or services have to exchange data via interfaces and a high data processing speed is desired. In this article, you will learn what role the serialisation format Protocol Buffers (Protobuf) can play and how this structuring method differs from the well-known alternative JSON.
What is Protobuf (Protocol Buffers)?
Protocol Buffers, or Protobuf for short, is a data interchange format that Google originally developed for internal use and has offered to the general public as an open source project (BSD 3-Clause license) since 2008. The binary format enables applications to store and exchange structured data in an uncomplicated way, even if these programs are written in different programming languages. The supported languages include, among others:
- C#
- C++
- Go
- Objective-C
- Java
- Python
- Ruby
In combination with HTTP/2 and RPCs (Remote Procedure Calls), Protobuf is also used for local and remote client-server communication, where it describes the required interfaces. This combination of protocols is known as gRPC.
What are the benefits of Google’s Protocol Buffers?
When developing Protobuf, Google placed emphasis on two factors: simplicity and performance. At the time of development, the format - as already mentioned, initially used internally at Google - was intended to replace XML for this purpose. Today it also competes with other solutions such as JSON or FlatBuffers. Protocol Buffers is still the better choice for many projects, and a closer look at the characteristics and strengths of this structuring method shows why:
Clear, cross-application schemes
The basis of every successful application is a well-organised database system. A great deal of attention is paid to the organisation of this system - including the data it contains - but the underlying structures are often lost as soon as the data is forwarded to a third-party service. The unambiguous encoding of the data according to the Protocol Buffers schema ensures that your project forwards structured data as intended, without these structures being broken up.
Backward and forward compatibility
Using Protobuf spares you the tedious version checks that usually lead to "ugly" code. To maintain backward compatibility with older versions and forward compatibility with newer versions, Protocol Buffers identifies every field by a number rather than by its name or position. Readers that do not know a field number simply skip that field, and fields that are missing from a message are treated as unset. This means you do not have to adapt the entire code base in order to publish new features and functions.
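The role of the field numbers can be illustrated with a small hand-written sketch. It does not use the Protobuf library; it only imitates the two most common wire types (varint and length-delimited) with plain Java. The class name `FieldNumberSketch`, the method names and the sample field numbers are assumptions made for this illustration. The "old" reader knows only fields 1 and 2 and skips everything else:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class FieldNumberSketch {

    // Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Tag = (field number << 3) | wire type. Wire type 0 = varint.
    static void writeVarintField(ByteArrayOutputStream out, int fieldNumber, long value) {
        writeVarint(out, ((long) fieldNumber << 3) | 0);
        writeVarint(out, value);
    }

    // Wire type 2 = length-delimited (used for strings, among others).
    static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, ((long) fieldNumber << 3) | 2);
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // "Old" reader: knows only field 1 (name) and field 2 (id), skips all other fields.
    static String readAsV1(byte[] data) {
        int[] pos = {0};
        String name = "";
        long id = 0;
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (wireType == 0) {
                long value = readVarint(data, pos);
                if (fieldNumber == 2) {
                    id = value;
                }
            } else if (wireType == 2) {
                int length = (int) readVarint(data, pos);
                if (fieldNumber == 1) {
                    name = new String(data, pos[0], length, StandardCharsets.UTF_8);
                }
                pos[0] += length; // unknown fields are skipped by their length
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return name + "#" + id;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeStringField(out, 1, "Ada");
        writeVarintField(out, 2, 42);
        writeStringField(out, 3, "ada@example.com"); // field added in a newer schema version
        System.out.println(readAsV1(out.toByteArray())); // prints "Ada#42"
    }
}
```

The real library additionally handles the remaining wire types (fixed 32-bit and 64-bit values) and can preserve unknown fields instead of discarding them; the sketch leaves both out for brevity.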
Flexibility and comfort
When writing a Protobuf schema, you use modifiers (required, optional or repeated), which simplify the programming work considerably. This way, the structuring method allows you to determine the data structure at schema level, while the implementation details of the classes for the different programming languages are taken care of automatically. Modifiers can also be changed later, for example from "required" to "optional", although readers that still run the old schema will reject messages in which the formerly required field is missing. The transport of data structures can also be handled using Protocol Buffers: by defining generic request and response structures, data can be transferred flexibly and safely between multiple services.
Less boilerplate code
Boilerplate code (or simply boilerplate) plays a decisive role in programming, depending on the type and complexity of a project. Put simply, these are reusable code blocks that are needed in many places in software and can usually only be customised slightly. Such code is often used, for example, to prepare the use of functions from libraries. Boilerplate is particularly common in the web languages JavaScript, PHP, HTML and CSS, although it is not optimal for the performance of the web application. A suitable Protocol Buffers schema helps to reduce the boilerplate code, because the classes for reading and writing messages are generated, and thereby to improve performance in the long term.
Easy language interoperability
It is standard practice today that applications are no longer written in a single language; instead, program parts or modules combine different languages. Protobuf simplifies the interaction between the individual code components considerably. If new components are added whose language differs from the current project language, you can simply translate the Protocol Buffers schema into the respective target language using the appropriate code generator, which reduces your own effort to a minimum. The prerequisite is, of course, that the languages used are supported by Protobuf by default, such as the languages already listed, or via a third-party add-on.
Protobuf vs. JSON: The two formats in comparison
First and foremost, Google developed Protocol Buffers as an alternative to XML (Extensible Markup Language) and surpassed the markup language in many ways. Structuring data with Protobuf not only tends to be simpler but, according to the search engine giant, also results in a data structure that is three to ten times smaller and 20 to 100 times faster than a comparable XML structure.
Protocol Buffers is also often compared directly with JSON (JavaScript Object Notation), although it should be mentioned that the two technologies were designed with different objectives: JSON is a message format that originated from JavaScript, exchanges its messages in text format and is supported by practically all common programming languages. Protobuf comprises more than a message format, as the Google technology also offers various rules and tools for defining and exchanging messages. Protobuf generally outperforms JSON when it comes to sending messages, but the following "Protobuf vs. JSON" table shows that both structuring techniques have their advantages and disadvantages:
| | Protobuf | JSON |
|---|---|---|
| Developer | Google | Douglas Crockford |
| Function | Serialisation format for structured data (storage and transmission) and library | Serialisation format for structured data (storage and transmission) |
| Binary format | Yes | No |
| Standardisation | No | Yes |
| Human-readable format | Partially | Yes |
| Community/Documentation | Small community, online manuals with room for improvement | Huge community, good official documentation as well as various online tutorials etc. |
So, if you need a well-documented serialisation format that stores and transmits the structured data in human-readable form, you should use JSON instead of Protocol Buffers. This is especially true if the server-side part of the application is written in JavaScript and if a large part of the data is processed directly by browsers by default. On the other hand, if flexibility and performance of the data structure play a decisive role, Protocol Buffers tends to be the more efficient and better solution.
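The difference between the binary and the text representation can be made tangible with a minimal sketch. It encodes a two-field record by hand in the Protobuf wire format and compares the size with a naive JSON rendering of the same data. The class `SizeSketch`, its methods and the sample values are assumptions made for this illustration, and the hand-written encoder merely stands in for the code that the Protobuf library would normally provide:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SizeSketch {

    // Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes name as field 1 (length-delimited) and id as field 2 (varint).
    static byte[] encodePerson(String name, int id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (1 << 3) | 2); // tag: field 1, wire type 2
        writeVarint(out, nameBytes.length);
        out.write(nameBytes, 0, nameBytes.length);
        writeVarint(out, (2 << 3) | 0); // tag: field 2, wire type 0
        writeVarint(out, id);
        return out.toByteArray();
    }

    // Naive JSON rendering without escaping; sufficient for a size comparison.
    static String toJson(String name, int id) {
        return "{\"name\":\"" + name + "\",\"id\":" + id + "}";
    }

    public static void main(String[] args) {
        byte[] binary = encodePerson("Ada", 42);
        byte[] json = toJson("Ada", 42).getBytes(StandardCharsets.UTF_8);
        System.out.println("Protobuf wire bytes: " + binary.length); // 7
        System.out.println("JSON bytes: " + json.length);            // 22
    }
}
```

The binary message is smaller mainly because field names are replaced by one-byte tags and numbers are stored as varints instead of digits. How large the gap is in practice depends heavily on the data; this tiny sample is not a benchmark.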
Tutorial: Practical introduction to Protobuf using the example of Java
Protocol Buffers can make the difference in many software projects, but as is often the case, the first thing to do is get to know the particularities and syntactic details of the serialisation technology and how to apply them. To give you an initial impression of Protobuf's syntax and message exchange, the following tutorial explains the basic steps with Protobuf - from defining your own format in a .proto file to compiling the Protocol Buffers structures. The code base is a simple Java address book application that can read contact information from a file and write it to a file. The fields "Name", "ID", "email address" and "Telephone number" are assigned to each address book entry.
Define your own data format in the .proto file
You first describe every data structure that you want to implement with Protocol Buffers in a .proto file, the schema definition file of the serialisation format. For each structure that you want to serialise - that is, convert into a sequence of bytes - simply add a message to this file. You then specify a name and a type for each field of this message and prepend the desired modifier. Each field needs exactly one modifier.
One possible mapping of the data structures in the .proto file looks as follows for the Java address book:
// proto2 syntax: "required" and explicit [default = ...] values are not available in proto3
syntax = "proto2";
package tutorial;
option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }
  repeated PhoneNumber phones = 4;
}
message AddressBook {
  repeated Person people = 1;
}
The syntax of Protocol Buffers is strongly reminiscent of C++ or Java. The Protobuf syntax version is always declared first (here proto2, because the "required" modifier and the explicit default value used in this example are not part of proto3), followed by the package declaration for the data you want to structure. This consists of a unique name ("tutorial") and, in this code example, the two Java-specific options "java_package" (the Java package in which the generated classes are placed) and "java_outer_classname" (the name of the wrapper class that contains all generated classes).
This is followed by the Protobuf messages, which can be composed of any number of fields, with typical data types such as "bool", "int32", "float", "double" or "string" available. Some of these are also used in the example. As already mentioned, each field of a message must be assigned exactly one modifier - i.e. either...
- required: a value for the field is mandatory. If this value is missing, the message is considered "uninitialised" and can be neither built nor parsed successfully.
- optional: a value can be provided in an optional field but does not have to be. If no value is set, the default value is used. In the code above, for example, the default value "HOME" (landline number at home) is defined for the telephone number type.
- repeated: fields with the “repeated” modifier can be repeated any number of times (including zero times).
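What the three modifiers mean for the calling code can be sketched in plain Java. The following classes are hand-written stand-ins and not the classes that the Protobuf compiler generates; the class name `ModifierSketch`, the phone number and the simplified field access are assumptions made for this illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class ModifierSketch {

    enum PhoneType { MOBILE, HOME, WORK }

    // Stand-in for the PhoneNumber message: "number" is required, "type" is optional.
    static final class PhoneNumber {
        private final String number;
        private final PhoneType type; // null means "not set"

        PhoneNumber(String number, PhoneType type) {
            if (number == null) {
                throw new IllegalStateException("required field 'number' is missing");
            }
            this.number = number;
            this.type = type;
        }

        String getNumber() { return number; }

        boolean hasType() { return type != null; }

        // An unset optional field reads as its declared default (HOME).
        PhoneType getType() { return type != null ? type : PhoneType.HOME; }
    }

    // Stand-in for the Person message.
    static final class Person {
        String name;                                        // required
        Integer id;                                         // required
        String email;                                       // optional, no explicit default
        final List<PhoneNumber> phones = new ArrayList<>(); // repeated: zero or more entries

        // A message counts as initialised only when every required field is set.
        boolean isInitialized() { return name != null && id != null; }

        // Without an explicit default, an unset optional string reads as "".
        String getEmail() { return email != null ? email : ""; }
    }

    public static void main(String[] args) {
        Person person = new Person();
        person.name = "Ada";
        System.out.println(person.isInitialized()); // false: required "id" is missing
        person.id = 42;
        System.out.println(person.isInitialized()); // true
        person.phones.add(new PhoneNumber("555-0100", null));
        System.out.println(person.phones.get(0).getType()); // HOME
    }
}
```

The generated Java classes work with builders and immutable messages instead of mutable fields, but the rules shown here are the same: required fields must be set, unset optional fields fall back to a default, and repeated fields behave like a list.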
You can find detailed instructions on how to define your own data format with Protocol Buffers in Google's official Protocol Buffers documentation.
Compile your own Protocol Buffers schema
Once your data structures are defined as desired in the .proto file, generate the classes needed to read and write the Protobuf messages. To do this, run the Protocol Buffers compiler (protoc) on the .proto file. If you have not yet installed it, simply download the current version from the official GitHub repository. Unzip the ZIP file at the desired location; the compiler is a command-line tool located in the "bin" folder, so you start it from a terminal rather than with a double click.
Make sure you have the appropriate edition of the Protobuf compiler: protoc is available for 32- and 64-bit architectures (Windows, Linux or macOS).
Finally, you specify:
- the source directory which contains the code of your program (here placeholder "SRC_DIR"),
- the destination directory in which the generated code is to be stored (here placeholder "DST_DIR")
- and the path to the .proto file.
As you want to generate Java classes, you also use the --java_out option (similar options are also available for the other supported languages). The complete compile command is as follows:
protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto
Google offers a more detailed Protobuf Java tutorial, which explains, among other things, how to read and write messages with Protocol Buffers, in the "Developers" section, its in-house project area for developers. There you will also find instructions for the other supported languages such as C++, Go or Python.