Protocol Buffer - A Walk Through For Beginners

This article introduces you to a third option when it comes to data serialization, one that goes beyond XML and JSON. Protocol Buffer is a language-agnostic binary data format developed by Google to serialize structured data between different services.

Well, you might not have heard of Google Protobuf, more commonly known as Protocol Buffer, and may be wondering whether it’s just a buzzword. Trust me, you are not the only one. I hadn’t heard about Protobuf either, until a friend made me realize what I was missing out on. To spread the light, let me share what I learned about Protocol Buffer.

Protocol Buffer is a language-agnostic binary data format developed by Google to serialize structured data between different services. If those heavy terms didn’t sink in at first, that’s fine. Allow me to walk you through them until you’re able to digest them. In this article, we’ll be talking about the following.

  • What is Protocol Buffer, actually?
  • Why Protocol Buffer?
  • How do they work?
  • General Structure
  • Demo

Protocol Buffer

To fully get the concept of Protobuf, we first need to understand what serialization is and which problems needed to be solved. Wikipedia explains serialization as follows.

“Serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and reconstructed later (possibly in a different computer environment)”.

In simple words, this means that we need to serialize data if we are to store or transfer it. But what should the format for data serialization be? Here are a few options.

  1. The raw in-memory data structures can be sent or saved in binary form. But what if I want to read that data back on a machine with a different memory layout? A straightforward no! That’s not how we play. The code on both ends must be compiled with the same memory layout, endianness, and so on. This format is also really hard to extend.

  2. Serialize the data to XML (Extensible Markup Language) or JSON (JavaScript Object Notation). This approach is great for the front end since it is human readable, but what about backend server-to-server communication? These formats are notorious for being space intensive, and encoding and decoding them can impose a significant performance penalty on the application.
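
To make the first problem concrete, here is a minimal C++ sketch of the usual way around it: instead of copying an integer’s in-memory bytes, write them out in one agreed order. The helper names to_little_endian and from_little_endian are made up for this illustration and are not part of any library.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helpers for illustration only.

// Write a 32-bit value in a fixed little-endian byte order, regardless of
// how the host CPU lays the integer out in memory.
std::array<std::uint8_t, 4> to_little_endian(std::uint32_t value) {
    return {
        static_cast<std::uint8_t>(value & 0xFF),
        static_cast<std::uint8_t>((value >> 8) & 0xFF),
        static_cast<std::uint8_t>((value >> 16) & 0xFF),
        static_cast<std::uint8_t>((value >> 24) & 0xFF),
    };
}

// Rebuild the value from those bytes; this works the same on any host.
std::uint32_t from_little_endian(const std::array<std::uint8_t, 4>& bytes) {
    return static_cast<std::uint32_t>(bytes[0]) |
           (static_cast<std::uint32_t>(bytes[1]) << 8) |
           (static_cast<std::uint32_t>(bytes[2]) << 16) |
           (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

Because the byte order is fixed by the code rather than by the CPU, 0x12345678 always becomes the bytes 78 56 34 12, whichever machine writes or reads them.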

So what now? Sure, you can go ahead and invent your own serialization technique. That gives you flexibility, but it is not a good idea when you have a tremendous amount of data. You need a well-specified protocol to handle huge traffic. That’s exactly where Protobuf enters the picture. Protocol Buffers are like XML but smaller, faster, simpler, and more flexible, efficient, and automated. You define the structure of your data once, and then specially generated source code is used to write and read that structured data to and from a variety of data streams, in a variety of languages.

Why Protobuf?

I have pretty much laid the foundation for why we should already be considering Protocol Buffer, but if that’s not convincing enough, let’s discuss a few more reasons to use it.

There are usually two types of considerations associated with data storage and transmission.

  1. Size
  2. Efficiency

XML and JSON are designed to be human readable and self-describing, which means they are text-based. This comes at a cost: you need to encode the data as text to transport the message and then decode it on the other end. It also increases the message size, because some schema information, such as field names, has to be included along with the message for it to make sense.

Protocol Buffers, on the other hand, are not self-describing. Instead, they work through binary serialization, which means that they encode the data into a compact binary stream that is extremely lightweight and easy to transfer. Reportedly, a Protobuf message takes roughly 1/3 the size of the equivalent XML and 1/2 the size of the equivalent JSON. A smaller message also takes less time to transfer, which contributes to efficiency. Protobuf is reportedly 6 times faster than JSON.
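
One reason the binary form is small is that Protocol Buffers store integers as base-128 varints, so small values take fewer bytes. The sketch below is a hand-written illustration of that encoding, not the library’s implementation, and encode_varint is a name chosen here rather than a Protobuf API.

```cpp
#include <cstdint>
#include <vector>

// Encode an unsigned integer as a base-128 varint: seven payload bits per
// byte, least significant group first, with the high bit set on every byte
// except the last one.
std::vector<std::uint8_t> encode_varint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}
```

For example, encode_varint(300) yields the two bytes AC 02, whereas the text "300" already needs three characters before any JSON key, quotes, or braces are added.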

In addition to the above-mentioned advantages, here are a few more.

  • Validation
  • Easily extensible
  • Guaranteed type-safety
  • Backward compatibility
  • Language independence
  • Faster serialization/deserialization

Why now?

Now, you might be wondering why you haven’t heard of Protobufs before and, if so, why we’re talking about them now. Here’s an extra treat for you.

Yep! Although they’ve been around for ten years, most people do not know about them because they were first used by Google ‘internally’. And the reason for ‘why now’ is the “Pokemon Go” game.

Pokemon Go was created by Niantic and uses Protobufs for data transfer. It was the success of this game that brought public attention to Protobufs.

Any Cons?

Well, I wouldn’t really call these cons, but there are situations where Protobufs might not be that helpful. As we already discussed, they do not target human readability, so if you want your data to be human readable, Protobufs are not a good fit. If a browser is directly consuming the data from your service, another serialization technique might be a better option. Also, people tend to use XML or JSON more often because they have good community support, whereas Protobuf lags behind there. Don’t expect very detailed documentation, or many blog posts or articles covering development with Protobufs. But the project is open source, so you can go ahead and experiment with it.

How Does Protobuf Work?

You specify the structure of the data you’re serializing, along with any services, by defining message types in a .proto file. Think of a message as a small logical record of information made up of named, typed fields. That definition then goes to the Protocol Buffer compiler (protoc), which generates source code from it. The resulting predetermined schema is used to encode and decode the messages.

Now that the workings make sense, let me give you a very basic .proto message example to illustrate the structure.

    message Movie {
        required string title = 1;
        required string genre = 2;
    }

What we did here is define a message named Movie, which has two fields, title and genre, with the identifiers 1 and 2. You can specify fields as optional, required, or repeated. Keep in mind that this is the textual definition of what is actually encoded in binary.
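
To get a feel for what that binary looks like, here is a hand-rolled C++ sketch that encodes the Movie message without the Protobuf library. Each string field is written as a one-byte tag, (field number << 3) | wire type, followed by the length and then the raw bytes; wire type 2 is the one used for length-delimited values such as strings. The function names are hypothetical, and the sketch assumes field numbers below 16 and strings shorter than 128 bytes so that the tag and the length each fit in one byte. In real code, the classes generated by protoc do this for you.

```cpp
#include <cassert>
#include <string>

// Append one length-delimited field (wire type 2) to `out`.
// Simplification: the tag and the length must each fit in a single byte.
void append_string_field(std::string& out, int field_number,
                         const std::string& value) {
    assert(field_number > 0 && field_number < 16 && value.size() < 128);
    const int wire_type = 2;  // length-delimited: strings, bytes, sub-messages
    out.push_back(static_cast<char>((field_number << 3) | wire_type));
    out.push_back(static_cast<char>(value.size()));
    out += value;
}

// Encode a Movie { title = 1; genre = 2; } message by hand.
std::string encode_movie(const std::string& title, const std::string& genre) {
    std::string out;
    append_string_field(out, 1, title);
    append_string_field(out, 2, genre);
    return out;
}
```

With the illustrative values "Up" and "Family", encode_movie produces 12 bytes: 0A 02 55 70 12 06 46 61 6D 69 6C 79. Notice that the field names never appear in the output, only the identifiers 1 and 2.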

Demo

So, I believe you can now digest the heavy terms. Enough with the talk; let’s get our hands dirty.

Note
I'm going to assume basic familiarity with C++ and with how accessors are used.

Environment Setup

Note
Although Protobuf supports many languages and targets all the major platforms, e.g., Linux and Windows, I’ll be covering only the C++ installation for Windows. If you’re interested in working with some other language or platform, refer to the Protobuf documentation.

In order to build Protobuf with MSVC on Windows, you need the following tools.

  • Visual Studio (MSVC)
  • CMake
  • Git

Go ahead and install the above tools to follow along.

Once you have everything installed, open the Visual Studio command prompt and navigate to your working directory. Once there, execute the following command.

    $ mkdir install

This just creates a folder where Protobuf will be installed after the build. Before going ahead, make sure CMake and Git are on the system PATH. If not, they can be added by executing the following commands.

    $ set PATH=%PATH%;C:\Program Files (x86)\CMake\bin

    $ set PATH=%PATH%;C:\Program Files\Git\cmd

Now, you’re good to clone the Protobuf repository locally.

    $ git clone https://github.com/protocolbuffers/protobuf.git

If you choose not to use Git, you can simply download the package from the same repository on GitHub.

Once you have the repo locally, navigate to the project folder, protobuf, and then to its cmake folder.

    $ cd protobuf

    $ cd cmake

Now we need to configure CMake. To do that, execute the following commands.

    $ mkdir build & cd build

If you cloned the repository with Git, you also need to update its submodules.

    $ git submodule update --init --recursive

A Makefile generator can build the project in only one configuration, so a separate folder is required for each configuration.

For Release configuration:

    $ mkdir release & cd release

    $ cmake -G "NMake Makefiles" ^
     -DCMAKE_BUILD_TYPE=Release ^
     -DCMAKE_INSTALL_PREFIX=../../../../install ^
     ../..

For Debug configuration (starting again from the build folder):

    $ mkdir debug & cd debug

    $ cmake -G "NMake Makefiles" ^
     -DCMAKE_BUILD_TYPE=Debug ^
     -DCMAKE_INSTALL_PREFIX=../../../../install ^
     ../..

Either of the above commands will generate an NMake Makefile in the current directory. After this, you’re ready to generate a Visual Studio solution. Navigate back to the build folder and execute the following commands. Remember to specify the Visual Studio version that you are using; I am using the VS 2017 Community edition.

    $ mkdir solution & cd solution

    $ cmake -G "Visual Studio 15 2017 Win64" ^
     -DCMAKE_INSTALL_PREFIX=../../../../install ^
     ../..

Time to compile Protobuf. Remember the configuration you chose earlier and pick the matching folder at this stage as well. Navigate to the build/release (or build/debug) folder and execute the following.

    $ nmake

Once compiled, you can run the unit tests as:

    $ nmake check

If all the tests pass, do the installation:

    $ nmake install

Now we are good to create our project. Create a new folder in which you want to keep your project. In order to use Protobuf, you first need to define a .proto file. Let’s take the example of a simple project management system. Use any text editor of your choice and create a file named projectmanagement.proto. Remember that the file name needs the .proto extension.

Add the following to the file.

    // projectmanagement.proto

    // Declare proto2 explicitly: the required keyword exists only in proto2.
    syntax = "proto2";

    package projectmanagement;

    message Developer {
        required string first_name = 1;
        required string last_name = 2;
        required string email = 3;
    }

    message Project {
        required string title = 1;
        optional string url = 2;
        repeated Developer developer = 3;
    }

The above code specifies two messages along with their fields. This should be familiar to you now, as we have already talked about the structure of a message.

You need the protoc compiler to compile the above code. The release (or debug) folder contains the protoc.exe file, which was generated during the build. You can either add that folder to the system PATH or just copy protoc.exe to the current working directory, and then execute the following command to compile the code.

    $ protoc --cpp_out=. projectmanagement.proto

Once the command has executed successfully, you’ll see that two files have been generated:

projectmanagement.pb.h
projectmanagement.pb.cc

Let’s look at some of the generated code in the header file. If you scroll down, you’ll see the accessors defined for you.

    // accessors -------------------------------------------------------

    // required string first_name = 1;
    bool has_first_name() const;
    void clear_first_name();
    static const int kFirstNameFieldNumber = 1;
    const ::std::string& first_name() const;
    void set_first_name(const ::std::string& value);
    #if LANG_CXX11
    void set_first_name(::std::string&& value);
    #endif
    void set_first_name(const char* value);
    void set_first_name(const char* value, size_t size);
    ::std::string* mutable_first_name();
    ::std::string* release_first_name();
    void set_allocated_first_name(::std::string* first_name);

    // required string last_name = 2;
    bool has_last_name() const;
    void clear_last_name();
    static const int kLastNameFieldNumber = 2;
    const ::std::string& last_name() const;
    void set_last_name(const ::std::string& value);
    #if LANG_CXX11
    void set_last_name(::std::string&& value);
    #endif
    void set_last_name(const char* value);
    void set_last_name(const char* value, size_t size);
    ::std::string* mutable_last_name();
    ::std::string* release_last_name();
    void set_allocated_last_name(::std::string* last_name);

    // required string email = 3;
    bool has_email() const;
    void clear_email();
    static const int kEmailFieldNumber = 3;
    const ::std::string& email() const;
    void set_email(const ::std::string& value);
    #if LANG_CXX11
    void set_email(::std::string&& value);
    #endif
    void set_email(const char* value);
    void set_email(const char* value, size_t size);
    ::std::string* mutable_email();
    ::std::string* release_email();
    void set_allocated_email(::std::string* email);

That’s the same for our second message too. This is another great feature of Protobufs: you can use these setters and getters just as you routinely would with any other class. Below is a little program that illustrates the ease Protobuf provides when it comes to accessors.

    // protobuf_sample.cc

    #include <iostream>
    #include <fstream>
    #include "projectmanagement.pb.h"

    using namespace std;

    int main() {
        projectmanagement::Project project;

        // Fill in the fields through the generated setters.
        project.set_title("Sample");
        project.set_url("http://www.sample.com");

        // add_developer() appends a new Developer to the repeated field
        // and returns a pointer to it.
        projectmanagement::Developer* developer = project.add_developer();
        developer->set_first_name("ABC");
        developer->set_last_name("XYZ");
        developer->set_email("someone@example.com");

        // Read the values back through the generated getters.
        cout << "Project: " << project.title() << endl;
        cout << "URL: " << (project.has_url() ? project.url() : "N/A") << endl;

        cout << "Developers: " << endl;
        cout << "First name: " << developer->first_name() << endl;
        cout << "Last name: " << developer->last_name() << endl;
        cout << "Email: " << developer->email() << endl;

        return 0;
    }

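The above code is self-explanatory. We are just assigning some values to the fields using setters and then printing them back using getters. To build it, compile protobuf_sample.cc together with the generated projectmanagement.pb.cc and link against the Protobuf library you installed.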

    // output:
    // Project: Sample
    // URL: http://www.sample.com
    // Developers:
    // First name: ABC
    // Last name: XYZ
    // Email: someone@example.com

Last Words

This is not all there is to Protobufs. You can encode the above data to binary, or dump data from binary back into a human-readable text format, which is pretty sweet. I would highly encourage you to experiment and see how things turn out. Protobuf’s documentation also contains tutorials for other languages. Go ahead and get a taste.
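
As a taste of the “binary back to text” direction, below is a small self-contained C++ sketch that walks over serialized bytes and prints each field number with its value. It is a hand-written illustration, not the Protobuf library: read_varint and dump_fields are hypothetical names, and only varint and length-delimited fields are handled (length-delimited payloads are printed as raw text). With the generated classes you would instead reach for methods such as SerializeToString, ParseFromString, and DebugString.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Read a base-128 varint starting at `pos`, advancing `pos` past it.
// Returns false if the input ends before the varint does.
bool read_varint(const std::string& in, std::size_t& pos, std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t byte = static_cast<std::uint8_t>(in[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;  // high bit clear: last byte
    }
    return false;
}

// Turn a serialized message into lines of "field_number: value".
// Only wire types 0 (varint) and 2 (length-delimited) are supported.
std::string dump_fields(const std::string& in) {
    std::string text;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::uint64_t tag = 0;
        if (!read_varint(in, pos, tag)) return text + "<truncated>\n";
        const std::uint64_t field_number = tag >> 3;
        const std::uint64_t wire_type = tag & 0x7;
        if (wire_type != 0 && wire_type != 2) {
            return text + "<unsupported wire type>\n";
        }
        // For type 0 this is the value itself; for type 2, the payload length.
        std::uint64_t n = 0;
        if (!read_varint(in, pos, n)) return text + "<truncated>\n";
        if (wire_type == 0) {
            text += std::to_string(field_number) + ": " + std::to_string(n) + "\n";
        } else {
            if (n > in.size() - pos) return text + "<truncated>\n";
            const std::size_t length = static_cast<std::size_t>(n);
            text += std::to_string(field_number) + ": \"" +
                    in.substr(pos, length) + "\"\n";
            pos += length;
        }
    }
    return text;
}
```

For instance, feeding it the 12 bytes 0A 02 55 70 12 06 46 61 6D 69 6C 79 gives two lines, 1: "Up" and 2: "Family".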