Cybersecurity of External Streaming Data - Integrity

Introduction

External data is the data we must pull from, or push to, somewhere outside the boundary of the process hosting the computer program. In general, external data may be grouped as follows:

  • streaming: bitstreams managed as the content of files or as network payload.
  • structural: data fetched/pushed from/to external database management systems using queries.
  • graphical: data rendered on a Graphical User Interface (GUI).

This article covers selected topics related to the cybersecurity of streaming data. Bitstreams are the most common form of this kind of data because they are used as file content and for network communication (see also External Data Management (ExDM)).

Cybersecurity is the practice of protecting computer systems, networks, and data from cyber threats. It is sometimes spelled "cyber security"; both spellings are widely accepted and used interchangeably, with no real difference in meaning. In this article, only cybersecurity related to bitstreams is considered. Precisely, we will talk about using cryptographic algorithms to protect bitstreams.

First, let me remind you of the issues we must address in this respect. We have three of them. The first one is to ensure that all users of a bitstream can verify that it was not modified while being archived or transmitted. It is the main topic of this article.

The second goal is to ensure bitstream confidentiality. Confidentiality in the context of bitstream cybersecurity refers to the protection of sensitive information from unauthorized access or disclosure. Let me stress that it doesn't mean selective availability of the bitstream itself. The main goal of the confidentiality implementation is that only authorized individuals or systems can access information associated with the bitstream. The third goal is the non-repudiation of the author. These two goals will be covered in the following articles in this cycle. Follow me to get the full story related to confidentiality and non-repudiation implementation in practice.

Now let's talk about securing bitstreams using cryptographic algorithms. Cryptography is a broad concept, but we will focus only on selected, very practical aspects of implementing bitstream security. A discussion of the cryptographic algorithms themselves is out of this article's scope.

We already know how to create bitstreams. We can also attach an encoding to them, i.e. a natural language alphabet, so that the bitstream can be recognized as text. The next step is to assign syntax and semantics that turn the stream into a coherent document with which a computer user can associate information. If this is not enough, we can also display these documents in graphical form.

However, from the security point of view, the most important thing is that we must always deal with a bitstream as a sequence of bits that can be sent, archived, and processed by another computer. It must be stressed again that this infrastructure is always binary, and this is where the problem arises: the binary document must be protected against malicious operations. For example, if the document contains a transfer order to our bank, the problem becomes real, material, and meaningful.

If we are talking about archiving bitstreams or sending them from one system to another, from one computer to another, the first thing we need to take care of is the integrity of such a stream. This means that from the point it is created until it reaches its final destination, where it is to be processed, it is not modified. A hash function does not prevent modification, but it is the most practical way to detect it.

Hash Function

Let's move on to the first option for securing streams: the hash function. It is a function that transforms the input bitstream into another, fixed-size bitstream that is, for practical purposes, unique to the input. A collision in a hash function occurs when two different inputs produce the same hash value as the output; for a cryptographic hash function, finding such a collision must be computationally infeasible. The next feature of the output bitstream is that the reverse transformation, i.e. recovering the source bitstream from the hash value, is practically impossible.

One way to use such a function is to associate the hash value with the bitstream we want to protect and to archive or send both together. The recipient can then check whether the bitstream has been modified in the meantime by calculating the function again and comparing the result with the hash value associated with the source bitstream. A certain drawback of this solution is that the algorithms for these functions are widely known, so a "man in the middle" who wants to tamper with the data can modify the source bitstream and then recalculate a new hash value for the modified bitstream.
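
The check described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not code taken from the article's repository: the IntegrityCheckSketch class and its ComputeHash and IsUnmodified methods are hypothetical names introduced here, and SHA-256 is assumed as the hash function.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper introduced for illustration only.
public static class IntegrityCheckSketch
{
  // Calculates the SHA-256 hash value of a raw bitstream (an array of bytes).
  public static byte[] ComputeHash(byte[] bitstream)
  {
    using (SHA256 sha256 = SHA256.Create())
    {
      return sha256.ComputeHash(bitstream);
    }
  }

  // Recalculates the hash and compares it with the value that accompanied the bitstream.
  public static bool IsUnmodified(byte[] bitstream, byte[] associatedHash)
  {
    return ComputeHash(bitstream).SequenceEqual(associatedHash);
  }
}
```

IsUnmodified returns true for a bitstream paired with its own hash value and false once a single byte of the bitstream is changed. It also returns true when the bitstream and the hash value are replaced together, which is exactly the "man in the middle" drawback: the hash value alone helps only if it reaches the recipient in a way the attacker cannot alter.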

Anyway, there are a few scenarios where this approach makes sense. For example, the hash value of one bitstream, called a block, may be entered into the next block, creating a chain. Because the next block, which is also a bitstream, contains this hash value and thereby points to the previous block, the previous block cannot be modified without the change being detected. This type of chained protection is called a blockchain and is widely used, for example in Bitcoin, to protect cryptocurrencies against double-spending (fig below).

[Figure: Hash function]

Blockchain security ensures that anyone who wants to modify one of the blocks in the chain must also modify all the blocks that were attached to the chain later. Of course, this is still possible, so further safeguards are needed. One of them relies on the growth rate of the chain: subsequent blocks are added faster than an attacker can recalculate the modified fragment of the chain. This topic is far beyond the scope of this document, but if you are interested in learning more, I encourage you to check out the dedicated GitHub repository NBlockchain, which contains a practical example of how to implement such a chain.
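
The chaining idea can be illustrated with a short sketch. It is not the NBlockchain implementation; the Block and HashChainSketch types are hypothetical names introduced here, SHA-256 is assumed, and everything a real blockchain needs beyond the hash links (timestamps, consensus, proof of work) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical block type: a payload plus the hash value of the preceding block.
public sealed class Block
{
  public byte[] Payload = Array.Empty<byte>();
  public byte[] PreviousHash = Array.Empty<byte>(); // stays empty for the first block
}

public static class HashChainSketch
{
  // The hash covers the link to the previous block as well as the payload.
  public static byte[] HashOf(Block block)
  {
    using (SHA256 sha256 = SHA256.Create())
    {
      return sha256.ComputeHash(block.PreviousHash.Concat(block.Payload).ToArray());
    }
  }

  // Appends a new block that stores the hash value of the current last block.
  public static void Append(List<Block> chain, byte[] payload)
  {
    byte[] previousHash = chain.Count == 0 ? Array.Empty<byte>() : HashOf(chain[chain.Count - 1]);
    chain.Add(new Block { Payload = payload, PreviousHash = previousHash });
  }

  // Recalculates every link; a modified block no longer matches the hash stored in its successor.
  public static bool IsConsistent(List<Block> chain)
  {
    for (int i = 1; i < chain.Count; i++)
    {
      if (!HashOf(chain[i - 1]).SequenceEqual(chain[i].PreviousHash))
        return false;
    }
    return true;
  }
}
```

In this sketch, changing the payload of any block except the last one makes IsConsistent return false until all later blocks are recalculated. The last block has no successor that stores its hash value, which is one reason why a real chain needs the further safeguards mentioned above.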

Hash Function Applicability Example

So let's see how the hash function can be used in practice. The CryptographyHelpersUnitTest class contains two unit tests. They use the CalculateSHA256 method defined in the library.

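    // Calculates the SHA-256 hash of the text and returns it as hexadecimal and Base64 text.
    // Requires: using System; using System.Security.Cryptography; using System.Text;
    public static (string Hex, string Base64) CalculateSHA256(this string inputStream)
    {
      // The hash function operates on bytes, so the text is encoded first (UTF-8 here).
      byte[] _inputStreamBytes = Encoding.UTF8.GetBytes(inputStream);
      // SHA256.Create() returns the default implementation, the same object SHA256Managed.Create() yields.
      using (SHA256 mySHA256 = SHA256.Create())
      {
        byte[] hashValue = mySHA256.ComputeHash(_inputStreamBytes);
        return (BitConverter.ToString(hashValue), Convert.ToBase64String(hashValue, Base64FormattingOptions.InsertLineBreaks));
      }
    }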

It is worth emphasizing once again that the argument of a hash function is always a bitstream. Obviously, the hash function may also be used for text, namely a bitstream for which an encoding has been defined. In the CalculateSHA256Test method, we have to protect a password, which is a string of random characters. A password may be associated with syntax and semantics to make it easier to remember but, fortunately, these syntax and semantics rules have no impact on the hash calculation.

In this method, instead of a bitstream, we have a stream of characters of the string type. Pressing Alt+F12 takes us to the definition of the CalculateSHA256 method. Its input parameter is a sequence of characters of the string type, but the hash function operates on an array of bytes, so we must transform the string into an array of bytes. To do this, we need to choose an encoding; in the method under consideration, it is UTF8. This is the first yellow warning light that should be turned on, because everyone who uses the result of the hash function to check the correctness of the input string must use the same encoding (UTF8 in this case). If someone uses a different encoding, the bytes passed to the hash function may differ, and then the hash value can no longer be used to check the consistency of the input text.
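
The encoding caveat can be demonstrated with a small sketch. The EncodingSketch class and its methods are hypothetical names introduced for this illustration; only the Encoding and SHA256 classes come from the .NET library.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper showing that the selected encoding is part of the hash calculation.
public static class EncodingSketch
{
  // Encodes the text with the given encoding and calculates SHA-256 of the resulting bytes.
  public static byte[] HashText(string text, Encoding encoding)
  {
    byte[] bytes = encoding.GetBytes(text);
    using (SHA256 sha256 = SHA256.Create())
    {
      return sha256.ComputeHash(bytes);
    }
  }

  // True only if both encodings lead to the same hash value for this text.
  public static bool SameHash(string text, Encoding first, Encoding second)
  {
    return HashText(text, first).SequenceEqual(HashText(text, second));
  }
}
```

For example, SameHash("password", Encoding.UTF8, Encoding.Unicode) returns false, because UTF-16 uses two bytes per character here. SameHash("password", Encoding.UTF8, Encoding.ASCII) returns true only because this text contains nothing but ASCII characters; for a text such as "zażółć" the two encodings produce different bytes and therefore different hash values.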

To calculate the hash in the CalculateSHA256 method, we need an object implementing the SHA-256 algorithm; it is obtained from the static Create factory method of the cryptography classes available in the library. Since the returned object implements IDisposable, I used the using statement.

In the next line,

    return (BitConverter.ToString(hashValue), Convert.ToBase64String(hashValue, Base64FormattingOptions.InsertLineBreaks));

a bitstream generated by the hash function is converted into two text forms. BitConverter.ToString converts the numeric value of each element of the specified array of bytes to its equivalent two-digit hexadecimal representation, with the values separated by hyphens. The second form is a string compliant with the Base64 standard, which is a different notation from hexadecimal.
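
Both text forms carry the same bytes, which the following sketch makes visible by converting them back. The TextFormsSketch class and its method names are hypothetical and introduced only for this illustration; Convert and BitConverter are the library classes already used above.

```csharp
using System;
using System.Linq;

// Hypothetical helper that converts both text forms back into the original bytes.
public static class TextFormsSketch
{
  // Parses text produced by BitConverter.ToString, e.g. "BA-78-16", into bytes.
  public static byte[] FromHyphenatedHex(string hex)
  {
    return hex.Split('-').Select(pair => Convert.ToByte(pair, 16)).ToArray();
  }

  // Parses Base64 text into bytes.
  public static byte[] FromBase64(string base64)
  {
    return Convert.FromBase64String(base64);
  }
}
```

For the three bytes 0xBA, 0x78, 0x16, BitConverter.ToString returns "BA-78-16" and Convert.ToBase64String returns "ungW". The two strings look nothing alike, yet both helpers return the same three bytes.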

Base64 is a binary-to-text conversion. The output of this conversion is a bitstream using the ASCII encoding. Text created this way is commonly used in scenarios where binary data needs to be stored or transferred as text. The Base64 conversion has been implemented in the language library, and this implementation is compliant with the RFC 2045 standard. Here another yellow warning light should be turned on, because it is not the only standard that defines the Base64 conversion; for example, RFC 4648 also defines it, together with a URL-safe alphabet. Moreover, based on the RFC database, it is easy to conclude that several earlier RFC documents defined this conversion, so we can expect that the standard has been modified over time. Therefore, the question is about backward compatibility and the lifetime of the calculated hash value if it is saved as Base64 text. It may turn out that the input string has not changed, but in the meantime the implementation of the Base64 conversion has changed, and then comparing the stored text with a newly generated one is useless for validation.
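
One way to limit this risk is to compare hash values as bytes instead of comparing their text forms. The sketch below assumes that the stored text can still be decoded by Convert.FromBase64String; the Base64ComparisonSketch class and the HashesMatch method are hypothetical names introduced here.

```csharp
using System;
using System.Linq;

// Hypothetical helper: the comparison is done on bytes, not on the Base64 text.
public static class Base64ComparisonSketch
{
  // Decodes the stored Base64 text and compares the bytes with a freshly calculated hash value.
  public static bool HashesMatch(string storedBase64, byte[] recalculatedHash)
  {
    // White-space characters, including inserted line breaks, are ignored while decoding.
    byte[] storedHash = Convert.FromBase64String(storedBase64);
    return storedHash.SequenceEqual(recalculatedHash);
  }
}
```

For a 64-byte value, such as the output of SHA-512, Base64FormattingOptions.None and Base64FormattingOptions.InsertLineBreaks yield different strings, because the latter inserts a line break after every 76 characters. A plain string comparison of the two fails, while HashesMatch accepts both because the decoded bytes are identical.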

In the unit test methods, we have two assertions, which compare the result returned by the hash calculation method with hard-coded text. If the encoding used to convert the input string changes, or if the implementation of the conversion to hexadecimal or Base64 text changes, we can expect that these assertions will no longer hold and the test will fail. We have to consider this as another yellow warning light that should be turned on. In other words, storing a hash value as a string, although convenient, has the consequence that the conversion from a bitstream to text of the string type does not always have to be the same and may change over time.

So why use it, someone may ask. Wouldn't it be better to rely on a sequence of bytes? Well, we cannot always attach a raw sequence of bits to text. In e-mail, for example, the system reserves strictly defined characters to control the data flow, so an attached raw bitstream could contain invalid characters and disturb the correct operation of the e-mail system. Therefore, conversion to text is sometimes necessary, but you need to remember these caveats.

Conclusion

The integrity of a bitstream means that from the point it is created until it reaches its final destination, where it is to be processed, it is not modified at all. In other words, any modification is recognized as an error.

The most practical way to verify bitstream integrity is to use a hash function. The hash function transforms the input bitstream into another, fixed-size bitstream that is, for practical purposes, unique to the input. A collision in a hash function occurs when two different inputs produce the same hash value as the output; for a cryptographic hash function, finding one must be computationally infeasible. The next feature of the output bitstream is that the reverse transformation, i.e. recovering the source bitstream from the hash value, is practically impossible.

When the hash function is used to protect the integrity of text documents, unimportant changes must also be considered. Unimportant modifications of a text document are changes that do not affect the meaning of the document, for example adding or removing whitespace characters or comments. Such changes still alter the bitstream and therefore the hash value. Let me stress that the hash function is used to protect the integrity of a bitstream, not to assess whether documents are equivalent.
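
A short sketch shows how sensitive the hash value is to such unimportant changes. The TextIntegritySketch class and the HashAsHex method are hypothetical names introduced here; UTF-8 and SHA-256 are assumed, as in the CalculateSHA256 method.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: hashes a text exactly as given, without any normalization.
public static class TextIntegritySketch
{
  // Returns the SHA-256 hash of the UTF-8 encoded text as hyphen-separated hexadecimal text.
  public static string HashAsHex(string text)
  {
    using (SHA256 sha256 = SHA256.Create())
    {
      return BitConverter.ToString(sha256.ComputeHash(Encoding.UTF8.GetBytes(text)));
    }
  }
}
```

HashAsHex("int x = 1;") and HashAsHex("int x  = 1;") return different values although the two texts differ only by one extra space and mean the same to a compiler. Deciding that two documents are equivalent is a separate task that has to be solved before the hash value is calculated.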

