Information Computation Mastery: Challenges, Concepts, Implementation

Before reading this article, I recommend reading my previous article: PiP - External Streaming Data - Useful Concepts - Part 1

Introduction

Information computation means a process that engages a computer (a physical device) to process information as a series of actions or steps taken to achieve a particular result or to fulfill a task. The main challenge is that information is abstract. Precisely speaking, it is a kind of knowledge that cannot be processed directly by any physical device. Therefore, information must be represented as data, using a variety of bit-based codes suited to modern binary machines.

External data is data that must be pulled from or pushed to a location outside the boundary of the process hosting the computer program. In general, external data may be grouped as follows:

  • streaming: bitstreams managed using the content of files or network payloads.
  • structural: data fetched/pushed from/to external database management systems using queries.
  • graphical: data rendered on a Graphical User Interface (GUI).

To use computers to automate information processing, we have to deal with bitstreams as the information representation. By design, bitstream management involves the organization, storage, retrieval, communication, and manipulation of bitstreams to ensure accuracy, security, and accessibility. It encompasses data collection, storage architecture, integration, and maintenance to support efficient analysis and decision-making.

To fulfill these requirements, the following concepts are particularly helpful:

  • presentation: involves visualizing data, i.e., creating graphical representations that help gain understanding and insight
  • validation: refers to the process of ensuring the accuracy and quality of source data before it is used
  • standard data format: refers to a consistent and agreed-upon structure for representing data
  • serialization: is the process of converting a data graph of objects into a bitstream
  • safeguarding: refers to the practice of protecting digital information throughout its entire lifecycle to prevent unauthorized access, corruption, or theft.

To keep this long story manageable, the article has been divided into parts. Part 1 covered the following concepts: presentation, validation, and standardization. Part 2 focuses on the serialization and safeguarding of data.

Serialization

We need bitstreams to be handled using files to make sure that data can be persisted. Let's recall the most important applications, such as entering input data or storing output data using file systems. We also use various types of streaming devices to archive data, i.e., to preserve it long-term. A temporary or intermediate data repository is another example. Data transfer between applications is yet another use case; it requires that data be transferable. Consider, for example, the interoperability of a web server and a web browser. There is a virtual wire between them; the virtual wire is not an abstract interconnection but means that only a bitstream can be transferred between them. There are many more examples, but let's limit the discussion to those mentioned, because they are enough to justify the importance of this topic.
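To make the file-based use case concrete, here is a minimal sketch of persisting a bitstream with the .NET file system API; the file name payload.bin and the payload bytes are arbitrary assumptions made for illustration.

```csharp
using System;
using System.IO;

// Persist a bitstream using the file system and pull it back.
byte[] bitstream = { 0x48, 0x69, 0x21 };             // arbitrary payload
File.WriteAllBytes("payload.bin", bitstream);        // push the bitstream to a file
byte[] restored = File.ReadAllBytes("payload.bin");  // pull the bitstream back
Console.WriteLine(BitConverter.ToString(restored));  // 48-69-21
```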

In the already mentioned use cases, data must be in the form of a bitstream. Now we are ready to return to discussing issues directly related to streaming data. Previously, we discussed the mechanisms of managing streams, especially in the context of files. We also examined the differences between bitstreams, text, and documents. Now let's answer the question of how to create streaming data and how to use it. First, let's try to define the purpose of our mission and the limitations we must deal with.

The first problem is related to the unavoidable necessity of dealing with two concepts, namely object data and data formatted as bitstreams. The transition process from objects to a stream is called serialization. Deserialization is the reverse process, which involves replacing the bitstream with interconnected objects located in the working memory of a computer. Hence, in the context of serialization, to save working data in a file we need a generic operation that can automate this transition process regardless of the types used to create the graph of objects wrapping the working data. There must also be a reverse operation creating objects from file content: deserialization. To guarantee consistency, this operation has to verify the file content against the types used to instantiate the objects.

Again, in the transition between the world of objects and the world of bitstreams, we need serialization, which is responsible for transferring the state of a graph of objects to a bitstream, and deserialization, which is responsible for the reverse process, i.e., for transferring a bitstream into a graph of interconnected objects. We would like these operations to be implemented generically, i.e., so that we do not have to program them every time but only parameterize them.
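As an illustration, the following minimal sketch performs such a round trip with the System.Text.Json library shipped with .NET; the Person type is a hypothetical example introduced only for this sketch.

```csharp
using System;
using System.Text.Json;

// Serialization: the state of the object becomes a bitstream (here, UTF-8 encoded JSON).
var person = new Person { Name = "Alice", Age = 30 };
byte[] bitstream = JsonSerializer.SerializeToUtf8Bytes(person);

// Deserialization: the bitstream is turned back into an object created in memory.
Person? restored = JsonSerializer.Deserialize<Person>(bitstream);
Console.WriteLine(restored?.Name); // Alice

public class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}
```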

Before we move to the next step, it is worth recognizing what we need to implement this functionality. Here, from the point of view of the world of objects, the list of requirements includes the following (a sketch illustrating both items follows the list):

  • Access to the values wrapped by objects that will be the subject of serialization - in other words, the values that constitute the state of the objects.
  • The relationships between these objects.
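The sketch below illustrates both requirements with hypothetical types introduced only for this purpose: the property values constitute the state, and an object reference expresses a relationship within the graph.

```csharp
// Hypothetical types illustrating the two requirements above.
public class Address
{
    public string City { get; set; } = "";        // a value constituting the state
}

public class Customer
{
    public string Name { get; set; } = "";        // a value constituting the state
    public Address? ShippingAddress { get; set; } // a relationship to another object in the graph
}
```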

Next, we need to implement an algorithm that describes this data transformation in detail; the transformation has to be mutually unambiguous. Here, the mutual unambiguity of the process does not mean that each time we perform serialization we will obtain an identical bitstream. The same holds for deserialization. We will get back to this issue shortly.

So, the first problem is how to implement serialization and deserialization to make the transition between the object world and the streaming world possible. The serialization and deserialization process must be a mutually unambiguous operation. Moreover, it is not a simple process. Someone may say that this is a relative matter, because we have no firm metrics of simplicity in this case. However, cloning serialization and deserialization code snippets each time serialization is needed consumes and wastes time, so it is worth implementing this process as a generic library, without the need to create dedicated software each time. The next problem we can define here, then, is the possibility of transitioning between the streaming world and the object world using the library concept.

If we talk about repeatability achieved by applying a library that implements the serialization and deserialization functionality, we need to offer a generic implementation. Namely, we must be able to define this process in advance, without prior knowledge of what will be serialized. A generic implementation of the serialization and deserialization functionality means that we implement it in advance and offer it as a ready-to-use library.

Today, there are many libraries on the market that allow this process to be carried out automatically. So it is justified to ask the following question: why do we need to learn about it? Why go into detail? My point is that if someone wants to use a washing machine, they do not need to know how the controller, engine, or temperature sensor works. However, if someone wants to assemble a custom washing machine from available parts, an understanding of how the engine, controller, and temperature sensor work is essential, even if the parts themselves are readily available. Similarly, we need detailed knowledge of how to manage bitstreams if we are going to use streaming data, for example, in the file system.

In summary, to use data simultaneously as objects and bitstreams, our goal must be to combine two worlds: the first, in which data is in object form, and the second, in which data takes the form of bitstreams. Let me stress that in both cases we have the same information but different representations. The data conversion between these worlds is called serialization and deserialization. Serialization is a process that involves converting the state of a graph of objects into a bitstream. Deserialization is the reverse process, i.e., converting a bitstream into a graph of objects that must be created in memory. Here, the term "state of an object" has appeared; what does it mean? We will learn the answer to this question soon.

From the above, it can be derived that if an equivalent graph of objects can be reconstructed from a bitstream, the bitstream can be considered correct for the purpose it serves. This reconstruction must comply with the syntax and semantics rules governing the bitstream. Again, the reconstructed graph does not have to be identical to the original each time; it is enough that it is equivalent from the point of view of the information it represents. In some simpler cases, bitstream identity can be ensured, meaning that for a selected graph of objects, each serialization yields an identical bitstream. Such a bitstream can then be compared, for example, to check whether the process behaves the same as before. It must be stressed that equivalence has no formal metric that can be applied to evaluate it. As a result, it is not possible to formally determine whether the resulting bitstream and the source object graph are equivalent. Therefore, equivalence must be decided by the software developer using custom measures, for example, unit tests. From that, we can conclude that only the software developer is responsible for ensuring that serialization and deserialization are mutually unambiguous.
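For example, equivalence can be checked with a round-trip test written by the developer; the minimal sketch below assumes System.Text.Json and relies on the value equality of C# records, with the Point type being a hypothetical example.

```csharp
using System.Diagnostics;
using System.Text.Json;

// A round-trip test: serialize, deserialize, and compare the result with the original.
var original = new Point(1, 2);
string bitstream = JsonSerializer.Serialize(original);
Point? restored = JsonSerializer.Deserialize<Point>(bitstream);

// Records compare by value, so equality here means the graphs are equivalent.
Debug.Assert(original == restored, "the round trip must yield an equivalent object");

public record Point(int X, int Y);
```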

Assuming that the data transformation algorithm has been implemented somehow, there is a need to determine the format of the target bitstream. We need to determine how to concatenate bits into words and words into correct sequences of words, and how to assign meaning to these sequences of words. In short, a set of valid characters, syntax rules, and semantics rules is required. The choice of format affects the features of the bitstream, such as the possibility of validating and visualizing its content using existing tools. Two additional notes regarding the target format of the bitstreams are vital for further consideration.
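To illustrate how the choice of format shapes the resulting bitstream, the sketch below serializes the same hypothetical object to JSON and to XML using serialization libraries available in .NET; the same information is carried by two different sets of syntax and semantics rules.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Xml.Serialization;

var point = new Point { X = 1, Y = 2 };

// JSON: one standardized set of characters, syntax, and semantics rules.
Console.WriteLine(JsonSerializer.Serialize(point));   // {"X":1,"Y":2}

// XML: a different standardized format carrying the same information.
var xmlSerializer = new XmlSerializer(typeof(Point));
using var writer = new StringWriter();
xmlSerializer.Serialize(writer, point);
Console.WriteLine(writer.ToString());

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}
```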

The list of applications mentioned previously as potential bitstream consumers includes the exchange of data between remote applications. It should be emphasized that if these applications are created by different manufacturers, the standardization of this representation becomes extremely important. If the syntax and semantics rules by which we combine words into correct sequences and assign meaning to them are standard, in the sense that they are described in documents published by recognized standardization organizations, the graph of objects can be recreated by applications created by other vendors.

We also said earlier that these bitstreams are sometimes used to communicate with humans. Of course, standardization is important for this kind of application as well. A bitstream user must be able to read the sequence of bits and, therefore, combine sequences of bits into words and words into correct sequences of words. Finally, these sequences of words must have meaning for the reader. First, it is important to be able to apply an encoding to create characters so that the bitstream becomes text. Let me remind you that text is a bitstream for which the encoding is known in advance or is somehow discoverable.
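A minimal sketch of the role of encoding follows: the same characters yield different bitstreams depending on the encoding applied, and the text can be recovered only when the encoding is known. The sample string is an arbitrary assumption chosen because it contains non-ASCII characters.

```csharp
using System;
using System.Text;

// The same text produces different bitstreams under different encodings.
string text = "zażółć";                          // contains non-ASCII characters
byte[] utf8 = Encoding.UTF8.GetBytes(text);      // 10 bytes
byte[] utf16 = Encoding.Unicode.GetBytes(text);  // 12 bytes (UTF-16)

// Recovering the text requires knowing which encoding was used.
Console.WriteLine(Encoding.UTF8.GetString(utf8)); // zażółć
```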

From the previous considerations regarding the transformation of object data into streaming data, we know that the basis of this process is determining the state of the object. Let me remind you that the state of an object is the set of values that must be subject to the transformation process so that the reverse operation can be performed in the future, i.e., so that an equivalent object graph can be recreated.

In order not to enter into purely theoretical considerations, let us return to these topics in the context of sample programs. The examples are described in the document titled Objects Serialization Implementation Examples. The example discussed shows the mechanism of transforming an object or, more precisely, an object's state into a bitstream. In this process, the state of the object is determined by a software developer, who implements an appropriate mechanism responsible for selecting the values that constitute the object state. Since the determination of an object's state is the responsibility of the program's authors, there must be measures allowing them to point out what has to be serialized.

To implement a serialization/deserialization engine, you need to define a data structure, choose a serialization format (custom, JSON, XML, etc.), and use a serialization library to convert the data wrapped by a graph of objects to and from the selected format. The data structure is required to determine the state of the objects that are subject to serialization. Apart from the data structure, guidelines allowing the selection of only the values constituting the state of the object are necessary. To fulfill the mentioned requirements, access to the value holders that constitute the state of the object is also required. Attributes, as a language construct at design time, and reflection, as a technology at run time, can help solve some problems related to implementing serialization/deserialization.
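A minimal sketch of both measures follows; the Account type and its members are hypothetical, the [JsonIgnore] attribute from System.Text.Json marks a value excluded from the state at design time, and reflection enumerates candidate value holders at run time.

```csharp
using System;
using System.Reflection;
using System.Text.Json.Serialization;

var account = new Account { Owner = "Alice", CachedDisplayName = "A." };

// Run time: reflection enumerates public properties, the potential value holders of the state.
foreach (PropertyInfo property in typeof(Account).GetProperties(BindingFlags.Public | BindingFlags.Instance))
    Console.WriteLine($"{property.Name} = {property.GetValue(account)}");

public class Account
{
    public string Owner { get; set; } = "";

    // Design time: an attribute excluding this value from the serialized state.
    [JsonIgnore]
    public string CachedDisplayName { get; set; } = "";
}
```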

To learn more about serialization in .NET, visit the document: Serialization in .NET.

Cybersecurity

Cybersecurity describes the practice of protecting computer systems, networks, and data from cyber threats. In this section, cybersecurity related to bitstreams is considered, specifically securing streams using cryptography. Talking about cryptography in the context of streams may seem a little strange, because cryptography is usually discussed in the context of data security and system security in general. Cryptography is a broad concept, but we will focus only on selected, very practical aspects related to the security of bitstreams.

We already know how to create bitstreams. We can also attach an encoding to them, i.e., a set of characters of a natural language. The next step is to assign syntax and semantics that allow the streams to be transformed into coherent documents, enabling a computer user to recover information from them. If this is not enough, we can also display these documents in graphical form.

It must be stressed again that in all these cases the computation infrastructure is binary, and we must consider that a bitstream may be sent over a network, archived, and processed by another computer. Hence, bitstreams must be protected against malicious operations. For example, if a document contains a wire transfer order to our bank, the problem becomes real, material, and meaningful.

In the context of implementing the cybersecurity of bitstreams, the following requirements must be met (a minimal sketch follows the list):

  1. Integrity: ensure that all users of a source bitstream can verify that the stream has not been modified while it was being archived or transmitted,
  2. Confidentiality: safeguard information from unauthorized access, ensuring confidentiality,
  3. Non-repudiation: confirm authorship so that all users of a bitstream can determine who created it and who is responsible for its content. We call this requirement non-repudiation of the author.
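Here is a minimal sketch of the three requirements using the System.Security.Cryptography classes shipped with .NET; the payload is an arbitrary assumption, key management is omitted for brevity, and SHA256.HashData requires .NET 5 or later.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

byte[] bitstream = Encoding.UTF8.GetBytes("wire transfer order");

// 1. Integrity: a SHA-256 hash lets every consumer detect modification.
byte[] hash = SHA256.HashData(bitstream);

// 2. Confidentiality: AES encryption hides the content from unauthorized readers.
using Aes aes = Aes.Create();
using ICryptoTransform encryptor = aes.CreateEncryptor();
byte[] cipher = encryptor.TransformFinalBlock(bitstream, 0, bitstream.Length);

// 3. Non-repudiation: an RSA signature binds the bitstream to the holder of the private key.
using RSA rsa = RSA.Create();
byte[] signature = rsa.SignData(bitstream, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
bool authentic = rsa.VerifyData(bitstream, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
Console.WriteLine($"Signature valid: {authentic}");
```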

Examples illustrating how to implement them are collected in a separate document, Bitstream Cybersecurity.

Conclusion

Part 1 covered a description of the following concepts: presentation, validation, and standardization. Part 2 covers the following concepts: serialization and cybersecurity. Examples illustrating how to implement them are collected in a GitHub repository and described in a separate document, Data Streams.
