PiP - External Streaming Data - Useful Concepts - Part 1

Introduction

Information computation means a process engaging a computer (a physical device) to process information as a series of actions or steps taken to achieve a particular result or to fulfill a task. The main challenge is that information is abstract. Precisely speaking, it is a kind of knowledge that cannot be processed directly by any physical device. To address this issue, information must be represented as data using a variety of bit-based codes.
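
As a simple illustration, the following minimal Python sketch represents abstract text as data using a bit-based code (UTF-8); the sample string is invented for illustration.

    # Represent the abstract text "Hello" as data using a bit-based code (UTF-8).
    text = "Hello"
    data = text.encode("utf-8")                      # a sequence of bytes
    bits = " ".join(f"{byte:08b}" for byte in data)  # the same data as bits
    print(bits)  # 01001000 01100101 01101100 01101100 01101111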

External data is recognized as the data we must pull or push from outside the boundary of the process hosting the computer program. In general, external data may be grouped as follows:

  • Streaming: Bitstreams managed as the content of files or as network payloads (see the sketch after this list)
  • Structural: Data fetched/pushed from/to external database management systems using queries
  • Graphical: Data rendered on a Graphical User Interface (GUI)
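
A minimal Python sketch of the streaming case: reading a file's content back as a raw bitstream. The file name is hypothetical.

    # Read the content of a file as a raw bitstream (a sequence of bytes).
    with open("example.bin", "rb") as stream:  # hypothetical file name
        payload = stream.read()
    print(len(payload), "bytes read")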

To use computers to automate information processing, we have to deal with bitstreams as the information representation. By design, bitstream management involves the organization, storage, retrieval, communication, and manipulation of bitstreams to ensure accuracy, security, and accessibility. It encompasses data collection, storage architecture, integration, and maintenance to support efficient analysis and decision-making.

To fulfill these requirements, the following concepts can be of real help:

  • Presentation: Involves visualizing, i.e. creating graphical representations of data to gain understanding and insight
  • Validation: Refers to the process of ensuring the accuracy and quality of source data before using it
  • Standardization: Refers to a consistent and agreed-upon structure for representing data, i.e. a standard data format
  • Serialization: The process of converting a graph of objects into a bitstream
  • Safeguarding: Refers to the practice of protecting digital information throughout its entire lifecycle to prevent unauthorized access, corruption, or theft

To make the long story manageable, the article is divided into two parts. Part 1 covers the following concepts: presentation, validation, and standardization. Part 2 will focus on the serialization and safeguarding of data.

Presentation

Bitstream presentation is implemented by various ways of conveying information, including textual and tabular formats. Hence, first of all, we need to deal with data presentation to enable the use of bitstreams also by a human computer user. In this context, we must take into account the following terms: natural language, ergonomics, and graphical user interface (GUI).
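
To make the idea of textual versus tabular presentation concrete, here is a minimal Python sketch; the sample records are invented for illustration.

    # Present the same data textually and as a simple table.
    records = [("temperature", 21.5), ("humidity", 40.0)]  # invented sample data

    # Textual presentation
    for name, value in records:
        print(f"The {name} is {value}.")

    # Tabular presentation
    print(f"{'Quantity':<12}{'Value':>8}")
    for name, value in records:
        print(f"{name:<12}{value:>8}")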

A typical example that we can cite here is using the Internet. A server-side application works on objects, serializes the data that the user needs, and sends it over the network; the web browser then displays it on the screen. It is important to note that the browser always displays data in graphical form. This applies to all kinds of data used by humans; even the text of a newspaper is a graphical presentation, since any character is also a small picture. This is the first feature of data prepared for human use. The second feature is that the data must be expressed in a natural language that humans know. The concept of natural language is very broad. For example, XML text is said to be human-readable, but is it really a piece of natural language?
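
Consider the following invented XML fragment, embedded here as a Python string: a human can read it, yet its rigid markup syntax is clearly formal rather than natural language.

    # An invented XML fragment: human-readable, but governed by formal syntax rules.
    xml_text = """<person>
      <name>John Smith</name>
      <age>42</age>
    </person>"""
    print(xml_text)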

From the above, we can derive that the bitstream should be formatted to resemble a natural language. Of course, we have no measure here and therefore it is difficult to say whether something is close enough to natural language to be comprehensible.

Validation

Applications save working data into bitstreams (for example, the content of files) to keep state information, to provide processing outcomes, or both. Applications need robust storage, i.e. the correctness of the stored data has to be validated every time an application reads it back from a bitstream. This is especially important if the bitstreams may also be modified by other applications or directly by users, because data corruption may occur.

If we are talking about exchanging data between different applications or between an application and a human, the issue of data correctness arises. This issue should be considered on two independent levels. The first is the correctness of a stream of signs, i.e. validation that the selected syntax rules are met. The second is the possibility of assigning information (meaning) to these correct sequences, and therefore assigning meaning to the bitstream. This is accomplished by defining semantics rules, i.e. rules that allow us to associate meaning with the bitstream. The issue of ergonomics is also important, namely how easy it is to absorb the information represented by the bitstream. Of course, the closer we are to natural language, the easier it will be, but again, in this matter we have no measures that allow us to determine how good our solution is.
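
The two levels can be sketched in Python using only the standard library: a well-formedness check covers the syntax level, while an application-specific rule stands in for the semantics level. The element names and the rule itself are assumptions made for illustration.

    import xml.etree.ElementTree as ET

    xml_text = "<person><name>John Smith</name><age>42</age></person>"

    # Level 1: syntax - is the stream of signs a well-formed XML document?
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        raise SystemExit(f"Syntax validation failed: {error}")

    # Level 2: semantics - can the expected meaning be assigned to the content?
    # (an application-specific rule, assumed here for illustration)
    age = int(root.findtext("age"))
    if not 0 <= age <= 150:
        raise SystemExit("Semantic validation failed: age out of range")
    print("Document accepted")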

To better understand the above-mentioned topics, let's look at them in the context of the code examples explained in the section XML-based Validation. There, only XML examples are subject to a more detailed examination, but by design this has no impact on the generality of the discussion.

Standardization

When we talk about the syntax and semantics of a stream, the first thing to consider is the scope of data use. Data produced by one program instance can also be used by the same program instance. In such a case, if the process runs autonomously and is symmetric from a serialization and deserialization point of view, we should not expect any further problems.
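
A minimal Python sketch of this symmetric, autonomous case: the same program instance serializes its working data and later deserializes it, so no compatibility problems arise. The payload is invented for illustration.

    import json

    state = {"counter": 42, "name": "sample"}  # invented working data

    # Serialization and deserialization performed by the same program instance
    bitstream = json.dumps(state).encode("utf-8")
    restored = json.loads(bitstream.decode("utf-8"))

    assert restored == state  # symmetric round trip: no compatibility problem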

If we are talking about interoperability between different applications, we must consider a situation in which these applications have been written using different programming languages. In this case, the problem arises of how to create types in other languages that will represent the same information. In the context of a text document, a kind of schema may be used.

The schema in this context refers to a bitstream structure or blueprint that defines the organization and format of the document. It outlines the arrangement of elements, their relationships, and any rules or constraints that govern the content of documents. Simplifying, a schema allows the definition of additional syntax rules in a domain-specific language. Schemas help ensure consistency in the representation of information within the text document. It means the schema definition could also be a foundation of semantics rules that assign meaning to the document text. As a result, we can recognize the schema as a good tool to validate text documents and check whether an incoming text is a document we expect. Instead of using a schema to validate text-based bitstreams, we may use an equivalent set of classes.
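
As a sketch of schema-based validation, the snippet below uses XML Schema (XSD) as the domain-specific language. Validating against an XSD in Python requires a third-party package such as xmlschema, so treat this as an assumption rather than standard-library functionality; the person schema itself is invented.

    import xmlschema  # third-party package; an assumption of this sketch

    xsd_text = """<?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="person">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="age" type="xs:nonNegativeInteger"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>"""

    schema = xmlschema.XMLSchema(xsd_text)
    document = "<person><name>John Smith</name><age>42</age></person>"
    print(schema.is_valid(document))  # True for a document matching the schema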

Because the data may be used by different instances of a program, we also have to take into consideration that the applications may be in different versions or written using different languages. What is worse, the data representation must also be subject to versioning. In such a case, there is a problem of data compatibility between independent instances of the program. So the question arises whether the data serialized by one version of the program can be used by another version of the program running as a different instance.
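
A minimal Python sketch of the versioning concern: a newer version of the program reads data serialized by an older version and must supply a default for a field that did not exist yet. The field names and the default value are assumptions.

    import json

    # Bitstream produced by an older program version (no "unit" field yet)
    old_bitstream = b'{"counter": 42}'

    # A newer version deserializes it, supplying defaults for missing fields
    state = json.loads(old_bitstream)
    state.setdefault("unit", "items")  # assumed default for the newly added field
    print(state)  # {'counter': 42, 'unit': 'items'}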

Another very popular application domain of streams is the implementation of interoperability between instances of various programs that are created using different technologies and implemented on different platforms. Then there is also the issue of technological compatibility. In this case, too, it must be taken into consideration that classes (types) created in one technology cannot necessarily be used directly in another technology, and that in another technology the same information will be represented differently.

If the schema definition is expressed in a widely accepted format, it should be possible to generate types in a selected programming language based on this schema. Of course, it is a chicken-and-egg problem: should we first create types in the selected programming language, or should we first define these types in the schema and then generate classes based on the schema definition? Let's try to see how this can be achieved using an example.
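
For instance, starting from the person schema sketched earlier, a schema-to-code generator could emit a class like the hypothetical one below; such generators exist for many languages (for example, xsd.exe for C# or generateDS for Python), but the exact output shown here is an assumption.

    from dataclasses import dataclass

    # A hypothetical class that a schema-to-code generator could emit
    # for the person schema sketched earlier.
    @dataclass
    class Person:
        name: str
        age: int  # xs:nonNegativeInteger maps to a non-negative integer

    john = Person(name="John Smith", age=42)
    print(john)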

Conclusion

Part 1 covers a description of the following concepts: presentation, validation, and standardization. Examples illustrating how to implement them are collected in a GitHub repository and described in a separate document, Data Streams.

