External Data - File and Stream Concepts

Mariusz Postol
Jan 03, 2024


External Data Preface

External data is data that a program must pull from, or push to, somewhere outside the boundary of the process hosting it. In general, external data may be grouped as follows:

streaming: bitstreams managed using the content of files, or network payload
structural: data fetched/pushed from/to external database management systems using queries
graphical: data rendered on a Graphical User Interface (GUI)

This section collects descriptions of examples that explain how streaming data is used.

File and Stream Concepts

Operating System Context

Let's look at the .Media folder containing the files used in the examples.

Media Folder

The files differ, but the same kind of descriptive data, i.e. metadata, is defined for all of them. These include Name, Date, Type, Size, Date created, and much other information that may be useful. The most important thing, of course, is the content of the file.

After double-clicking on the selected file, an image will appear.

Here we may ask a question - how can this behavior be described? A program was launched, and that program must have been written by some programmer. The program opens the file as input data, so the programmer had to know the syntax and semantic rules used in the file. The data contained in the file makes it possible to show the content graphically on the computer screen. This is the first example of graphical representation; we will return to this topic later.

Program Context

Using the code snippets located in the FileExample class, the differences between a file and a stream may be explained from the program's point of view. The example shows that File is a static class that represents the available file system and provides typical operations against it. The content of a file is represented as a bitstream, or rather by the Stream class. Stream is an abstract class that represents basic operations on a data stream (a stream of bytes), which allows it to model the behavior of the various media that can be used to store or transmit data as a bitstream. From this perspective, file content can always be treated as a bitstream (a stream of bytes).
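The distinction can be illustrated with a minimal sketch. It is written in Python rather than the .NET used by the FileExample class, so it is only an analogy: pathlib.Path stands in for the file-system operations of File, and a binary file object stands in for Stream. The file name and the count_bytes helper are hypothetical.

```python
import io
import tempfile
from pathlib import Path


def count_bytes(stream):
    """Read any binary stream to its end and return the number of bytes seen."""
    total = 0
    while chunk := stream.read(4096):
        total += len(chunk)
    return total


with tempfile.TemporaryDirectory() as folder:
    # File-system level: the path is an entry with a name and metadata.
    path = Path(folder) / "example.bin"  # hypothetical file name
    path.write_bytes(b"\x01\x02\x03\x04")
    size_from_metadata = path.stat().st_size

    # Stream level: the content of the file is read as a sequence of bytes.
    with path.open("rb") as file_stream:
        size_from_file_stream = count_bytes(file_stream)

# The same function works for a stream that is not backed by any file.
size_from_memory_stream = count_bytes(io.BytesIO(b"\x01\x02\x03\x04"))
```

Because count_bytes depends only on the stream's read operation, it does not need to know whether the bytes come from a file, memory, or another medium.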

XML-based Presentation

When using bitstreams (file content), we must face the problem of how to make them human-readable. The first answer we know from the examples above: the bitstream must be compliant with a well-known application. Unfortunately, this answer is not always applicable. Therefore, we should consider another answer: the human-readable representation should be close to natural language. Of course, we have no measure here, so it is difficult to say whether a bitstream is close enough to natural language to be comprehensible. The first requirement for a human to understand the stream is that it must be formatted as text. For a bitstream to be recognized as text, an encoding must be associated with it, directly or indirectly. An example of how to associate an encoding directly with the bitstream is the following XML code snippet:

<?xml version="1.0" encoding="utf-8"?>
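Why the encoding matters can be shown with a short Python sketch (Python is used here only for illustration; the sample word is an assumption). The same bytes yield different text depending on the encoding used to interpret them.

```python
text = "café"  # sample text with one non-ASCII character (assumed example)

# Text becomes a bitstream only after an encoding is chosen.
raw = text.encode("utf-8")

# Decoding with the encoding used for writing restores the text.
decoded_ok = raw.decode("utf-8")

# Decoding the same bytes with a different encoding silently yields other characters.
decoded_wrong = raw.decode("latin-1")
```

Here raw holds five bytes for four characters, and decoded_wrong differs from the original text even though no error is reported. This is the reason the XML declaration states the encoding explicitly.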

The next requirement, common to both humans and computers, is that the bitstream is associated with comprehensible syntax rules. To make the rules comprehensible to humans, the bitstream should be formatted as text. Finally, semantic rules should be associated with the bitstream so that meaning can be assigned to it.

The ReadWRiteTest sample code demonstrates how to save working data in a file containing an XML document, which can then be presented directly in other applications, such as the MS Word editor or Internet Explorer. This approach assumes that the bitstream formatted as XML is transformed using a stylesheet before being presented. An XML stylesheet is a set of rules or instructions for transforming the structure and presentation of XML documents; it defines how the data in an XML file should be formatted. It is the simplest way to detach a custom document's content from its formatting so that it can be presented as graphical data, provided that the original document is compliant with the XML specification.

Once the Catalog class implements the IStylesheetNameProvider interface, it can convey information about the default stylesheet that may be used when creating the output XML file. Thanks to this implementation, information about the stylesheet (an XSLT file) is added to the XML document, for example catalog.example.xml, and can be used by any generic application that opens the file to transform its content:

<?xml-stylesheet type="text/xsl" href="catalog.xslt"?>
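Where this processing instruction sits in the output can be sketched in Python with the standard xml.dom.minidom module. This is not the article's IStylesheetNameProvider mechanism, only an illustration of the resulting document layout; the empty Catalog root is a placeholder.

```python
from xml.dom import minidom

doc = minidom.Document()

# The stylesheet processing instruction must precede the root element.
stylesheet = doc.createProcessingInstruction(
    "xml-stylesheet", 'type="text/xsl" href="catalog.xslt"'
)
doc.appendChild(stylesheet)

# Placeholder root element; the real catalog content would be added here.
root = doc.createElement("Catalog")
doc.appendChild(root)

# Serialize to bytes with an explicit encoding, then view the result as text.
xml_bytes = doc.toxml(encoding="utf-8")
xml_text = xml_bytes.decode("utf-8")
```

The serialized text starts with the XML declaration, followed by the stylesheet reference and then the root element, which is the order a generic application expects when it looks for the stylesheet.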

XML-based Validation

When data is exchanged between different applications, or between an application and a human, the issue of bitstream correctness arises. This issue should be considered on two independent levels. The first is the correctness of the bitstream as a sequence of characters, i.e. whether the syntax rules are met. The second is whether information can be assigned to these sequences and, therefore, whether meaning can be assigned to the bitstream.

To better understand these issues, let's look at them in the context of the example catalog.example.xml. The following discussion is scoped to the XML format, but the presented approach should be recognized as universal.

XML (Extensible Markup Language) is a language that defines syntax rules. For example, in the XML text mentioned above, replacing the name in the closing tag of a CD element (with CD1 instead of CD, for example) produces an XML syntax error. A syntax error means that the file is not compliant with the XML standard and should not be processed further. If the name in the opening tag of the same element is also changed to CD1, the file is again correct in terms of XML syntax. However, it is difficult to imagine that two subsequent elements have different names but represent the same kind of information. So at this point the file complies with the syntax of the XML standard, but it does not represent the semantics we would expect.
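The two cases can be reproduced with a minimal Python sketch based on the standard xml.etree.ElementTree parser. The element content and the is_well_formed helper are assumptions made for illustration; only the Catalog, CD, and CD1 names come from the text above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text satisfies the XML syntax rules."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


original = "<Catalog><CD>first</CD><CD>second</CD></Catalog>"

# Only the closing tag is renamed: the tags no longer match, a syntax error.
only_closing_renamed = "<Catalog><CD>first</CD1><CD>second</CD></Catalog>"

# Both tags are renamed: syntactically correct, but the meaning is doubtful.
both_tags_renamed = "<Catalog><CD1>first</CD1><CD>second</CD></Catalog>"

results = [
    is_well_formed(original),
    is_well_formed(only_closing_renamed),
    is_well_formed(both_tags_renamed),
]
```

The parser accepts the third document because it checks syntax only; nothing at this level says that a Catalog should contain CD elements rather than CD1 elements.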

Adding the following attributes to the root element causes the document to refer to an XML schema.

<Catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
         xmlns="http://Viculu34.org/Catalog.xsd"
         >
  <!-- catalog.example.xml content here -->
</Catalog>

The XML Schema allows us to define additional syntax rules against which the XML text is checked. A valid XML document must meet these rules in addition to the syntax rules of XML itself. Hence, without an XML schema, we have just XML text. After adding a schema, we define how a document must be constructed, and the document can be verified against this additional schema document. If the document does not comply with the schema, it is not valid and should be rejected instead of being used for further processing. Thanks to this, we can ensure that documents transferred between individual applications are verified against syntax rules that should be derived from the document semantics.
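The idea of rejecting a well-formed but invalid document can be sketched as follows. The Python standard library has no XSD validator and the actual schema is not shown here, so the sketch replaces real schema validation with one hand-written rule, assumed for illustration: the root must be Catalog and may contain only CD elements, all in the namespace from the snippet above. The violations helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the Catalog snippet above.
CATALOG_NS = "http://Viculu34.org/Catalog.xsd"


def violations(xml_text):
    """Return a list of rule violations; an empty list means the document is accepted."""
    root = ET.fromstring(xml_text)  # raises ParseError if the text is not well-formed
    problems = []
    if root.tag != f"{{{CATALOG_NS}}}Catalog":
        problems.append(f"unexpected root element {root.tag}")
    # Assumed rule standing in for the schema: only CD children are allowed.
    for child in root:
        if child.tag != f"{{{CATALOG_NS}}}CD":
            problems.append(f"unexpected element {child.tag}")
    return problems


valid = f'<Catalog xmlns="{CATALOG_NS}"><CD/><CD/></Catalog>'
invalid = f'<Catalog xmlns="{CATALOG_NS}"><CD1/><CD/></Catalog>'

valid_problems = violations(valid)
invalid_problems = violations(invalid)
```

Both documents are well-formed, but only the first passes the additional rule. In practice a schema-aware validator would apply the rules from the schema document instead of hand-written checks.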