External Streaming Data - Bitstream Format

Introduction

Computer science in general, and software engineering in particular, is a field of knowledge that deals with the automation of information processing. Programs can be regarded as the driving force of that automated behavior. To achieve information processing goals, programs have to implement the algorithms required by applications. In other words, programs describe how to process data, which represents information relevant to the applications concerned. Therefore, apart from implementing the algorithms, data management is a key issue from the point of view of automating information processing in particular and computer science in general.

The main aim of this article is to extend knowledge related to object-oriented programming, with a focus on interoperability between the computing process and data visualization, archiving, and networking resources. Particular emphasis is placed on identifying solutions that can serve as design patterns with the widest possible long-term use.

External data is data that we must pull from, or push to, a location outside the boundary of the process hosting the computer program. In general, external data may be grouped as follows:

  • Streaming: Bitstreams managed as file content or network payload
  • Structural: Data fetched/pushed from/to external database management systems using queries
  • Graphical: Data rendered on a Graphical User Interface (GUI)

This article covers selected topics related to streaming data. For this kind of data, bitstreams are the most popular representation because they are used both as file content and for network communication (see also External Data Management (ExDM)).

When using bitstreams, we face the problem of making them human-readable. The first answer is that the bitstream must comply with the format of, and be coupled to, a well-known application. That application opens the bitstream as input data and presents it to the user by appropriate means that make the data comprehensible; gif, docx, and jpg files are only a few examples. Unfortunately, this approach does not apply to custom data. Therefore, we should consider another approach: the human-readable representation should be close to natural language.

Bitstream format


Domain-specific language (DSL)

It was stated above that a human-readable representation should be close to natural language. The first requirement for humans to understand the stream is that it must be formatted as text. To recognize a bitstream as text, an encoding must be associated with it, whether by default, directly, or indirectly. The next requirement, common to both humans and computers, is that the bitstream must be associated with comprehensible syntax rules. Finally, semantic rules should be associated with the bitstream so that meaning can be assigned to it. In short, a text-based language has to be defined. A domain-specific language (DSL) is a text-based language dedicated to expressing concepts and data within a specific area.
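The three requirements can be illustrated with a minimal Python sketch. It assumes a hypothetical line-oriented `key=value` syntax and a hypothetical meaning for the `temperature` and `unit` fields; neither comes from any standard. The bytes become text only once UTF-8 is applied, the syntax rules then split the text into fields, and finally the semantic rules interpret them.

```python
def decode(bitstream: bytes, encoding: str = "utf-8") -> str:
    """Requirement 1: associate an encoding so the bytes become text."""
    return bitstream.decode(encoding)


def parse(text: str) -> dict[str, str]:
    """Requirement 2: apply syntax rules (hypothetical 'key=value' per line)."""
    fields: dict[str, str] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no data
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"line {line_no}: expected key=value")
        fields[key.strip()] = value.strip()
    return fields


def interpret(fields: dict[str, str]) -> str:
    """Requirement 3: apply semantic rules that assign meaning to the fields."""
    temperature = float(fields["temperature"])
    return f"The measured temperature is {temperature} {fields['unit']}"


# A bitstream is only a sequence of bytes until an encoding is applied.
payload = "temperature=21.5\nunit=°C\n".encode("utf-8")
print(interpret(parse(decode(payload))))
```

Decoding the same bytes with the wrong encoding, for example `decode(payload, "latin-1")`, yields `Â°C` instead of `°C`, which shows why the encoding has to be associated with the bitstream.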

When a DSL is used to describe bitstreams, the Data Transfer Object (DTO) concept can serve as a foundation for encapsulating and transporting data between applications. In this context, a DTO may be a text document containing fields that store data.
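As an illustration only, the sketch below models a hypothetical `MeasurementDTO` as a Python dataclass and uses JSON as the text document; the class name, its fields, and the choice of JSON are assumptions made for the example. The sender serializes the fields to text and then to a UTF-8 bitstream, and the receiver rebuilds an equal object from it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MeasurementDTO:
    """Hypothetical DTO: it carries fields only, no behavior."""
    sensor_id: str
    value: float
    unit: str


def to_bitstream(dto: MeasurementDTO) -> bytes:
    # Fields -> text document (JSON here) -> UTF-8 encoded bitstream.
    document = json.dumps(asdict(dto), ensure_ascii=False)
    return document.encode("utf-8")


def from_bitstream(bitstream: bytes) -> MeasurementDTO:
    # UTF-8 bitstream -> text document -> a new DTO instance on the receiving side.
    return MeasurementDTO(**json.loads(bitstream.decode("utf-8")))


sent = MeasurementDTO(sensor_id="S-01", value=21.5, unit="°C")
bitstream = to_bitstream(sent)
print(bitstream.decode("utf-8"))
received = from_bitstream(bitstream)
print(received == sent)
```

Because the DTO carries no behavior, only fields, any program that understands the document's syntax and semantics can reconstruct it, regardless of the language it is written in.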

To use DTOs for transferring data between instances of different programs in a multi-vendor environment, standardization of the syntax and semantic rules is vital. Additionally, the ability to use well-defined and widely accepted schema documents is a key feature for establishing interoperability.

In conclusion, apart from programming languages such as Java, C#, and Python, well-known and widely accepted domain-specific languages include the XML, JSON, and YAML formats, to name only the most important today. A DSL may also be defined on demand to fulfill the requirements of a specific domain. To promote interoperability in a multi-vendor environment, the XML, JSON, and YAML formats are commonly accepted and widely used, publicly specified standards.

Extensible markup language (XML) format

Extensible Markup Language (XML) is a standard text-based format for representing structured data in machine-readable form. Because it is text-based, it can also be regarded as human-readable. Its simplicity and flexibility make it suitable for representing a wide range of data categories.

An XML document consists of markup tags that define elements. Each element can have attributes and contain nested elements, forming a hierarchical structure. The basic syntax uses opening and closing tags to enclose data. Attributes, placed inside the opening tag, provide additional data about the element.
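To make the terminology concrete, here is a small hypothetical catalog document parsed with Python's standard `xml.etree.ElementTree` module. `<catalog>` is the root element, each `<book>` is a nested element with `id` and `lang` attributes, and the text between the opening and closing tags carries the data. The document content and the `read_books` helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a root element, nested elements, and attributes.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="b1" lang="en">
    <title>XML Basics</title>
    <price currency="EUR">12.50</price>
  </book>
  <book id="b2" lang="pl">
    <title>Strumienie danych</title>
    <price currency="PLN">45.00</price>
  </book>
</catalog>"""


def read_books(xml_bytes: bytes) -> list[dict[str, str]]:
    """Collect attribute and element values from every <book> element."""
    root = ET.fromstring(xml_bytes)  # the parser honors the encoding declaration
    books = []
    for book in root.findall("book"):  # direct children named <book>
        price = book.find("price")
        books.append({
            "id": book.get("id", ""),  # attribute of the opening tag
            "lang": book.get("lang", ""),
            "title": book.findtext("title", ""),  # text between the tags
            "price": price.text if price is not None else "",
            "currency": price.get("currency", "") if price is not None else "",
        })
    return books


for entry in read_books(DOCUMENT):
    print(entry)
```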

XML is often used for data interchange between different applications.

Overall, XML is versatile and widely adopted in various domains for configuring settings and exchanging process data.

XML visualization

As the XML format is text-based, it can be read and displayed directly by a variety of software tools. However, it is not the preferred format for presentation, because it contains no formatting information. Today we expect data presentation to meet user-experience expectations, i.e., to have an appropriate layout and style. We can meet this requirement using any application that supports XSLT transformation of XML documents into other text documents, including but not limited to equivalent HTML documents. XSLT uses a template-driven approach to transformations: you write a template that specifies what happens to a given input element. For example, if you were formatting working data to produce HTML for the Web, you might have a template (in a stylesheet file) that matches a group of elements and renders it as a table.
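Python's standard library does not include an XSLT processor (the third-party lxml package provides one), so the following sketch only imitates the template-driven idea by hand: each hypothetical template function matches one element name, much like `<xsl:template match="...">`, and `apply_templates` dispatches child elements to the matching template. The `<measurements>` document and all function names are assumptions for illustration, not a replacement for a real XSLT stylesheet.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical working data to be presented as an HTML table.
DOCUMENT = b"""<measurements>
  <measurement sensor="S-01"><value>21.5</value><unit>C</unit></measurement>
  <measurement sensor="S-02"><value>19.0</value><unit>C</unit></measurement>
</measurements>"""


def apply_templates(element: ET.Element) -> str:
    """Dispatch an element to the template that matches its name (like XSLT)."""
    template = TEMPLATES.get(element.tag)
    return template(element) if template else ""


def measurements_template(element: ET.Element) -> str:
    # Plays the role of <xsl:template match="measurements">: wrap rows in a table.
    header = "<tr><th>Sensor</th><th>Value</th><th>Unit</th></tr>"
    rows = "".join(apply_templates(child) for child in element)
    return f"<table>{header}{rows}</table>"


def measurement_template(element: ET.Element) -> str:
    # Plays the role of <xsl:template match="measurement">: one table row.
    cells = (element.get("sensor", ""), element.findtext("value", ""), element.findtext("unit", ""))
    return "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in cells) + "</tr>"


TEMPLATES = {
    "measurements": measurements_template,
    "measurement": measurement_template,
}

print(apply_templates(ET.fromstring(DOCUMENT)))
```

Producing a different output form means adding or swapping templates while the XML data stays untouched, which is the main advantage of the stylesheet approach.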

Let's go back to the question of how to visualize data for a user, that is, for a human. It was stated that an XML file is text, namely a bitstream for which the encoding is defined. This allows any text editor to be used. Unfortunately, if a file formatted this way is seen by people who are not familiar with XML technology, it will not be easy for them to associate any meaning with the text. In this context, reading a document and understanding it are not the same.

To make the data in an XML file easier to visualize, we can use the ability of XML files to be transformed from XML text into any other text. A few notes on XML stylesheet transformation: web browsers are not the only tools with a built-in transformation mechanism. The transformation can be defined so that the resulting text has the features of natural language. The final form may also meet ergonomic requirements; in particular, it may be a user interface. In short, transforming XML files with a stylesheet makes it possible to add formatting to the data contained in the XML bitstream.

XML validation

To address the validation requirement, XML (Extensible Markup Language), as a text-based format for representing structured information, and XML Schema, as a language for expressing constraints on XML documents, are very good candidates for use in file operations. Today, applications process working data as objects according to the Object-Oriented Programming (OOP) paradigm, so it is convenient to derive the classes of those objects directly from the schema.

You may use the XML Schema Definition Tool (Xsd.exe), which generates XML schema or selected language classes from XDR, XML, and XSD documents, or from classes in a run-time assembly.
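Python has no counterpart of Xsd.exe in its standard library, and XML Schema validation requires a third-party package such as lxml. The sketch below therefore writes by hand what a generated class and a schema would provide together: a hypothetical `Book` class plus checks for the root element name, a required attribute, a required element, and a numeric type. The element names and rules are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Book:
    """Hand-written stand-in for a class that a schema tool could generate."""
    id: str
    title: str
    price: float


def load_book(xml_bytes: bytes) -> Book:
    """Check hypothetical schema rules, then map the XML onto a Book object."""
    root = ET.fromstring(xml_bytes)  # raises ET.ParseError if not well-formed
    if root.tag != "book":
        raise ValueError(f"expected root element <book>, found <{root.tag}>")
    book_id = root.get("id")
    if not book_id:
        raise ValueError("missing required attribute 'id'")
    title = root.findtext("title")
    if title is None:
        raise ValueError("missing required element <title>")
    price_text = root.findtext("price")
    try:
        # Roughly mirrors xs:decimal; float() is more lenient (it accepts "inf").
        price = float(price_text)
    except (TypeError, ValueError):
        raise ValueError(f"<price> must be a decimal number, got {price_text!r}") from None
    return Book(id=book_id, title=title, price=price)


print(load_book(b'<book id="b1"><title>XML Basics</title><price>12.50</price></book>'))
```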

To better understand validation-related topics, check out the code examples described in the section XML-based Validation.

XML standardization

Extensible Markup Language (XML) is a standardized markup language designed to store and transport data. It provides a set of rules for encoding documents in a machine-readable format. The main goals of XML standardization are to ensure consistency in data representation and interchange across heterogeneous systems.

Visit the See also section to get more details.

JavaScript object notation (JSON)

JavaScript Object Notation (JSON) is a lightweight data interchange format. It is a text-based domain-specific language that is easy for humans to read and write and easy for machines to parse and generate. JSON is often used to transmit data between a server and a web application, as well as for configuration files. A JSON document is built from objects, which are collections of key-value pairs, and arrays, and it supports strings, numbers, objects, arrays, booleans, and null as data types.
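The following minimal sketch parses a hypothetical JSON document with Python's standard `json` module and reports which Python type each JSON value becomes; the document content is an assumption for illustration. Objects map to `dict`, arrays to `list`, strings to `str`, numbers to `int` or `float`, `true`/`false` to `bool`, and `null` to `None`.

```python
import json

# Hypothetical document covering every JSON value type.
TEXT = """{
  "name": "sensor-01",
  "reading": 21.5,
  "samples": 3,
  "active": true,
  "tags": ["indoor", "lab"],
  "location": {"room": "A1"},
  "calibrated": null
}"""

data = json.loads(TEXT)
for key, value in data.items():
    # Show which Python type the parser chose for each JSON value.
    print(f"{key}: {type(value).__name__}")

# Serializing the objects again yields an equivalent JSON document.
print(json.dumps(data))
```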

JSON visualization

JSON can be transformed into other text formats, such as CSV or XML, using a variety of programming languages: the document is first parsed, usually with the help of a library, and the resulting data is then written out in the target format. Some languages, like JavaScript, have built-in functions for JSON manipulation (JSON.parse and JSON.stringify), and libraries or frameworks can be used to convert JSON to other formats as needed.
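A minimal sketch of such a conversion in Python, using only the standard `json` and `csv` modules, is shown below. It assumes the input is a JSON array of flat objects (nested objects or arrays would need a flattening rule, which is left out); `json_to_csv` is a hypothetical helper name.

```python
import csv
import io
import json


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")
    # Collect column names in first-seen order so that no field is dropped.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # Missing keys are written as empty cells (DictWriter's default restval="").
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(json_to_csv('[{"sensor": "S-01", "value": 21.5}, {"sensor": "S-02", "value": 19.0, "unit": "C"}]'))
```

Missing fields become empty cells, so records with different keys still produce a rectangular table.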

JSON validation

To address the validation requirement, JSON (JavaScript Object Notation), as a text-based format for representing structured information, and JSON Schema, as a language for expressing constraints on JSON documents, are very good candidates for use in operations on bitstreams. Thanks to schema definitions, it is also possible to derive new domain-specific languages based on JSON.
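The idea behind schema-based validation can be sketched in a few lines of Python. The function below checks only a small subset of JSON Schema keywords (`type`, `required`, `properties`, and `items`) and simplifies some of their rules; it is an illustrative assumption, not a conforming implementation. In practice, a maintained library such as the `jsonschema` package for Python should be used.

```python
import json
from typing import Any

# JSON Schema "type" keywords mapped to the Python types produced by json.loads.
TYPES: dict[str, Any] = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,  # simplified: 1.0 is not accepted as an integer here
    "boolean": bool,
    "null": type(None),
}


def validate(instance: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Check a small subset of JSON Schema: type, required, properties, items."""
    expected = schema.get("type")
    if expected is not None:
        matches = isinstance(instance, TYPES[expected])
        # bool is a subclass of int in Python, but JSON booleans are not numbers.
        if isinstance(instance, bool) and expected in ("number", "integer"):
            matches = False
        if not matches:
            return [f"{path}: expected {expected}"]
    errors: list[str] = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property '{name}'")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    if isinstance(instance, list) and "items" in schema:
        for index, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


# Hypothetical schema for a sensor reading document.
SCHEMA = {
    "type": "object",
    "required": ["sensor", "value"],
    "properties": {
        "sensor": {"type": "string"},
        "value": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

print(validate(json.loads('{"sensor": "S-01", "value": 21.5, "tags": ["lab"]}'), SCHEMA))
print(validate(json.loads('{"sensor": 1, "tags": ["lab", 2]}'), SCHEMA))
```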

You may use many freely available tools that generate JSON Schema documents or classes in a selected language from different kinds of documents.

To better understand validation-related topics, check out the XML-related code examples described in the section XML-based Validation in an article published on C#. By design, using XML there has no impact on the discussion covered by this article; XML is used only to express a general discussion in a concrete language.

JSON standardization

JSON is recognized as an international standard: it is standardized by the International Organization for Standardization (ISO) as ISO/IEC 21778:2017. The standardization ensures that JSON is consistent and widely accepted for data interchange between different systems and programming languages. JSON is also specified in Request for Comments (RFC) 7159, titled The JavaScript Object Notation (JSON) Data Interchange Format, which has since been obsoleted by RFC 8259 under the same title.

ISO/IEC 21778:2017 specifies the JSON data interchange format, its data model, and its various data types. JSON's simplicity, ease of use, and language-agnostic nature have contributed to its widespread adoption in various domains for representing and exchanging data. In addition, an open community maintains the JSON Schema specification.

Yet another markup language (YAML)

YAML (a recursive acronym for "YAML Ain't Markup Language", originally "Yet Another Markup Language") is a human-readable data serialization format. It is often used for configuration files and for data exchange between development environments with different data structures. YAML uses indentation to represent hierarchy and relies on a straightforward key-value syntax. It aims to be easy to read and write, which makes it popular in many applications, including configuration files for software projects.
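To show how indentation expresses hierarchy, the sketch below emits a small subset of YAML block style from Python dictionaries and lists using only the standard library. It assumes keys and strings need no quoting and that sequences hold scalars only; a real emitter, such as `yaml.safe_dump` from the PyYAML package, handles the full language. The `to_yaml` and `scalar` names and the sample configuration are hypothetical.

```python
from typing import Any


def scalar(value: Any) -> str:
    """Render a scalar (or an empty collection) in YAML notation."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return value  # assumption: the string needs no quoting
    if isinstance(value, dict) and not value:
        return "{}"
    if isinstance(value, list) and not value:
        return "[]"
    raise TypeError(f"unsupported value: {value!r}")


def to_yaml(data: Any, indent: int = 0) -> str:
    """Emit block-style YAML where each nesting level adds two spaces."""
    pad = "  " * indent
    lines: list[str] = []
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, (dict, list)) and value:
                lines.append(f"{pad}{key}:")  # parent key on its own line
                lines.append(to_yaml(value, indent + 1))  # children indented below it
            else:
                lines.append(f"{pad}{key}: {scalar(value)}")
    elif isinstance(data, list):
        for item in data:
            if isinstance(item, (dict, list)) and item:
                raise TypeError("this sketch supports scalar sequence items only")
            lines.append(f"{pad}- {scalar(item)}")
    else:
        lines.append(f"{pad}{scalar(data)}")
    return "\n".join(lines)


# Hypothetical configuration of a data collector service.
config = {
    "service": {"name": "collector", "port": 8080, "debug": False},
    "sensors": ["S-01", "S-02"],
    "owner": None,
}
print(to_yaml(config))
```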

YAML visualization

Unlike XML, for which XSLT is available, YAML has no dedicated, community-defined language for automatically transforming YAML documents into other text-based documents that could be used to visualize the associated information. To visualize the content of a YAML document, you have to use tools and editors that support YAML. Here are a few options:

  • Online YAML Editors: use online YAML editors like YAML Online Viewer or YAML Lint, where you can paste your YAML code and visualize the structure.
  • Integrated Development Environments (IDEs): Many modern IDEs, such as Visual Studio Code, Atom, and PyCharm, have built-in support for YAML. Open your YAML file in one of these IDEs to benefit from syntax highlighting and a structured view of your YAML document.
  • YAML Viewer Browser Extensions: there are browser extensions available that can format and visualize YAML files directly in your browser. Check for extensions compatible with your preferred browser.
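  • Command-Line Tools: you can use command-line tools like yq to format and view YAML content; jq works on JSON, so YAML has to be converted to JSON first (yq can do this) before it is processed with jq.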
  • Online YAML Visualizers: some websites offer online YAML visualizers that allow you to paste your YAML code and see a visual representation of the data structure. Search for "Online YAML Visualizer" to find such tools.

Choose the method that best suits your preferences and workflow.

YAML validation

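YAML does not provide a schema language of its own for validation; in practice, because YAML 1.2 is largely a superset of JSON, JSON Schema is commonly used to validate YAML documents. While YAML itself is not designed to be extended or derived into new languages, it is possible to create domain-specific languages (DSLs) or configuration languages based on YAML syntax. Developers can define specific rules and conventions within the YAML structure to suit the requirements of their particular domain or application.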

In essence, you can create a new language by establishing a set of guidelines for interpreting the YAML data in a specific way. This is often done in the context of configuration files or data representation for a particular software or system. Keep in mind that this is more about using YAML as a foundation and defining the semantics and rules for your specific language rather than formally deriving a new language from YAML.
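As a sketch of this idea, assume a hypothetical pipeline configuration language built on YAML. A YAML parser (for example, PyYAML's `yaml.safe_load`) turns the document into dictionaries and lists; the function below then applies domain rules that YAML syntax alone cannot express: allowed step kinds, unique names, and a required first step. The rules, field names, and sample data are assumptions for illustration, and the parsing step is skipped so the code runs with the standard library only.

```python
from typing import Any

# Hypothetical rule: the only step kinds this configuration language allows.
ALLOWED_KINDS = {"read", "transform", "write"}


def check_pipeline(config: dict[str, Any]) -> list[str]:
    """Apply domain rules that plain YAML syntax cannot express."""
    steps = config.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["'steps' must be a non-empty list"]
    problems: list[str] = []
    seen_names: set[str] = set()
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f"step {index}: must be a mapping")
            continue
        name = step.get("name")
        if not isinstance(name, str):
            problems.append(f"step {index}: missing name")
        elif name in seen_names:
            problems.append(f"step {index}: duplicate name {name!r}")
        else:
            seen_names.add(name)
        if step.get("kind") not in ALLOWED_KINDS:
            problems.append(f"step {index}: unknown kind {step.get('kind')!r}")
    first = steps[0]
    if isinstance(first, dict) and first.get("kind") != "read":
        problems.append("the first step must have kind 'read'")
    return problems


# What yaml.safe_load would return for this YAML document:
#   steps:
#     - {name: load, kind: read}
#     - {name: clean, kind: transform}
#     - {name: save, kind: write}
good = {"steps": [{"name": "load", "kind": "read"},
                  {"name": "clean", "kind": "transform"},
                  {"name": "save", "kind": "write"}]}
bad = {"steps": [{"name": "clean", "kind": "transform"},
                 {"name": "clean", "kind": "print"}]}
print(check_pipeline(good))
print(check_pipeline(bad))
```

Keeping these rules outside the YAML parser mirrors the point made above: YAML supplies the syntax, while the new language is defined by the semantics layered on top of it.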

Conclusion

This article presents practical scenarios related to various aspects of bitstream formats used in external data management. Check out the publications in the See also section. Follow me on LinkedIn, GitHub.com, and ORCID.org to stay in touch.

