Introduction To Data Governance


  • Data governance is a data management function that ensures the quality, integrity, security, and usability of the data an organization collects
  • It needs to be in place from the time data is collected until the data is destroyed
  • It focuses on making the data available to all stakeholders in a form that they can readily access and use in a manner that conforms to regulatory standards
  • Finally, it ensures that the data is secure
    • It is accessed only by permitted users in permitted ways
    • It is auditable, meaning all accesses, including changes, are logged, and it is compliant with regulations

The purpose of data governance is to enhance trust in the data

  • Ensuring trust in data requires a data governance strategy that addresses three key aspects: discoverability, security, and accountability
  • Discoverability requires data governance to make technical metadata, lineage information, and a business glossary readily available
  • Business-critical data needs to be correct and complete
  • Master data management is necessary to ensure that data is classified, so that it is appropriately protected against inadvertent or malicious changes or leakage
  • For security, the relevant concerns are regulatory compliance, management of sensitive data, and data security and exfiltration prevention; which of these matter depends on the business domain and the dataset in question

Classification and Access Control

While the purpose of data governance is to increase the trustworthiness of enterprise data so as to derive business benefits, its primary activity in practice is classification and access control.
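As an illustration of how classification can drive access control, here is a minimal sketch. The sensitivity levels, the column names, and the `readable_columns` helper are all hypothetical and chosen only for this example; a real deployment would take labels from a data catalog and enforce them in the storage or query layer.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical classification levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Assumed catalog: column name -> classification label.
COLUMN_LABELS = {
    "product_name": Sensitivity.PUBLIC,
    "order_id": Sensitivity.INTERNAL,
    "customer_email": Sensitivity.CONFIDENTIAL,
    "card_number": Sensitivity.RESTRICTED,
}


def readable_columns(clearance: Sensitivity) -> list[str]:
    """Return the columns whose label does not exceed the caller's clearance."""
    return sorted(
        column for column, label in COLUMN_LABELS.items() if label <= clearance
    )
```

With these assumed labels, a caller cleared for INTERNAL data sees only `order_id` and `product_name`, while a RESTRICTED clearance sees all four columns.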

Phases of Data Lifecycle

Proper oversight of data throughout its lifecycle is essential to optimizing its usefulness and minimizing the potential for errors. Defining this process end-to-end across the data lifecycle is needed to operationalize data governance and make it a reality.

Data Governance Framework

The People: Roles, Responsibilities and "Hats"

Data Protection in Cloud


Use Cloud Identity and Access Management (IAM) systems rather than Kerberos-based or directory-based authentication

This best practice involves defining roles, specifying access rights, and managing and allocating access keys so that only authorized and authenticated individuals and systems are able to access data.
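The idea of defining roles and binding them to principals can be sketched as follows. This is not the API of any particular cloud IAM product; the role names, principals, resource name, and the `is_allowed` helper are assumptions made for the example.

```python
# Hypothetical roles, each granting a set of actions.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "grant"},
}

# Hypothetical bindings: (principal, resource) -> role.
BINDINGS = {
    ("alice@example.com", "sales_table"): "editor",
    ("report-service", "sales_table"): "viewer",
}


def is_allowed(principal: str, resource: str, action: str) -> bool:
    """Allow an action only if the principal's bound role grants it."""
    role = BINDINGS.get((principal, resource))
    if role is None:
        return False  # deny by default when no binding exists
    return action in ROLE_PERMISSIONS[role]
```

The sketch denies by default: a principal with no binding on a resource gets no access, and both people and systems (here `report-service`) are treated as principals.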

Security surface

One of the benefits of the public cloud is the availability of dedicated, world-class security teams.

Virtual machine security

In securing data in the public cloud, it is necessary to design an architecture that limits the effects on the rest of the system in the event of a security compromise.

Microsoft Azure offers Confidential Compute to allow applications running on Azure to keep data encrypted even when it’s in-memory.

Physical security

Make sure that data center physical security follows a layered security model with as many safeguards as possible, such as electronic access cards, alarms, vehicle access barriers, perimeter fencing, metal detectors, biometrics, and laser beam intrusion detection.

Network security

The simplest form of network security is a perimeter network security model: all applications and personnel within the network are trusted, and all others outside the network are not.
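A perimeter check reduces to a single question: does the request originate inside the trusted network? The sketch below uses Python's standard `ipaddress` module; the `10.0.0.0/8` range and the `is_trusted` helper are assumptions for illustration only.

```python
import ipaddress

# Assumed corporate address range; anything outside it is untrusted.
TRUSTED_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def is_trusted(source_ip: str) -> bool:
    """Perimeter model: trust is decided by network location alone."""
    return ipaddress.ip_address(source_ip) in TRUSTED_NETWORK
```

Note what the check does not look at: who the user is, or what state the device is in. Anything that gets inside the perimeter is trusted.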

Security in transit

Network security is made difficult because application data often must make several journeys, known as “hops,” between devices across the public Internet.

Data Exfiltration

A scenario in which an authorized person or application extracts data that they are allowed to access and shares it with unauthorized third parties or moves it to insecure systems.

Secure code

Data lineage is of little value if the application code that produces or transforms the data is not trusted.

Zero trust model

All access to enterprise resources is authenticated, authorized, and encrypted based on device state and user credentials.

The zero trust model consists of a few specific parts:

  • Only a device that is procured and actively managed by the enterprise is allowed to access corporate applications.
  • All managed devices need to be uniquely identified using a device certificate that references the record in a Device Inventory Database, which needs to be maintained.
  • All users are tracked and managed in a User Database and a Group Database, which tightly integrate with the HR processes that manage job categorization, usernames, and group memberships.
  • A centralized user authentication portal validates two-factor credentials for users requesting access to enterprise resources.
  • An unprivileged network that closely resembles an external network, although within a private address space, is defined and deployed. It connects only to the Internet and to limited infrastructure and configuration management systems. All managed devices are assigned to this network while physically located in the office, and a strictly managed Access Control List sits between this network and other parts of the network.
  • Enterprise applications are exposed via an Internet-facing access proxy that enforces encryption between the client and the application.
  • Multiple data sources are interrogated to determine the level of access given to a single user and/or a single device at any point in time.
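The parts listed above combine into one access decision that consults several data sources. A minimal sketch follows, assuming in-memory dictionaries stand in for the Device Inventory Database, the User and Group Databases, and a per-application policy. The `compliant` flag, the sample records, and the `access_decision` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    certificate_id: str
    managed: bool    # procured and actively managed by the enterprise
    compliant: bool  # hypothetical device-state flag (e.g. patched, encrypted)


# Stand-ins for the Device Inventory Database, Group Database, and app policy.
DEVICE_INVENTORY = {
    "cert-001": Device("cert-001", managed=True, compliant=True),
    "cert-002": Device("cert-002", managed=True, compliant=False),
}
USER_GROUPS = {"alice": {"finance"}, "bob": {"engineering"}}
APP_REQUIRED_GROUP = {"payroll": "finance"}


def access_decision(certificate_id: str, username: str,
                    second_factor_ok: bool, app: str) -> bool:
    """Grant access only if device, user, and policy checks all pass."""
    device = DEVICE_INVENTORY.get(certificate_id)
    if device is None or not (device.managed and device.compliant):
        return False  # unknown, unmanaged, or non-compliant device
    if not second_factor_ok:
        return False  # the authentication portal did not validate two factors
    required_group = APP_REQUIRED_GROUP.get(app)
    if required_group is None:
        return False  # no policy for this application: deny
    return required_group in USER_GROUPS.get(username, set())
```

Network location plays no part in the decision. The same user can be granted access from one device and refused from another.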

Identity and Access Management

Access control encompasses authentication, authorization, and auditing.


Policies are rules that enable your developers to move fast, but within the boundaries of security and compliance. There are policies that apply to users: authentication and security policies, such as second factor authentication, or authorization policies that determine who can do what on

Data Loss Prevention

AI methods, such as Cloud Data Loss Prevention, can be used to scan tables and files in order to protect sensitive data. These tools come with built-in information type detectors that identify patterns, formats, and checksums.
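To show what a detector built on patterns, formats, and checksums does, here is a simplified stand-in written with Python's standard library only. It is not the Cloud Data Loss Prevention API. The regular expressions, the `find_sensitive` function, and the finding labels are assumptions. The checksum used for card numbers is the Luhn algorithm.

```python
import re

# Format patterns (deliberately simple; real detectors are far more thorough).
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (label, value) pairs for emails and checksum-valid card numbers."""
    findings = [("EMAIL", match.group()) for match in EMAIL.finditer(text)]
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):  # the checksum filters out random digit runs
            findings.append(("CARD_NUMBER", digits))
    return findings
```

The checksum step is what separates a likely card number from an arbitrary digit sequence: `4111 1111 1111 1111` passes the Luhn check, while `4111 1111 1111 1112` matches the format but is rejected.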


Encryption helps to ensure that if the data accidentally falls into an attacker’s hands, they cannot read it without also having access to the encryption keys.

Access transparency

To safeguard data, every access to it must be transparent.

Keeping data protection agile

Data protection cannot be rigid and unchanging. Instead, it has to be agile so that it can adapt to changes in business processes and respond to newly observed threats.

Data lineage

A key attribute of keeping data protection agile is to understand the lineage of every piece of data. Where did it come from? When was it ingested? What transformations have been carried out? Who carried out these transformations? Were there any errors that resulted in records being skipped?
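One way to make these questions answerable is to store a lineage record alongside each dataset. The sketch below is a hypothetical structure, not a standard lineage schema; the class and field names are assumptions chosen to mirror the questions above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TransformationStep:
    name: str                 # what transformation was carried out
    performed_by: str         # who (person or service) carried it out
    records_in: int
    records_out: int
    records_skipped: int = 0  # records dropped because of errors


@dataclass
class LineageRecord:
    dataset: str
    source: str               # where the data came from
    ingested_at: datetime     # when it was ingested
    steps: list[TransformationStep] = field(default_factory=list)

    def add_step(self, step: TransformationStep) -> None:
        self.steps.append(step)

    def total_skipped(self) -> int:
        """Total number of records skipped across all transformations."""
        return sum(step.records_skipped for step in self.steps)


# Example with made-up values.
record = LineageRecord(
    dataset="sales_clean",
    source="crm_export",
    ingested_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
record.add_step(TransformationStep("parse_dates", "etl-service", 100, 97, 3))
record.add_step(TransformationStep("deduplicate", "etl-service", 97, 95))
```

Each question maps to a field: `source` and `ingested_at` cover origin and time, `steps` covers the transformations and who performed them, and `total_skipped()` reports records lost to errors.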

Event threat detection

The overall security health needs to be continually monitored as well. Network security logs need to be analyzed to find the most frequent causes of security incidents. Are a number of users trying (and failing) to access a specific file or table? It is possible that the metadata about the file or table has been breached. It is worth searching for the source of the metadata leak and plugging it. It is also advisable to secure the table before one of the attacks succeeds.
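The check described above, several users failing to access the same file or table, can be sketched as a pass over access log events. The event format (dictionaries with `user`, `resource`, and `outcome` keys), the `"denied"` outcome value, and the threshold of three distinct users are all assumptions for illustration.

```python
from collections import defaultdict


def suspicious_resources(events, min_users: int = 3) -> list[str]:
    """Return resources that at least `min_users` distinct users failed to access."""
    failed_users = defaultdict(set)
    for event in events:
        if event["outcome"] == "denied":
            failed_users[event["resource"]].add(event["user"])
    return sorted(
        resource
        for resource, users in failed_users.items()
        if len(users) >= min_users
    )


# Example log with made-up entries.
events = [
    {"user": "u1", "resource": "salaries", "outcome": "denied"},
    {"user": "u2", "resource": "salaries", "outcome": "denied"},
    {"user": "u3", "resource": "salaries", "outcome": "denied"},
    {"user": "u1", "resource": "orders", "outcome": "denied"},
    {"user": "u1", "resource": "orders", "outcome": "denied"},
    {"user": "u4", "resource": "salaries", "outcome": "allowed"},
]
```

Counting distinct users rather than raw failures separates one user retrying repeatedly (`orders`) from many users probing the same table (`salaries`), which is the pattern that may point to leaked metadata.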