How to Retrieve EMC Centera Cluster/Pool Capabilities


Introduction

This article is part of a series that I am writing to illustrate the use of the EMC Centera SDK and the .NET wrapper, which is being developed as an open source project, to store "fixed content" on the EMC Centera storage appliance. Before I start, I would like to explain what "fixed content" is and give an overview of the reasoning behind the emergence of this type of storage.



Fixed Content Definition

Fixed content is information that never changes after creation. It is actively referenced, typically shared among users, and must be retained for a long period of time (retention means maintaining a copy of fixed content for a mandatory period of time). Examples include electronic documents, presentations, and e-books; rich media such as movies, videos, digital photographs, and audio files; check images and financial statements; bioinformatics data, X-rays, MRIs, and CAT scans; CAD/CAM diagrams and blueprints; and e-mail messages.

Examples of "Fixed Content"

  1. An average enterprise (a 250-person organization) generates approximately 1.5TB of e-mails per year.
  2. A picture archive in a large hospital may generate more than 5TB per year in digital X-rays or MRIs.
  3. Banks are scanning millions of check images per year, requiring multiple terabytes of storage.

State of the Industry

A large portion of all digital information is fixed content. Fixed content is expected to be the largest portion of digital content created by the human race in the next century, exceeding all dynamic content put together.



Also, the information life cycle drives toward more fixed content. Enterprises embracing things like e-mail and electronic documents are increasing the need for fixed content storage exponentially. Finally, emerging regulations requiring retention in the financial and healthcare industries are creating a huge need for fixed content storage and fixed content solutions.

The EMC Centera appliance is one of the appliances available in the market today to satisfy that need. Other companies, like NetApp, have solutions equivalent to the Centera, but this series of articles is specific to showing how to code using the Centera SDK.

What you will need to be able to develop against the appliance

Note that the only way to save content on most "fixed content" storage devices is through the proprietary API(s) that the manufacturer of the device publishes. Some manufacturers do offer open standard interfaces (CIFS, NFS, HTTP, and WebDAV) to read from and write to their devices, but you usually end up losing a lot of the device's power. Things like WORM (write-once-read-many) functionality or retention capabilities are usually lost with the open standards.

  • You will also need the .NET wrapper for the Centera SDK. The latest version of this open source .NET project is on SourceForge. The link is: http://sourceforge.net/projects/cosi-dot-net
  • You need to have access to the "Public Centera" appliances. EMC recognized that the Centera device is not available everywhere and set up an appliance on the internet that developers can develop against. The content of this appliance is purged periodically by EMC. The latest IP(s) can be found on the EMC site. As of this writing, the valid IP(s) are:

    EMEA1 - 152.62.65.11, 152.62.65.12, 152.62.65.13, 152.62.65.14
    EMEA2 - 152.62.65.16, 152.62.65.17, 152.62.65.18, 152.62.65.19
    EMEA3 - 212.3.248.41, 212.3.248.42, 212.3.248.43, 212.3.248.44
    EMEA4 - 212.3.248.46, 212.3.248.47
    EMEA5 - 152.62.65.21, 152.62.65.22
    US1       - 128.221.200.56, 128.221.200.57, 128.221.200.58, 128.221.200.59
    US2       - 128.221.200.60, 128.221.200.61, 128.221.200.62, 128.221.200.63
    US3       - 128.221.200.64, 128.221.200.65, 128.221.200.66, 128.221.200.67
    US4       - 128.221.200.116, 128.221.200.117, 128.221.200.118, 128.221.200.119
    US5       - 128.221.200.120, 128.221.200.121, 128.221.200.122, 128.221.200.123

Special Architecture Knowledge You Need

  • The Centera appliance stores content, and each piece of content is stored and retrieved using an address. This approach is called CAS (content-addressable storage), a term you will hear and read about in the industry these days.
  • The smallest block of data that can be stored must be housed inside a memory block the SDK calls a "C-Clip". In other words, you first create a C-Clip and place your content inside it, and then you send the C-Clip to the Centera to be saved. The C-Clip itself is made of two components: the Content Descriptor File (CDF) and the BLOB.
  • The CDF is an XML file that holds metadata. It contains tags and attributes.

    Tag

    An XML Tag in the CDF.

    A user defined name.
    Example: <Application_Name>ImageStore2004</Application_Name>

    Attribute

    An XML attribute in the CDF.
    A user defined value.
    Example: <My_App name= "ImageStoreServer"/>
  • The C-Clip also holds a BLOB. The BLOB is usually the content you want to store.

    BLOBs have the following characteristics:

    They hold the objects stored on the Centera.
    They are represented as the distinct bit sequence of the object you are trying to store.



  • Centera runs an OS called "CentraStar". This OS is optimized for writing and reading C-Clip objects.
  • Centera objects have metadata. The applications you develop create metadata associated with one or more objects. These objects are then stored independently of volume/directory information, as in the image below:



  • Overall process overview.
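
To make the content-addressing idea from the list above more concrete, here is a minimal sketch in C#. It is not how the Centera computes its addresses: the real content address is produced by the appliance and the SDK, and its format is different. The class name ContentAddressSketch and the use of SHA-256 are my own assumptions for illustration. The only point is that the address is derived from the content itself rather than from a file name or directory.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustration only: the real content address is computed by the Centera
// and its SDK, and its format differs from the hex digest produced here.
static class ContentAddressSketch
{
    // Derives a hypothetical "address" from the bytes of the content itself.
    public static string AddressOf(byte[] content)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(content);
            // Render the 32-byte digest as 64 hexadecimal characters.
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // Convenience overload for text content, encoded as UTF-8.
    public static string AddressOf(string text)
    {
        return AddressOf(Encoding.UTF8.GetBytes(text));
    }
}
```

With this sketch, storing the same bytes twice yields the same address, while any change to the content yields a different one. That is why this style of storage suits content that never changes after creation.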

Centera Three Modes

Basic mode

Centera acts like standard magnetic storage. An object marked for deletion is deleted immediately.

Compliance mode

Active retention protection ensures availability of objects for a configurable period of time. An object marked for deletion is not deleted until the retention period passes.

Compliance plus mode

Similar to compliance mode, compliance plus mode uses retention periods. The default retention period is infinite. Unlike in compliance mode, data is never purged.

Benefits of the Compliance modes

Retention enforcement

  • Retention is set on the clip and applies to all blobs that are referenced by the clip.
  • A clip/blob cannot be deleted until its retention has expired.
  • Once retention expires, the clip is eligible for deletion.
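
The retention rule above can be pictured with a small C# sketch. Keep in mind that retention is enforced by the Centera itself, not by client code; the class RetentionSketch, its method, and the use of TimeSpan.MaxValue as an "infinite" marker are hypothetical names I introduce only to illustrate the rule.

```csharp
using System;

// Illustration only: the Centera enforces retention on the appliance.
static class RetentionSketch
{
    // Hypothetical marker for an infinite retention period.
    public static readonly TimeSpan Infinite = TimeSpan.MaxValue;

    // Returns true when a clip created at createdUtc, with the given
    // retention period, would be eligible for deletion at nowUtc.
    public static bool IsEligibleForDeletion(DateTime createdUtc, TimeSpan retention, DateTime nowUtc)
    {
        if (retention == Infinite)
        {
            return false; // infinite retention never expires
        }
        return nowUtc >= createdUtc + retention;
    }
}
```

A retention period of zero makes a clip eligible for deletion immediately, which matches the behavior of basic mode described earlier.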

Data deletion enhancements: Shredding

  • Overwrites data multiple times with a random bit pattern.

Centera SDK

The Centera-supplied Software Development Kit (SDK) contains:

  • C-callable libraries
  • A Java interface that utilizes JNI
  • Documentation
  • Sample code

It can be downloaded from the following link:
http://lighthouse.developer.emc.com/developer/devcenters/CAS-Centera/index.php 

You will need to create an account with EMC to be able to download the full SDK.

Why Is It Needed?

  • It provides a content addressing framework.
  • There is no file system, and none of its associated drawbacks.
  • Applications access the Centera via API calls only.

Centera Cluster

A cluster is a logical CAS archive that appears to your application as a single unit.
A cluster can be accessed by one or more applications via a set of node IP addresses and access profiles.


 
Pool

A pool is an SDK object that represents one or more clusters. Your application must open a pool by providing a series of node IP addresses and access profile credentials for the desired set of clusters. The first accessible IP address in the list represents the primary cluster, while subsequent IP addresses are considered the secondary clusters (assuming that they represent distinct clusters). The pool object also auto-discovers any replica clusters that are configured via the primary or secondary clusters.

Profiles

The system administrator creates access profiles for applications. Profiles are a means to enforce authentication and authorization. The system administrator can determine which applications have access to a cluster and what operations they can perform.

An application can only log into a Centera if a profile for that application has been created on the Centera cluster and the credentials for that profile have been made available to the application server.

Once the profiles have been created on the Centera cluster, the system administrator exports the profile information to a Pool Entry Authorization (PEA) file and copies this file to the application server. The system administrator can set an environment variable that points to the PEA file, or can leave it to the application to give the path to this file. So when you code your application, you can either leave the PEA file out of your code, in which case the SDK finds it through the environment variable, or, if your enterprise has created specific PEA files and distributed them to the development team, give the full path of the PEA file in your code when opening the pool.

It is important to note that for these articles, the publicly available profiles will be used.

The files have the following naming convention:

ClusterName_ProfileName_CapabilitiesList.pea

For example, "us2_armTest2_rdqeDcwh.pea", translates to:

The application profile belongs to Centera cluster US2.
The profile is armTest2: Test2 with Advanced Retention Management (arm) enabled.
Capabilities: all enabled; please refer to the list below.

Capabilities Definitions:

  1. r: read
  2. w: write
  3. d: delete
  4. q: query
  5. e: exists
  6. D: privileged delete
  7. c: clip copy
  8. h: retention hold
  9. Monitor: all profiles except profile1 are configured to enable the monitor capability.

Each profile also comes enabled with name/secret combination that corresponds to the profile name. Thus to access a profile defined by us2_armTest2_rdqeDcwh.pea file, the application could alternatively use name=armTest2,secret=armTest2 in the connect string.
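
The naming convention and the capability letters above can be decoded mechanically. The following C# sketch splits a PEA file name into its three parts and expands each capability letter using the list above. The class names PeaFileInfo and PeaFileNameSketch are hypothetical helpers of my own, not part of the Centera SDK or the .NET wrapper, and the sketch assumes the file name follows the convention exactly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical holder for the three parts of a PEA file name.
sealed class PeaFileInfo
{
    public string Cluster;
    public string Profile;
    public List<string> Capabilities = new List<string>();
}

// Hypothetical helper; not part of the Centera SDK or the .NET wrapper.
static class PeaFileNameSketch
{
    // Capability letters as listed in this article (case-sensitive).
    static readonly Dictionary<char, string> CapabilityNames = new Dictionary<char, string>
    {
        { 'r', "read" }, { 'w', "write" }, { 'd', "delete" }, { 'q', "query" },
        { 'e', "exists" }, { 'D', "privileged delete" }, { 'c', "clip copy" },
        { 'h', "retention hold" }
    };

    // Parses "ClusterName_ProfileName_CapabilitiesList.pea".
    public static PeaFileInfo Parse(string peaFileName)
    {
        string stem = Path.GetFileNameWithoutExtension(peaFileName);
        string[] parts = stem.Split('_');
        if (parts.Length != 3)
        {
            throw new FormatException("Expected ClusterName_ProfileName_CapabilitiesList.pea");
        }

        PeaFileInfo info = new PeaFileInfo();
        info.Cluster = parts[0];
        info.Profile = parts[1];
        foreach (char letter in parts[2])
        {
            string name;
            // Unknown letters are reported rather than silently dropped.
            info.Capabilities.Add(CapabilityNames.TryGetValue(letter, out name)
                ? name
                : "unknown (" + letter + ")");
        }
        return info;
    }
}
```

For example, PeaFileNameSketch.Parse("us2_armTest2_rdqeDcwh.pea") gives the cluster "us2", the profile "armTest2", and eight capabilities, starting with "read" and ending with "retention hold".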

Conclusion

And as Forrest Gump said in the movie of the same name, "That's all I have to say about that."

This introduction should give you enough knowledge to be able to read the SDK and be able to write code to use the Centera appliance.

Since this article is one in a series of articles I am writing about different functionalities, each individual article will have this introduction and will then discuss the specific Centera functionality that the article addresses.

How to Set Up the Development Environment

  • In Visual Studio, create a new project called "AdrdProjectCentera1" as in the figure below:


 
Note: I am creating the project on my "E" drive in the "CAS" directory. The project name in this article is "AdrdProjectCentera1".

This will create the directory structure needed by Visual Studio. The directory of interest in this solution structure is the debug directory that Visual Studio creates. In this article, the full path of the directory of interest is as follows:

E:\CAS\AdrdProjectCentera1\AdrdProjectCentera1\AdrdProjectCentera1\bin\Debug

Note that your path will be different depending on the location of your project.

  • The next step is to unzip the EMC Centera SDK files. The SDK is delivered from the EMC site as a single zipped file. The default zip file name is "31].1_SDK_Windows_gcc.zip" (as of Oct 13, 2007). Once the file is unzipped, a number of directories will be created. Copy the files in the "lib" directory to the "debug" directory created by Visual Studio in step 1. The files that you will copy are "FPLibrary.dll", "fpos32.dll", "fpparser.dll", and "pai_module.dll". There is also an "FPLibrary.jar" file in that "lib" directory; you do not need to copy it. The "FPLibrary.jar" file is the Java wrapper for "FPLibrary.dll", and it is the equivalent of the .NET wrapper that the SourceForge project is all about. The ".lib" files are used only if you are developing in C or C++, so just ignore them for this article.
  • Next, download all the PEA files so that you can develop against the "public Centera". I will use the "US XXXX" PEA files from the EMC website. The link is:

    http://lighthouse.developer.emc.com/developer/devcenters/CAS-Centera/modules.php?name=ClusterShower
    (As of Oct 13th, 2007). Make sure you copy the ".pea" files to the debug directory described in step 1 above.

  • The next step is to unzip the .NET wrapper you downloaded from the SourceForge site. The default name of the zipped file that you downloaded is "FPApi.NET.zip". Once it is fully unzipped, the following directories are created:





  • The zip file from SourceForge does not include the binary file of the wrapper (the compiled version of the code), so you will need to compile the code to generate the final wrapper that you will use in this article's project. To do so, double-click the "Wapper.sln" file that the zip file extraction created. This should start a new instance of Visual Studio, and the solution should look as follows:
  • Compile the solution by selecting Build->Build Solution menu options as in the next figure:



  • Once the build is complete, copy the files "FPSDK.dll" and "FPSDK.pdb" that are generated as a result of the solution build to the debug directory created in step 1.
  • The final debug directory for the solution should look like this:



  • The final step is to set a reference to the "FPSDK.DLL" in your solution. To do so, open the original solution you created in step 1 (if it is not already open).

Finally, the Article Content: How to Retrieve the Centera Cluster Capabilities

The following screenshot shows this article's UI.


 
To actually retrieve the cluster information, you need to make the following API calls

  1. Open the Centera cluster by creating an instance of the wrapper "FPPool" object.
  2. Use the FPPool instance you created in the step above to retrieve the cluster capabilities.
  3. Close the FPPool

Open the Pool

To open the Centera pool, you will need the cluster "connection string". This is usually a single IP address for a single Centera, or a comma-separated list of IPs if the Centera is configured as a cluster. The IP list is followed by a "?" sign and the full path of the ".PEA" file. In the code associated with this article, the ".PEA" files are included in the "debug" directory.

Sample of the connection string

"128.221.200.56?us1_profile1_rwqe.pea"
Or
"128.221.200.56, 128.221.200.57, 128.221.200.58, 128.221.200.59?us1_profile1_rwqe.pea"
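
If you keep the node IP addresses in a list, the connection string can be assembled as in the following C# sketch. ConnectionStringSketch and its Build method are hypothetical helpers of my own, not part of the SDK or the wrapper; the sketch only follows the format described above (IP list, then "?", then the PEA file path).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the Centera SDK or the .NET wrapper.
static class ConnectionStringSketch
{
    // Builds "ip1,ip2,...?path-to-pea-file" from a list of node addresses.
    public static string Build(IEnumerable<string> nodeAddresses, string peaFilePath)
    {
        List<string> cleaned = new List<string>();
        foreach (string address in nodeAddresses)
        {
            string trimmed = address.Trim();
            if (trimmed.Length > 0)
            {
                cleaned.Add(trimmed); // skip blank entries
            }
        }
        if (cleaned.Count == 0)
        {
            throw new ArgumentException("At least one node IP address is required.", "nodeAddresses");
        }
        return string.Join(",", cleaned) + "?" + peaFilePath;
    }
}
```

Calling ConnectionStringSketch.Build with the single address "128.221.200.56" and the file "us1_profile1_rwqe.pea" produces the first sample string shown above.
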

Retrieve the Cluster Capabilities

#region Build the String to display in the UI
strPoolInfo = ("\nPool Information" + "\n================" +
"\nCluster ID:                   " + myPool.ClusterID +
"\nCluster Time:                 " + myPool.ClusterTime +
"\nCluster Name:                 " + myPool.ClusterName +
"\nCentraStar software version:  " + myPool.CentraStarVersion +
"\nSDK version:                  " + FPPool.SDKVersion +
"\nCluster Capacity (Bytes):     " + myPool.Capacity +
"\nCluster Free Space (Bytes):   " + myPool.FreeSpace +
"\nCluster BlobNamingSchemes :   " + myPool.BlobNamingSchemes +
"\nCluster Capacity:             " + myPool.Capacity.ToString() +
"\nCluster CenteraEdition:       " + myPool.CenteraEdition +
"\nCluster ClipBufferSize:       " + myPool.ClipBufferSize.ToString() +
"\nCluster DeleteAllowed:        " + myPool.DeleteAllowed.ToString() +
"\nCluster DeletionsLogged:      " + myPool.DeletionsLogged.ToString() +
"\nCluster ExistsAllowed:        " + myPool.ExistsAllowed.ToString() +
"\nCluster QueryAllowed:         " + myPool.QueryAllowed.ToString() +
"\nCluster RetentionDefault:     " + myPool.RetentionDefault.ToString()+
"\nCluster ReadAllowed:          " + myPool.ReadAllowed.ToString()+
"\nCluster WriteAllowed:         " + myPool.WriteAllowed.ToString());
#endregion

Close the FPPool

In the included sample, I open the pool inside a using statement, so the FPPool is closed automatically when the using block exits.

Alternatively, you can close the pool explicitly with the following statement:

myPool.Close();
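
Putting the three steps together (open, read, close), a minimal console sketch might look like the code below. I am assuming here that the wrapper classes live in the EMC.Centera.SDK namespace and that FPPool has a constructor that takes the connection string; check both against the version of the wrapper you compiled. The property names are the ones used in the snippet above, and the IP address and PEA file are the public samples from this article, which may no longer be valid.

```csharp
using System;
using EMC.Centera.SDK; // assumed namespace of the FPSDK.dll wrapper

class PoolInfoSketch
{
    static void Main()
    {
        // Sample connection string from this article: node IP + "?" + PEA file.
        string connectionString = "128.221.200.56?us1_profile1_rwqe.pea";

        try
        {
            // Step 1: open the pool (constructor signature is an assumption).
            using (FPPool myPool = new FPPool(connectionString))
            {
                // Step 2: read a few of the capabilities shown above.
                Console.WriteLine("Cluster ID:          " + myPool.ClusterID);
                Console.WriteLine("Cluster Name:        " + myPool.ClusterName);
                Console.WriteLine("CentraStar version:  " + myPool.CentraStarVersion);
                Console.WriteLine("SDK version:         " + FPPool.SDKVersion);
                Console.WriteLine("Capacity (Bytes):    " + myPool.Capacity);
                Console.WriteLine("Free Space (Bytes):  " + myPool.FreeSpace);
            } // Step 3: leaving the using block closes the pool.
        }
        catch (Exception ex)
        {
            // Connection or profile problems surface here.
            Console.WriteLine("Could not read pool information: " + ex.Message);
        }
    }
}
```

The using block is the safer choice of the two closing options, because the pool is released even if reading one of the properties throws.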

Explaining the Capabilities

ClusterID: Unique ID of the cluster.
ClusterTime: Time on the cluster. Note that all Centeras maintain GMT time.
ClusterName: The name given to the cluster. Most of the time, Centera administrators leave this value unused or empty.
CentraStarVersion: The version of the OS running on the Centera.
SDKVersion: The version of the SDK your application is using. Usually it is the version you downloaded from EMC. Note that newer versions of the SDK can talk to earlier versions of the CentraStar OS.
Capacity: Total space on the Centera pool you are connecting to.
FreeSpace: Total available space on the Centera pool you are connecting to.
CenteraEdition: Either "basic", "CE", or "CE+". Please see the Centera modes earlier in this article.
DeleteAllowed: Whether deletion of clips is allowed on this pool.
DeletionsLogged: Whether deletions are logged. Usually this is set to true for auditing purposes, especially if the pool/cluster is in "basic" mode.
RetentionDefault: The default retention period. Most of the public Centera clusters have this value set to 00:00:00, which implies that there is no retention. In other words, C-Clips can be deleted immediately.

For all other capabilities, please see the Centera API reference guide "Centera_SDK_3.1_API_Ref_Guide.pdf" and review the FPPool_GetCapability API.

The demo code also includes two classes that are used to serialize the capabilities. The classes are named "AdrdCenteraClusterInfoItem" and "AdrdCenteraRetentionInfoItem" respectively. These classes represent most of the capabilities that you will ever use when developing against the Centera. I will use them in my next two articles, on how to write to the Centera and how to read from the Centera.

You can get the Microsoft or PDF version of this article from the following link:
http://WWW.ADRDWeb.com/Centera/ListOfArticles.htm

