Collective repository: Data mining the controller

A quick introduction to the collective repository.

Arguably the primary responsibility of the collective controller is to store data about the collective.

All of the collective members (including the controller) publish information about themselves to a component of the controller called the collective repository. The repository is built on technology that lets multiple controllers in the same collective quickly and easily share (or ‘replicate’) data with one another.

The data published to the repository includes the servers that belong to the collective; general information about each server, such as its file system paths, running state, applications and MBeans; and the data needed to access a member remotely, such as the member’s remote JMX port.

All of the data in the collective repository is represented in a tree of nodes.

What is a node?

A node is an element of the repository’s data tree. Each node has a unique path, very similar to a file system path, and a node may or may not contain any associated data (it may be an empty node that exists only to complete the path to a child node).

For example, the node which represents whether a server is running or not has the following path:

/sys.was.collectives/local/hosts/hostName/userdirs/userDir/servers/serverName/sys.status

The value stored at this sys.status node is either STARTED or STOPPED.
The value stored at the ‘servers’ node is null (i.e. an empty node).

The values of hostName, userDir and serverName within the path uniquely identify the member. This naming pattern allows for multiple servers in a Liberty profile, and multiple Liberty profiles on a host. For example, a server named myServer on myhost.com, with a wlp.user.dir of /wlp/usr, would have the following repository path:

/sys.was.collectives/local/hosts/myhost.com/userdirs/%2Fwlp%2Fusr/servers/myServer/sys.status

Note that the userDir component of the path is URL encoded. This prevents the slashes in the wlp.user.dir file system path from being interpreted as repository path delimiters.
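As an illustration of the path layout described above, here is a minimal Java sketch that URL encodes a userDir and assembles the sys.status path. The class and method names (RepositoryPaths, encodeUserDir, statusPath) are hypothetical and not part of any Liberty API. It also assumes that java.net.URLEncoder produces the same encoding the controller uses, which holds for the /wlp/usr example but may not hold for every character (URLEncoder turns spaces into ‘+’, for instance). In real code the RepositoryPathUtilityMBean is the safer choice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only; not part of the Liberty API.
class RepositoryPaths {

    // Root of the published host data, as shown in the example paths above.
    static final String HOSTS_ROOT = "/sys.was.collectives/local/hosts/";

    // URL encodes the userDir so its slashes are not read as path delimiters.
    // Assumption: URLEncoder's form encoding matches the repository's encoding.
    static String encodeUserDir(String userDir) {
        try {
            return URLEncoder.encode(userDir, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Builds the path of the sys.status node for one collective member.
    static String statusPath(String hostName, String userDir, String serverName) {
        return HOSTS_ROOT + hostName
                + "/userdirs/" + encodeUserDir(userDir)
                + "/servers/" + serverName
                + "/sys.status";
    }
}
```

Calling RepositoryPaths.statusPath("myhost.com", "/wlp/usr", "myServer") yields the myhost.com path shown above.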

Can I access this data?

Yes! The CollectiveRepositoryMBean defines basic CRUD operations which allow access to the collective repository. The ObjectName for this MBean is WebSphere:feature=collectiveController,type=CollectiveRepository,name=CollectiveRepository.

All of the data in the repository is accessible via these operations. Nodes can be discovered using the getChildren(nodeName, absolutePath) operation. Calling this operation again on each child it returns allows for recursive discovery of nodes within the repository. You can also use the dump operation to write all or a portion of the repository contents to either the collective controller server log or a file that you specify.
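To make the recursive discovery concrete, here is a hedged Java sketch that walks the tree depth first. RepositoryWalker, ChildLister, overJmx and walk are hypothetical names introduced for this example. The sketch assumes that getChildren takes a String and a boolean, returns a collection of strings, and returns full node paths when absolutePath is true; check these details against the MBean's Javadoc before relying on them. Obtaining the MBeanServerConnection (for example through a JMX connector to the controller) is left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper for illustration only; not part of the Liberty API.
class RepositoryWalker {

    // The one repository operation the walk needs, so it can be backed by
    // JMX in production or by an in-memory map when experimenting.
    interface ChildLister {
        Collection<String> getChildren(String nodeName) throws Exception;
    }

    // Adapts a JMX connection to ChildLister using the ObjectName given above.
    // Assumption: getChildren(String, boolean) returns full paths when the
    // second argument is true.
    static ChildLister overJmx(MBeanServerConnection conn) throws Exception {
        ObjectName repository = new ObjectName(
                "WebSphere:feature=collectiveController,"
                + "type=CollectiveRepository,name=CollectiveRepository");
        return nodeName -> {
            @SuppressWarnings("unchecked")
            Collection<String> children = (Collection<String>) conn.invoke(
                    repository,
                    "getChildren",
                    new Object[] { nodeName, Boolean.TRUE },
                    new String[] { "java.lang.String", "boolean" });
            return children;
        };
    }

    // Returns the root followed by every node below it, depth first.
    static List<String> walk(ChildLister repo, String root) throws Exception {
        List<String> found = new ArrayList<>();
        found.add(root);
        Collection<String> children = repo.getChildren(root);
        if (children != null) {
            for (String child : children) {
                found.addAll(walk(repo, child));
            }
        }
        return found;
    }
}
```

For example, RepositoryWalker.walk(RepositoryWalker.overJmx(conn), "/sys.was.collectives") would list every node under the system data. On a large collective this issues one remote call per node, so the dump operation may be a better fit for a full snapshot.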

The RepositoryPathUtilityMBean can be used to conveniently URL encode a userDir and build repository paths. The ObjectName for this MBean is WebSphere:feature=collectiveController,type=RepositoryPathUtility,name=RepositoryPathUtility.

Can I write data to the collective repository?

Yes! However, we strongly recommend that you avoid creating nodes with names that begin with ‘sys.’, as these names are reserved for use by the system. For example, the /sys.was.collectives/ path contains the published system data, and the sys.status node contains the running state of a server. Modifying data below a ‘sys.’ path is discouraged because the node tree may be removed by the system during normal operation.

When storing custom data in the repository, it is highly recommended that the data be stored under a unique name space (such as a Java package name or similar) to avoid conflicts with any names the system may use.
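The naming advice above can be enforced with a small guard before writing. The following Java sketch is one possible approach, not a definitive implementation. CustomNodes, hasReservedSegment and createCustomNode are hypothetical names, /com.example.myapp is a made-up name space, and the sketch assumes the MBean's create operation takes a node path and a data object; confirm the exact signature and return type in the Javadoc.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper for illustration only; not part of the Liberty API.
class CustomNodes {

    // True if any segment of the path starts with the reserved "sys." prefix.
    static boolean hasReservedSegment(String path) {
        for (String segment : path.split("/")) {
            if (segment.startsWith("sys.")) {
                return true;
            }
        }
        return false;
    }

    // Creates a node after refusing paths that touch reserved names.
    // Assumption: the MBean exposes create(String nodeName, Object data).
    static Object createCustomNode(MBeanServerConnection conn, String path, Object data)
            throws Exception {
        if (hasReservedSegment(path)) {
            throw new IllegalArgumentException("Path uses a reserved 'sys.' name: " + path);
        }
        ObjectName repository = new ObjectName(
                "WebSphere:feature=collectiveController,"
                + "type=CollectiveRepository,name=CollectiveRepository");
        return conn.invoke(
                repository,
                "create",
                new Object[] { path, data },
                new String[] { "java.lang.String", "java.lang.Object" });
    }
}
```

With this guard, a call such as createCustomNode(conn, "/com.example.myapp/settings/theme", "dark") goes through, while any path containing a ‘sys.’ segment is rejected before it reaches the controller.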

Would you like to know more?

The Javadoc for the CollectiveRepositoryMBean and the RepositoryPathUtilityMBean is available in the Liberty installation directory: wlp\dev\api\ibm\javadoc\com.ibm.websphere.appserver.api.collectiveController_1.0.1-javadoc.zip