Introduction

This page describes the goals we are setting for the Higgins data model. This effort falls under "The need for interoperability" described in Higgins Trust Framework Goals. At the highest level, the goal of the data model is to provide a common representation for identity, profile and relationship data in order to provide interoperability.

Information fragmentation is a pervasive problem. Even seemingly simple activities depend on information from a number of heterogeneous sources. The information may be fragmented by physical location, device, application, middleware, data storage or platform. By providing a common data model, data from multiple locations and systems can be unified.

Of course, there are other approaches to data unification than providing a common data model. However, every unification strategy involves choosing some kind of lowest common denominator. It is all a question of how low is low. The lower the level, the easier the unification, but the more lossy the result. For example, consider raw text: it is easy to index, search, and copy/paste, but very lossy. Or consider XML, which offers a common syntax for describing an object's attributes and the value of each attribute, although still without any defined semantics.

The kinds of data we wish to unify are very roughly classified as identity, profile and relationship data. Identity information relates to identification, authentication, etc. Profile information can include preferences, interests, and associated objects such as events, things, and wish lists. Relationships can be any kind of association between objects (typically between Digital Subjects), as well as affiliations.

Kinds of interoperability

Saying we desire interoperability can mean many different things. At the least, it should mean that we can navigate through and inspect data objects and their associated attributes/relationships within any Context through the Higgins API.
This is part of what motivates [2], [3] and [4] below. At this level of interoperability we may not understand the meaning of the objects and the attributes, but we can know that they are there.

Moving further along the interoperability spectrum, if we add the requirement that every attribute/relationship is globally uniquely identifiable (see [5]), then we can use the Higgins IdAS API for more than a shallow syntactic parse of the data in various Contexts. We can, for example, assemble (join) attribute information about two Digital Subjects held in two separate Contexts, perhaps implemented by separate providers, without collision or data loss. Along these lines, Higgins itself needs to implement certain kinds of cross-Context attribute data flows for correlated Digital Subjects.

Beyond inspection and navigation, Higgins aspires to support applications that can also edit Context data. We envision Higgins-based applications with user interfaces that can manipulate data contained in any Context from any [[ContextProvider]] bound into Higgins. This implies two things. First, the semantics of object attributes must be defined in a single, well-defined (unambiguous) manner. If the model has more degrees of freedom than the absolute minimum necessary, ambiguity will arise where different [[ContextProvider Context Providers]] express the same semantics in different ways. For more about this, see [6] below. Second, the specific schema of a Context's use of the abstract Higgins data model must be exposed at the CPI and API levels. This exposure allows an application to know the valid degrees of freedom in the structure of the data and the values its data fields may assume. The application can learn from the schema which datatypes are used to describe the value of a given attribute (e.g. a string, a non-zero number, or a date).
It can learn which attributes may optionally be added to an object (and which may not), and which required and/or optional relationships are allowed with objects of various kinds. For more about the need for a common schema language, see [7] below.

The Higgins data model and Context Providers

In the Core Components, the Higgins data model is exposed at the CPI. Context Providers are responsible for transforming data between the Higgins model and their own internal data structures. Higgins does not constrain the [[ContextProvider Context Provider's]] choice of data representation; it could be XML-based, object-oriented, relational, or anything else. Here are some examples of the Context Providers envisioned:
For a more detailed list of envisioned providers, see the provider section of the Wish-list.

List of Goals

Note: in the following, the terms attribute and relationship are used generically, not with any specialized meaning. We could have said field instead of either one.

[1] The model is extensible; attributes/relationships can be added later

Any Context Provider can define its own data fields (attributes, relationships, etc.) without breaking existing parsers or APIs. Allowing attributes/relationships to be added later implies that all attributes are uniquely named (which implies [5] below). Since Higgins is extensible through Attribute Providers, and the specific data models used by each are implementer-defined and thus open-ended, the Higgins model must be extraordinarily abstract and fundamentally extensible.

[2] All objects can be identified uniquely.

Note: we have discussed achieving this through a combination of a context identifier (at the container level) and, within a context, a Contextually Unique Id (CUID).

[3] Objects have attributes and/or relationships with other objects.

Attributes and/or relationships may be grouped into sets or sequences.

[4] All objects and their attributes/relationships are addressable and navigable.

Context objects and their associated attributes/relationships can be addressed using a simple, consistent indexing/navigation scheme.

[5] Attributes/Relationships are identified by globally unique URIs.

This makes it possible to assemble (join) attribute information about two Digital Subjects held in separate contexts, perhaps implemented by separate providers, without attribute collisions or data loss. Along these lines, Higgins internally needs to implement certain kinds of attribute data flows across contexts for correlated DigitalSubjects. Goal [5], in combination with [2] above, means that an application using the Higgins API can inspect each attribute/relationship of any object.
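Goals [2], [4] and [5] can be made concrete with a small in-memory sketch. It assumes that an object is identified by the pair (Context identifier, CUID) and that attribute values are keyed by attribute URIs; the names EntityId, ContextObject and join_attributes, and every URN and URI below, are hypothetical and are not the Higgins IdAS API. The join shows why URI-named attributes avoid collisions: the two providers' "status" attributes stay separate because their URIs differ, while the shared surname URI merges.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityId:
    """Hypothetical global object id: Context identifier plus CUID."""
    context_id: str
    cuid: str


@dataclass
class ContextObject:
    """Hypothetical object whose attribute values are keyed by attribute URIs."""
    oid: EntityId
    attributes: dict[str, list[str]] = field(default_factory=dict)


def join_attributes(*objects: ContextObject) -> dict[str, list[str]]:
    """Merge attributes of correlated objects held in different Contexts.

    Attributes merge only when their URIs are identical; values for the same
    URI are unioned in first-seen order, so nothing is overwritten or lost.
    """
    merged: dict[str, list[str]] = {}
    for obj in objects:
        for uri, values in obj.attributes.items():
            bucket = merged.setdefault(uri, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return merged


# Usage: the same person as seen by two hypothetical providers.
SURNAME = "http://example.org/attr/surname"
crm = ContextObject(EntityId("urn:example:ctx:crm", "42"),
                    {SURNAME: ["Smith"], "http://example.org/crm/status": ["gold"]})
hr = ContextObject(EntityId("urn:example:ctx:hr", "e-17"),
                   {SURNAME: ["Smith"], "http://example.org/hr/status": ["active"]})
joined = join_attributes(crm, hr)
# joined has 3 attributes: one merged surname and two distinct "status" URIs.
```

A real join would also need a policy for conflicting single-valued attributes; this sketch simply unions values.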
[6] There is a single, well-defined way to express the semantics of attributes/relationships.
Of course, any specific data schema (e.g. one implemented by an [[AttributeProvider]]) can, and does, choose to represent things as attributes (e.g. member slots on an object instance), as relationships (e.g. pointers to other objects), or as a combination of both. That is its prerogative. What is required is a single canonical language to express the semantic intent behind the data structure.

[7] Common schema descriptions.

These schemas must describe the fine-grained constraints on the structure and values of data objects. The schema must describe the range of allowed values, cardinality, etc. for each attribute/relationship of an instance of a class, as well as allowed inter-object relationships, including instances, classes and sub-classes.

Any given object may be governed by any schema descriptor. And at the attribute/relationship level, schema descriptors can be used to govern certain aspects of data, e.g. an SSN may hold only one value, while a surname may hold multiple values. These two levels are independent: an xyz://foo/bar/country attribute behaves the same whether it is held by a person object or a device object.

Given access via a Context Provider to data described in the Higgins data model, as well as access to the schema(s) used, an application can, without a priori knowledge, understand a data structure well enough to display, transform, search, filter and even perform some kinds of edits. [Note: whether the edit is allowable under the security policy of the Context, and whether the update is ultimately rejected, is another story, of course.] [[ContextProvider Context Providers]] that implement a Context are responsible for returning the schema description in a data stream in response to a schema 'get' operation on that Context.

The Higgins demo app demonstrates the need for a common schema description. In the app, the ProfileShare Context Provider declares a simple "vCard"-like schema for its DigitalSubjects.
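The per-attribute constraints in [7], and the claim that a schema lets an application handle data without a priori knowledge, can be illustrated with a small sketch. The descriptor format, attribute URIs and function names below are hypothetical: the goals do not specify a schema language, and this is not the ProfileShare schema. validate checks cardinality and datatype using only the schema, and form_fields derives generic editor rows the way a schema-driven UI might.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttrSchema:
    """Hypothetical schema descriptor for one attribute URI."""
    label: str
    datatype: type
    max_values: Optional[int]  # None means the attribute is multi-valued


# Hypothetical schema. Constraints are keyed by attribute URI, not by the class
# of the object that holds the attribute, so "country" behaves the same on a
# person object and on a device object.
SCHEMA: dict[str, AttrSchema] = {
    "http://example.org/attr/ssn": AttrSchema("SSN", str, 1),
    "http://example.org/attr/surname": AttrSchema("Surname", str, None),
    "http://example.org/attr/country": AttrSchema("Country", str, 1),
}


def validate(attributes: dict[str, list], schema: dict[str, AttrSchema]) -> list[str]:
    """Return schema violations; an empty list means the data conforms."""
    errors: list[str] = []
    for uri, values in attributes.items():
        desc = schema.get(uri)
        if desc is None:
            errors.append(f"{uri}: not declared in schema")
            continue
        if desc.max_values is not None and len(values) > desc.max_values:
            errors.append(f"{desc.label}: at most {desc.max_values} value(s) allowed")
        for value in values:
            if not isinstance(value, desc.datatype):
                errors.append(f"{desc.label}: {value!r} is not {desc.datatype.__name__}")
    return errors


def form_fields(attributes: dict[str, list], schema: dict[str, AttrSchema]) -> list[tuple[str, str, bool]]:
    """Derive (label, displayed value, multi-valued?) rows for a generic editor
    using only the schema, with no logic specific to any class of object."""
    rows = []
    for uri, values in attributes.items():
        desc = schema.get(uri)
        if desc is not None:
            rows.append((desc.label, ", ".join(map(str, values)), desc.max_values is None))
    return rows


# Usage: a person object that violates SSN cardinality, and a device object.
person = {"http://example.org/attr/ssn": ["111-11-1111", "222-22-2222"],
          "http://example.org/attr/surname": ["Smith", "Jones"]}
device = {"http://example.org/attr/country": ["US"]}
```

In the demo app described here, a comparable role is played by the vCard-like schema that the ProfileShare provider declares.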
The ProfileShare provider's declarative, processable schema description enables the app to dynamically generate a fully functioning user interface to view and/or edit the vCard data without any prior knowledge of the underlying data structure or schema. No logic related to any specific class of data object is coded into the app. With this approach, the app can manage and edit identity data within contexts that are dynamically bound into the framework.

[7b] Contexts must declare the schema(s) used to define their use of the Higgins data model. Schemas must be composable (nestable).

Since one Context may be fabricated from a conglomeration of disparate data sources or other Contexts, it must be able to use any number of individual or even nested schemas (schemas that include other schemas). The schema governing an object's data elements (attributes/relationships) is discoverable given that object's and/or attribute's identifier.

[7c] Context Providers may choose to support the ability to update one or more of their Contexts' schemas.

It is anticipated that almost all Context Providers will choose not to support this functionality. Most provider implementations involve such a complex intertwining of logic and data structure that external updates to the schema are impossible to support.

[8] Multiple Contexts.

Object space is subdivided into separate Contexts.

[9] Contexts are uniquely identifiable.

Note: In practice we may have to qualify this by adding the words "...known to any particular instance of Higgins..." after the word "Contexts". Contexts may or may not be discoverable.

[10] Contexts may be directly associated 1:1 or 1:M with other Contexts.

These 1:1 or 1:M relationships represent direct relationships between Contexts (as opposed to implicit relationships between Contexts that are a side effect of relationships between objects across context boundaries, as described in [11] below).
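One way to picture [10] is to store direct Context-to-Context associations as directed edges labelled with a relationship-type URI, which lets several independent hierarchies or graphs share the same Contexts. The sketch below assumes that representation; the ContextGraph class, its methods, and every URN and URI in it are hypothetical. reachable follows a single relationship type breadth-first and tracks visited Contexts, so it also terminates on graphs that contain cycles.

```python
from collections import defaultdict, deque


class ContextGraph:
    """Hypothetical store of direct Context-to-Context associations.

    Each association is a directed edge labelled with a relationship-type URI;
    a source may point to one target (1:1) or several (1:M).
    """

    def __init__(self) -> None:
        # relationship URI -> source Context id -> ordered target Context ids
        self._edges = defaultdict(lambda: defaultdict(list))

    def associate(self, rel_type: str, source: str, target: str) -> None:
        targets = self._edges[rel_type][source]
        if target not in targets:
            targets.append(target)

    def targets(self, rel_type: str, source: str) -> list[str]:
        # .get() avoids creating empty entries for unknown keys.
        return list(self._edges.get(rel_type, {}).get(source, []))

    def reachable(self, rel_type: str, source: str) -> list[str]:
        """Contexts reachable from source via rel_type edges only (breadth-first)."""
        seen, order, queue = {source}, [], deque([source])
        while queue:
            for nxt in self.targets(rel_type, queue.popleft()):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


# Usage: two independent relationship types over the Contexts.
PART_OF = "http://example.org/rel/organizationalPartOf"
CONTAINS = "http://example.org/rel/geographicallyContains"
g = ContextGraph()
g.associate(PART_OF, "urn:example:ctx:higgins", "urn:example:ctx:eclipse")
for state in ("nc", "ut", "ma"):
    g.associate(CONTAINS, "urn:example:ctx:usa", f"urn:example:ctx:{state}")
```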
Whether these Context-to-Context relationships are used hierarchically depends on the semantics of the consuming application and applicable policies. For example, should we characterize Higgins as a sub-Context of Eclipse in an organizational sense? If so, does this mean that all policies applicable to Eclipse are also applicable to Higgins? Does this apply to membership? Access lists? The answer is that a strict hierarchy probably doesn't apply to everything. Context relationships are a kind of relationship and thus, according to [5] above, are also uniquely identified by a URI. Some kinds of Context-to-Context relationships do involve hierarchy, for example organizational structure or geographic containment (NC, UT and MA are states within the USA). The model allows any number of hierarchies or graphs to be modeled concurrently. One could (potentially) have an access control policy applied to one hierarchy and membership applied to another.

[11] An object may have direct, unidirectional, 1:1 or 1:M associations with one or more objects in other Contexts.

This is necessary to support [[DigitalSubject]] correlation and aggregation. As an example, the same person Entity may be represented as N [[DigitalSubject]]s in N different Contexts. The same Entity may be represented as yet another [[DigitalSubject]] in one final Context. The [[DigitalSubject]] in this final Context would have a set of references to the other N [[DigitalSubject]]s. This final [[DigitalSubject]] can act as an archetypal source of attributes for the other N, and Higgins may support attribute propagation along the directed reference links to "push" copies of attribute values to the N subordinate [[DigitalSubject]]s. Higgins may also support attribute flows in the reverse direction, where the final [[DigitalSubject]], acting as parent to the N children, can effectively inherit attribute values from its children.
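The push and pull flows of [11] can be sketched under simple assumptions: each DigitalSubject holds URI-keyed attribute values plus unidirectional references to DigitalSubjects in other Contexts, push copies the archetypal subject's values down every reference, and pull unions the children's values into the parent. The class, the functions and all identifiers are hypothetical, and the goals do not say how conflicts are resolved; here a push simply overwrites the child's values for the pushed URIs, and a pull keeps each distinct value once.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalSubject:
    """Hypothetical DigitalSubject: URI-keyed attribute values plus
    unidirectional references to DigitalSubjects in other Contexts."""
    ds_id: str
    attributes: dict[str, list[str]] = field(default_factory=dict)
    refs: list["DigitalSubject"] = field(default_factory=list)


def push(parent: DigitalSubject, uris: list[str]) -> None:
    """Copy the parent's values for the given URIs down each reference link."""
    for child in parent.refs:
        for uri in uris:
            if uri in parent.attributes:
                child.attributes[uri] = list(parent.attributes[uri])  # copy, not alias


def pull(parent: DigitalSubject, uri: str) -> list[str]:
    """Let the parent inherit values for one URI from its children."""
    merged = list(parent.attributes.get(uri, []))
    for child in parent.refs:
        for value in child.attributes.get(uri, []):
            if value not in merged:
                merged.append(value)
    parent.attributes[uri] = merged
    return merged


# Usage: one archetypal subject referencing N = 2 subordinate subjects.
EMAIL = "http://example.org/attr/email"
NAME = "http://example.org/attr/displayName"
work = DigitalSubject("urn:example:ctx:work#u1", {EMAIL: ["alice@work.example"]})
home = DigitalSubject("urn:example:ctx:home#u9", {EMAIL: ["alice@home.example"]})
root = DigitalSubject("urn:example:ctx:personal#me", {NAME: ["Alice"]}, [work, home])
push(root, [NAME])          # down: both children now carry the display name
emails = pull(root, EMAIL)  # up: root inherits both e-mail addresses
```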
In addition to up and down attribute flow, there are use cases that involve side-to-side flow. For example, when Higgins mediates the opening of a target context from a base [[DigitalSubject]], it may search every DigitalSubject reachable from the base [[DigitalSubject]] for the attributes/claims necessary to authenticate in accordance with the security policy of the target Context.

[12] The schema description must be decidable.

For a particular task, a logic is decidable if it is possible to design an algorithm that is guaranteed to terminate in a finite number of steps (i.e. the algorithm is guaranteed not to run forever). This means, for example, that the schema language must be chosen so that automated reasoning over a schema, such as checking it for logical contradictions, is guaranteed to terminate rather than potentially loop forever.

Comparison to existing goals

The following is a comparison to the "existing" goals here: http://spwiki.editme.com/_Page?page=DataModelGoalsM4&page-version=26
Last Modified 8/1/06 12:21 PM