Wednesday, August 24, 2011

The fallacy of constraining a superset model

In HL7v3, although the overall development process goes from RIM --> DMIM + CMET --> RMIM, it is not strictly enforced by the tooling. For example, after we developed the DMIM for the Lab domain, the lab-result-specific RMIM we then developed was not strictly constrained from the DMIM; instead, we used the DMIM as a baseline and added or removed elements as appropriate, and sometimes we developed the RMIM directly from the RIM. So this is not really a problem in HL7v3: although the overall process is "design by constraint", the actual content of the information model is not constrained from a superset information model, because the RMIM is not strictly constrained from the DMIM.

In other models such as openEHR and ISO 13606, which use the archetype concept to develop reusable data structures covering all use cases and then extend or constrain those archetypes for the requirements of a specific use case, that is where we see the fallacy.

The idea of developing a superset model is quite attractive at first glance. In reality, however, it becomes a burden and results in many issues and challenges during implementation. Let me quote my original comment below to explain the fallacy.

Firstly, I found that it is not useful at the implementation level. For example, a UV (universal) or even national-level model may define 200 or more data elements, many of them optional, yet a specific project needs only 20 of them. In this case the big model will likely confuse the implementer, who needs to fully understand the exact business use cases, the relevance of all the other optional data elements, and their potential impact on the project. If the implementer does not really understand all these optional data elements, he/she will not be able to constrain the big model in the first place.
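A minimal Python sketch of what "constraining the big model" asks of an implementer. The element names are hypothetical placeholders generated in a loop, and the 200/20 split only mirrors the numbers above; this is not any real HL7 or openEHR API. What it shows is that deriving a 20-element profile still means making an explicit decision about every other optional element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    min_occurs: int  # 0 means the element is optional in the superset model


def build_superset(total: int, mandatory: int) -> list[Element]:
    """Hypothetical superset model: a few mandatory elements, the rest optional."""
    return [Element(f"element_{i:03d}", 1 if i < mandatory else 0) for i in range(total)]


def constrain(superset: list[Element], needed: set[str]) -> tuple[list[Element], list[str]]:
    """Keep only the elements the project needs.

    Returns the project profile and the names of the dropped elements; per the
    argument above, each dropped optional element is a decision the implementer
    has to make, ideally with an understanding of its business meaning.
    """
    known = {e.name for e in superset}
    unknown = needed - known
    if unknown:
        raise ValueError(f"not in superset model: {sorted(unknown)}")
    missing_mandatory = [e.name for e in superset if e.min_occurs > 0 and e.name not in needed]
    if missing_mandatory:
        raise ValueError(f"cannot drop mandatory elements: {missing_mandatory}")
    profile = [e for e in superset if e.name in needed]
    dropped = [e.name for e in superset if e.name not in needed]
    return profile, dropped


superset = build_superset(total=200, mandatory=5)
needed = {e.name for e in superset[:20]}  # the project happens to need the first 20
profile, dropped = constrain(superset, needed)
print(len(profile), len(dropped))  # 20 180
```

The sketch rejects any attempt to drop a mandatory element, so the only freedom the implementer has is over the optional ones, which is exactly where the 180 decisions come from.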

Secondly, in order to address all possible requirements, the modeling process becomes extremely long. In the end, under time pressure or because the use cases are unclear, the modeler may simply dump in every business data requirement he/she can think of without proper modeling rigor, and when the use cases become clearer, the model needs refactoring. In software programming, a model produced by this kind of process would be called "spaghetti code": unsustainable code that makes the system extremely fragile and subject to constant refactoring whenever there is even a slight new or additional requirement. The other practical point is: why go to all the trouble of satisfying 20 percent or less of the needs at the expense of the majority 80 percent or more? In the end, that 20 percent or less is not fully addressed either.

Thirdly, from a technical implementation point of view, particularly for web service implementations, the strategy for ensuring payload backward compatibility is to expand the XML structure and data types rather than contract them. For example, in an existing XML payload we may define the data type of a data element as "integer" because all existing systems use integers; later, when a new system requires it to be a "string", we can safely expand the data type to "string", since type expansion won't break backward compatibility (for incoming requests, though not for outgoing responses). As with data types, XML structure expansion is safer than contraction for ensuring backward compatibility. Given this technical reason and limitation, the model also should not try to be a big, fat one with loosely defined requirements; instead, it should start small with the currently known requirements and evolve from release to release.
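The following Python sketch illustrates the expansion-versus-contraction rule for a single data element. The two lexical checks are simplified stand-ins for XML Schema's `xs:integer` and `xs:string` (whitespace handling and other facets are ignored), and the sample values, including the non-numeric `"LAB-0042"`, are hypothetical. It shows three cases: widening the request type keeps every existing sender valid, narrowing it rejects a value a sender already uses, and widening a response type can break an old consumer that still parses the element as an integer.

```python
import re

# Simplified stand-ins for the XML Schema xs:integer and xs:string lexical spaces.
TYPE_CHECKS = {
    "integer": lambda text: re.fullmatch(r"[+-]?[0-9]+", text) is not None,
    "string": lambda text: True,  # any character content counts as a valid string here
}


def still_accepts(new_type: str, values_in_use: list[str]) -> bool:
    """True if a receiver validating against new_type accepts every value senders already send."""
    return all(TYPE_CHECKS[new_type](v) for v in values_in_use)


def old_consumer(value: str) -> int:
    """An existing consumer written when the element was defined as integer."""
    return int(value)


# Expansion for incoming requests: integer -> string. Existing senders are unaffected.
existing_requests = ["42", "007", "-3"]
print(still_accepts("string", existing_requests))  # True

# Contraction for incoming requests: string -> integer. A sender already using a
# non-numeric value (hypothetical "LAB-0042") would now be rejected.
requests_after_widening = existing_requests + ["LAB-0042"]
print(still_accepts("integer", requests_after_widening))  # False

# Expansion for outgoing responses is not safe: an old consumer still parses integers.
try:
    old_consumer("LAB-0042")
except ValueError as err:
    print("old consumer broke:", err)
```

The asymmetry comes from the receiver: a wider accepted type is a superset of what old senders produce, while a wider produced type is not a subset of what old receivers can parse.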


  1. > the RMIM is not exactly directly constrained from DMIM

    Yes, it is. Name an exception. openEHR allows extensions, but V3 doesn't. You got it the wrong way around.

  2. To fully understand the fallacy, we need to spend some time walking through some fundamental differences between the HL7 RIM and openEHR. Stay tuned for the next post about these differences.

  3. #Grahame

    In openEHR, when users need to define an archetype, we first instantiate one of the Entry classes defined in its RM, such as Observation, and add data items structured as Single, List, Tree or Table, which is essentially the Observation class's "data" attribute of type HISTORY. Alternatively, we can define an archetype containing archetype slots; such archetypes can be used as the root archetype of a template.

    Once the archetypes are defined, it is time to define the template, which again assembles the relevant archetypes, specifies archetype slot fillers, and performs further constraining such as removing unused data items, binding terminology, etc.

    Is the above "extension"?

  4. No. All of it is constraint, just like it is in V3. There are distinct semantic and engineering differences, but it is still just constraint.

    Where openEHR is different is that you can derive one archetype from another, and then *extend* that archetype by introducing other constraints on the reference model that are constrained out by the first archetype. V3 never allows this.

  5. An openEHR specialized archetype can add further constraints on top of the parent archetype and add additional data elements. But that does not really make much difference: it is essentially still constraining the generic structures, such as List and Tree, defined in the underlying RM by replacing generic names with more user-friendly business names, tightening occurrences and binding terminology. This is similar to the whole development process in V3.

    I don't see which part of openEHR's "extend" feature is not available in V3. Of course, openEHR gives much, much more freedom in defining archetypes than the way an RMIM is constrained from the RIM in V3, but that has its own problems, which I shall elaborate on in greater detail in a separate post.
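The disagreement in this thread can be pinned down with a small Python sketch. It uses a deliberately simplified, hypothetical stand-in for the openEHR RM: a List-like container that may hold any number of element nodes, each occurring 0..* times, with archetypes represented as plain mappings from node name to an occurrence interval. None of this is the openEHR API or ADL, and the node names are invented for illustration and do not reproduce any published archetype. The sketch runs two separate checks: whether a specialized archetype narrows its parent (no new nodes, shared nodes tightened or unchanged), and whether all of its nodes still conform to the RM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrences:
    lower: int
    upper: Optional[int]  # None means unbounded

    def within(self, other: "Occurrences") -> bool:
        """True if this interval is equal to or narrower than `other`."""
        if self.lower < other.lower:
            return False
        if other.upper is None:
            return True
        return self.upper is not None and self.upper <= other.upper


# Hypothetical, simplified RM rule: the container may hold any number of
# element nodes, each occurring 0..* times.
RM_ELEMENT_OCCURRENCES = Occurrences(0, None)

# Hypothetical archetypes: node name -> occurrences.
parent = {"systolic": Occurrences(0, 1), "diastolic": Occurrences(0, 1)}
child = {
    "systolic": Occurrences(1, 1),   # tightened: optional -> mandatory
    "diastolic": Occurrences(1, 1),  # tightened
    "cuff_size": Occurrences(0, 1),  # new node not present in the parent
}


def narrows_parent(derived: dict, base: dict) -> bool:
    """Strict constraint: no new nodes, and every shared node is narrowed or unchanged."""
    return derived.keys() <= base.keys() and all(
        occ.within(base[name]) for name, occ in derived.items()
    )


def conforms_to_rm(archetype: dict) -> bool:
    """Every node is still a legal element occurrence under the simplified RM."""
    return all(occ.within(RM_ELEMENT_OCCURRENCES) for occ in archetype.values())


print(narrows_parent(child, parent))  # False: "cuff_size" is an addition
print(conforms_to_rm(parent), conforms_to_rm(child))  # True True
```

A specialization that adds a node fails the first check while passing the second. That is roughly where the two views above part ways: relative to the parent archetype it is an extension, while relative to the RM it is still a constraint.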