Referencing Results of Model-to-Model Transformations

 

Refining Transformation Results

An abstract, portable language cannot expose all features of all the platforms to which it is mapped. In some cases the language’s user community accepts this. In other cases, connoisseurs of the target platform in particular will want to refine the results of mapping an abstract model to “their” target platform. Some languages do not even aim to be complete in the sense of requiring no lower-level refinement. For example, if a UML model is used only to specify packages, classes with their properties and operation signatures, and the associations between these classes, the detailed specification of the operation implementations is still missing and needs to be provided in the language to which the UML classes are mapped.

Those approaches where refinement is desired or required can further be divided into two categories: those that require intrusive refinement of generated artifacts and those where refinements can be specified non-intrusively. The former in particular poses challenges for the tools involved but cannot always be avoided. Consider a refinement option in a descriptor file represented in XML: if the descriptor format does not allow the descriptor to be split across multiple files, intrusive refinement is the only possible option.

Most code generator frameworks provide reasonable support for such scenarios, e.g., through so-called user code areas or protected areas/regions. Trouble starts brewing when refactorings happen in the model: the sections refined manually in the transformation results need to be refactored as well, but the modeling tool usually has no knowledge of them. Hardly any generator framework is capable of refactoring the refinements after model changes, such as calls to generated operations that were renamed in the model. The user ends up with less refactoring support than in a modern 3GL IDE, where many powerful refactoring operations can be performed with ease.

The fundamental problem may best be exemplified with a code generator that only generates operation skeletons, where developers have to fill in the implementations manually, e.g., in protected areas. The hand-written code will typically have to reference other code elements generated from other model elements, such as data types, operations or attributes. While the generator will preserve the developer’s refinements upon re-generation, it will have a hard time adjusting the refinements to refactorings that happened in the model. For example, if the developer calls another generated operation from within the refinement and the called operation is renamed in the model, the refined code will no longer compile after re-generation.
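
To make this failure mode concrete, below is a minimal sketch of what such a generated skeleton could look like. The PROTECTED REGION markers follow a convention popularized by several generator frameworks; the class and operation names are made up for illustration:

    // Generated from model element "Order"; edit only inside protected regions.
    public class Order {

        // Generated operation skeleton with a hand-written refinement.
        public void submit() {
            // PROTECTED REGION ID(Order.submit) START
            // If calculateTotal() is renamed to computeTotal() in the model,
            // re-generation renames the skeleton below but keeps this region
            // verbatim, and the refined code no longer compiles.
            double total = calculateTotal();
            System.out.println("Order total: " + total);
            // PROTECTED REGION END
        }

        // Generated operation; its name is derived from the model element's name.
        public double calculateTotal() {
            return 0.0; // default body, to be refined
        }
    }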

This general issue also exists for model-to-model transformations and is difficult to fix. Only fine-grained model change logs combined with powerful refactoring frameworks can help, but this combination is usually not available.

A special case of such refactorings is the removal of a model element for which a refinement exists. It is not uncommon for code generator frameworks to delete the manual refinements when mapping the model element’s deletion into the target environment. The user had better have applied a thorough version control mechanism should he or she later find out that some of the code in the refinement is still needed.

For model-to-model transformations, handling manual refinements in the target model is particularly complex. The difficulties start when the transformation writer needs to decide which refinements to permit and how to specify this in the transformation rules. Protected areas are not as easy to define for model element graphs as they are for a sequence of ASCII characters. The OMG’s QVT standard (in particular its Core part) tries to address this by letting the transformation writer define patterns for the target model that may still match after the user has applied manual changes. The transformation can provide default values for those areas where users may later refine the transformation output. Broad adoption of QVT is as yet uncertain (see also [SwJBHH06]).
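
As an illustration, here is a minimal sketch in the QVT Relations language (the more readable sibling of QVT Core), loosely modeled on the standard’s UML-to-RDBMS example; the concrete rule is an assumption for illustration only:

    transformation uml2rdbms(uml : SimpleUML, rdbms : SimpleRDBMS) {
      top relation ClassToTable {
        cn : String;
        checkonly domain uml c : Class { name = cn };
        -- Check-before-enforce semantics: if a Table with this name already
        -- exists, it is matched rather than recreated, so properties that the
        -- pattern does not constrain (a manually tuned description, say)
        -- survive subsequent transformation runs.
        enforce domain rdbms t : Table { name = cn };
      }
    }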

When a rigid process is in place for versioning, building and assembling the software, refinements incur another problem: they have to be applied to artifacts that are typically produced during a build step. However, most processes will not allow the intrusive manipulation of build results, and it may not be possible to check modified build results into the versioning repository. This can make refinements impossible.

Despite the challenges, refinements are frequently used because the target environment’s tools can be used to perform the changes. The more powerful, usable and convenient these are, the more difficult it will be to use marks or annotations instead (see Marks and Annotations).

Marks and Annotations

Instead of refining the transformation output, the refinement information can be attached to the source models. The OMG’s MDA standard [MDA, MSUW04] calls such attachments marks. While the refactoring problem described in Refining Transformation Results does not go away, changing marks can be considered an operation on the source that will be mapped into the target by the transformation. The marks can be versioned together with the other source artifacts, and all changes to the target can be reproduced by the build process. Production and quality assurance people love this.
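
In an EMF-based setting, for instance, marks can be attached to source model elements as EAnnotations. Here is a minimal sketch; the annotation source URI and the detail key are made-up conventions that a generator would have to interpret:

    import org.eclipse.emf.ecore.EAnnotation;
    import org.eclipse.emf.ecore.EClass;
    import org.eclipse.emf.ecore.EcoreFactory;

    public class MarkExample {
        // Attaches a platform-specific mark to a model element; the mark is
        // versioned with the model and consumed by the transformation.
        public static void markForTargetPlatform(EClass modelClass) {
            EAnnotation mark = EcoreFactory.eINSTANCE.createEAnnotation();
            mark.setSource("http://example.org/marks/ejb"); // hypothetical mark "language"
            mark.getDetails().put("transactionAttribute", "REQUIRES_NEW");
            modelClass.getEAnnotations().add(mark);
        }
    }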

The problem with marks is that editing them can be painful at times. Since tools that support marks mostly use them as a generic way of parameterizing the generator framework, there is only generic editing and visualization support. Imagine you have to punch in C# code in a tagged-value editor with no syntax highlighting and no code completion, just in order to keep the marks with the model.

Besides the decreased development efficiency caused by mediocre tool support for specific marks “languages,” the developers’ unwillingness to use such tools may be an even larger obstacle.

Referencing Results of Model-to-Model Transformations

There are numerous model-to-model transformation frameworks available, many of them open source, such as Tefkat, ATL, openArchitectureWare or the QVT implementation by IKV++. When used in an automated build environment, these engines produce a new output model every time the build is executed. Used in conjunction with repositories that identify model elements by universally unique identifiers (UUIDs), this means that each new transformation run produces a new set of unique model elements.

If transformation results are referenced by other models and the references are based on the UUIDs of the elements referenced, these references will break with each new transformation run.

Additional measures are required: either references must be able to use alternative keys instead of, or in addition to, the UUIDs, or the repository needs to allow the model transformation framework to produce stable UUIDs for the model elements it produces. The Eclipse Modeling Framework (EMF) by default does not use UUIDs, which alleviates this problem, at the cost of making other refactorings more expensive in case they change an element’s alternative key.
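
One conceivable measure, sketched here as an idea rather than the mechanism of any particular framework, is to derive each output element’s UUID deterministically from the source element’s identity and the producing rule, e.g., using name-based (type 3) UUIDs:

    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    public class StableIds {
        // Derives a deterministic UUID for an output element from the source
        // element's UUID and the name of the rule that produced it, so that
        // re-running the transformation reproduces the same IDs.
        public static UUID outputId(UUID sourceId, String ruleName) {
            String seed = sourceId + "->" + ruleName;
            return UUID.nameUUIDFromBytes(seed.getBytes(StandardCharsets.UTF_8));
        }
    }

As long as the source element’s UUID and the rule name remain stable, references into the output model survive re-builds; a rule that produces several elements from one source element would additionally have to mix a discriminator into the seed.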

Prototypes implemented by SAP indicate that stable UUIDs produced by the transformation solve the problem; however, the long-term ramifications of such an approach have yet to be observed.

Handling Inconsistencies

Model repositories store model elements and the links between them. The links need to identify the elements that they connect. Different repositories and tools use different ways of identifying model elements, with different effects on the robustness and life cycle of the links.

Two fundamentally different approaches to model element identity management exist:

  • universally unique identifier (UUID)-based
  • key attribute-based

While a model element’s UUID remains stable across its lifetime, some repositories allow the key attributes to change their values. For example, the Web Tools Platform (WTP), built in Eclipse on EMF, uses names for identifying model elements. Element references can break if elements change their name and the referring elements are not covered by the refactoring. UUID-based references are not affected by such changes and from this perspective work better in a large-scale environment where the owners of an artifact do not always know all of the artifact’s users or referrers.
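
EMF, for example, can be switched to UUID-based identity per resource implementation; overriding XMIResourceImpl as below is a common EMF idiom:

    import org.eclipse.emf.common.util.URI;
    import org.eclipse.emf.ecore.xmi.impl.XMIResourceImpl;

    // An XMI resource that assigns each contained object a UUID, so that
    // cross-resource references survive renames of key attributes such as names.
    public class UuidXmiResource extends XMIResourceImpl {
        public UuidXmiResource(URI uri) {
            super(uri);
        }

        @Override
        protected boolean useUUIDs() {
            return true; // default is false, i.e., fragment paths based on containment and names
        }
    }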

However, UUIDs incur another problem. Section Referencing Results of Model-to-Model Transformations has already pointed out the difficulty of keeping UUIDs stable for the outputs of model-to-model transformations. Beyond that, UUIDs can get “lost” if elements are accidentally deleted. It then depends on the tools and the capabilities of the repository how this case is handled and what it means for references pointing to the UUID that has now disappeared.

A good repository needs to be able to store the broken reference, because users may be able to reconstruct the element with the respective UUID, e.g., by fetching a previous version from the versioning repository. Rational Rose always did a great job at this: it marked the missing elements in the diagrams, kept the broken reference and waited for the user to come up with a version of the missing element. It would then resolve the reference again, and the model was healed. Several current tool and repository implementations still do not support this as usably and robustly as Rose did.
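
In EMF, for instance, references whose targets cannot be resolved surface as unresolved proxies; a repository or tool can at least detect and preserve them instead of silently dropping them. A minimal sketch using EMF’s stock cross-referencer utility:

    import java.util.Collection;
    import java.util.Map;

    import org.eclipse.emf.ecore.EObject;
    import org.eclipse.emf.ecore.EStructuralFeature;
    import org.eclipse.emf.ecore.resource.Resource;
    import org.eclipse.emf.ecore.util.EcoreUtil;

    public class BrokenReferenceCheck {
        // Reports references that could not be resolved, e.g., because the
        // target element (and with it its UUID) was deleted. Keeping them
        // around leaves room for healing once the element is restored.
        public static void report(Resource resource) {
            Map<EObject, Collection<EStructuralFeature.Setting>> unresolved =
                    EcoreUtil.UnresolvedProxyCrossReferencer.find(resource);
            for (EObject proxy : unresolved.keySet()) {
                System.out.println("Dangling reference to " + EcoreUtil.getURI(proxy));
            }
        }
    }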

It is also important for UUID-based references that the tools’ undo/redo functionality restores a deleted element under its original UUID.

Summarizing, one can say that the advantages that UUID-based links offer in a large-scale setup come at a price that currently not all repository and tool implementations are willing to pay. Caution is therefore required in selecting the right infrastructure components for an enterprise modeling setup.

Models and Text-Based Syntaxes

Text syntaxes are one valid way to view and edit models, and in particular one that many developers prefer over graphical editors. An expression in the Object Constraint Language (OCL) is an example of a set of model elements for which a textual syntax exists as the dominant form of viewing and editing. Other examples are syntaxes derived using the Human-Usable Textual Notation (HUTN, [HUTN]). Smalltalk systems followed a similar paradigm: the development artifacts were objects maintained in the image, and the IDE was ultimately just manipulating these objects, which gave great browsing and refactoring capabilities.

IBM tried to apply this paradigm to Java in their VisualAge toolset, based on the Envy repository, but issues around having to import/export the Java sources to use external tools operating on the source code proved to be stumbling blocks.

The boundaries between editing program text as ASCII files stored as such in the file system and editing a text view of a model repository are starting to blur. Eclipse’s Java Development Tools (JDT) provide excellent refactoring and navigation capabilities, although to developers it seems that the sources are still stored as ASCII files in their typical folder structure. JDT manages this by maintaining all kinds of indices in the background. This approach comes very close to using a model repository with name-based identity and references (see also Handling Inconsistencies) and in addition preserves all benefits of regular text editors, such as keeping all lexical information the user entered (indentation and comments, for example), the ability to save inconsistent or unparsable texts, and copying and pasting arbitrary sections of text rather than only valid subtrees of the concrete syntax tree. A similar approach is pursued by TEF [Sch07] and openArchitectureWare’s Xtext [Xtext].

Problems occur when trying to put a parser-based approach on top of a UUID-based repository. The parser cannot easily identify changes, particularly if elements changed their names in the text. Elements with new UUIDs may get created, and old references will therefore break.

At the other end of the spectrum are approaches like that of Intentional Software [Sim07]. There, the tools can combine a variety of different syntaxes, among them text-based ones, even in a single editor. Modifications are applied directly to the underlying model and affect all other views immediately. Intelligent parsing technology ensures that syntactically incorrect stretches of text can still be saved and that editing the model feels like editing a text document. With such an approach it becomes possible to combine text-based syntaxes with repositories that use UUIDs.

However, these capabilities are not yet widely available, and in the tools market, customers are reluctant to get locked into proprietary solutions. As a result, the powerful combination of graphical and forms-based syntaxes with text views has not reached widespread adoption at the time of writing.

Model-Level Debugging

When behavioral aspects of an application are expressed in models, and those models are transformed into artifacts for a runtime platform, attention must be paid to how these applications are debugged. This becomes an issue particularly when the teams get larger and those who implement and maintain the runtime platform are different from those who model the applications. In this case, the application modelers have no intimate knowledge of the runtime, nor do they know how the models get translated into something that the runtime can execute. Therefore, if modeling environments do not support debugging in the (behavioral) models, this is as bad as having to debug assembly code that was generated by a C++ compiler.

A model transformation can keep a traceability record of which elements of the model it mapped to which elements in the runtime. This information can then be used during a debug run to suspend, introspect and continue the application execution based on locations in the model.
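
At its simplest, such a record is a map from model element IDs to locations in the generated artifact. The class below is a hypothetical sketch; real frameworks use richer trace structures:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical trace record written by the generator and read by a
    // model-level debugger: it maps a model element's ID to the location of
    // the code generated from it, so that a breakpoint set on a model element
    // can be translated into a breakpoint in the runtime artifact and back.
    public class TraceRecord {
        public record Location(String file, int line) {}

        private final Map<String, Location> modelToCode = new HashMap<>();

        public void record(String modelElementId, String file, int line) {
            modelToCode.put(modelElementId, new Location(file, line));
        }

        public Location locate(String modelElementId) {
            return modelToCode.get(modelElementId);
        }
    }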

As a special case, a traceability record can be embedded into the generated runtime artifact. This is typically how compilers handle the issue, e.g., with debug symbols. The runtime then needs to be aware of and understand this instrumentation and can use it during debugging to break and introspect based on source code concepts rather than runtime concepts. This approach is therefore only viable if the runtime and debugging environment can be enhanced accordingly.

Still, many code generation frameworks and the generators built with them ignore the issue of model debugging, leaving it as the transformation writer’s responsibility to generate the respective traceability information. Some implementations of the OMG Query/View/Transformation (QVT, [QVT]) standard automatically produce traceability records during transformation execution, which is a first step towards model debuggability. Even specialized tools such as UML tools with code generation capabilities mostly do not provide support for debugging, e.g., sequence or activity diagrams that were mapped to source code in a 3GL such as Java or C#.

Repository Scalability

It is a prerequisite for enterprise deployment of a model-driven development approach that the underlying repository scales to the number of users and the number of model elements to be handled. There are mature commercial products in the market in this regard. Open source offerings such as EMF, or Sun’s Metadata Repository (MDR), which is still part of NetBeans but no longer maintained, lack several architectural qualities that are required in a large-scale deployment.

On the other hand, open source repository implementations gather substantial communities around them. EMF is the second most frequent download from the entire Eclipse web site. This brings significant innovation and availability of useful components based on the respective repository technology, which is typically not available around proprietary repository implementations. Ironically, even for standards-compliant repository implementations such as those following the Meta Object Facility (MOF) standard [MOF], adoption and value-add on top are minuscule compared, e.g., to the de-facto standard EMF.

This confronts enterprises with a delicate decision: go with a de-jure standard such as MOF and buy or build a scalable implementation, but be unable to benefit from important innovations contributed by the communities; or get the on-top innovations but compromise on the scalability of the underlying repository solution.

As of this writing, an industrial consortium with companies such as IBM, SAP, Tata, Siemens, Bosch and IKV++ is forming that is working on identifying the particular points for improvement of EMF to make it usable in enterprise-scale environments.

Language Governance

Beyond the technical, infrastructural challenges in modeling, another risk is the proliferation of proprietary modeling languages. As tool construction frameworks and meta-CASE tools make it easier to define languages and implement the corresponding tools, even individual projects with their specific set of frameworks tend to exploit this and apply DSLs and model-driven approaches in their project-specific context.

Consequently, there will be many more, highly specialized languages. For these, one cannot expect communities and markets in any way comparable to those of widely adopted, mostly general-purpose languages. Projects should keep this in mind when trading off the benefits of a custom-made DSL against the benefits of large language communities.

Furthermore, as enterprises run many IT projects, several of them concurrently, it is useful to install a governance body for DSL design. This helps to avoid severe problems when artifacts from different projects flow together. It also eases the transfer of project members from one project to the next, because the proprietary languages used in the enterprise then at least share a common set of principles and guidelines. Governance thus avoids an unmanageable “language zoo.”

Evaluation and Hints

Model-driven software development is a logical evolution of classical programming language design and compiler construction. It continues the pursuit of abstraction levels adequate for the problem domain and runtime platforms. Particularly the use of graphical notations and partial views, but also the use of text-based syntaxes in combination with model repositories, incurs a number of serious new challenges unknown in purely text-represented programming language environments.

Each of these challenges has been solved by one or another research project, open source component or commercial product. Each of those that I know, however, also leaves several issues unresolved. Some issues can be worked around by carefully designing a specific approach to model-driven development, avoiding pitfalls imposed by current shortcomings in infrastructure technologies. Other challenges, however, still lurk, waiting to strike. Here is how to tackle them, in brief:

 

  • Prefer marks over refinement if editing the marks can be made usable.
  • When needing refinements, try to keep them non-intrusive.
  • Keep debugging in mind when transforming behavioral models.
  • Allow referencing generated models only if you can guarantee reference stability.
  • Avoid diff/merge where possible by partitioning wisely and flexibly, and consider pessimistic locking.
  • Use UUIDs where possible; augment with alternative keys for stability where needed.
  • When integrating text, decide what gets versioned and how, and decide between the “Eclipse/JDT” and the “Intentional” approach.
  • Choose a repository that scales to your needs.
  • Do not expect round-tripping to work other than for trivial mappings with no change in abstraction level.

Before the benefits of what we intuitively understand by modeling can be fully harnessed in an enterprise software landscape, several gaps need to be closed by integration projects that rarely turn out to be trivial. Maturing modeling technologies, the striving for new research results and the application of existing ones will continue to close the remaining gaps.
