Reconsidering Data

Master data management has firmly embedded itself into the collective leadership psyche. Well, the terminology has. But whether decision-makers really understand what they should be managing is another question entirely.

I spend a lot of time in data meetings, usually involving government organizations. The majority of the time is spent worrying about “data tagging.” The discussions focus on what should be tagged, which tags to use, etc. Some of these discussions have been going on for fifteen years or more. I’ve seen the same argument over the same issue run on for over five years with no decision. That’s just a waste of time; there’s no other word for it.

The problem is that we’re trying to manage the wrong things, and we’re doing that because of some unexamined assumptions. Understanding that, and thinking clearly about possible solutions, will require understanding a little bit about how we got here.

In 1994, the Dublin Core Metadata Initiative (DCMI) was established to define metadata standards for documents on the World Wide Web. In a nutshell, its goal was to define a machine-readable version of a library card catalog, so that documents could be electronically cataloged, simplifying search and retrieval.

Dublin Core was a good idea for its time, but its time has passed. Technology has moved on, and too many data managers have not noticed. Think about it: library catalogs date as far back as 700 BC, and the modern card catalog as we used to know it dates to 1780. Works were indexed by subject, author, and title. That made sense in the days when it was impossible to index the entirety of each work.

Modern technology has made the old-fashioned card catalog obsolete, replaced by an electronic catalog. By the same token, the catalog structure itself is obsolete due to that same march of technology. The card catalog was devised because it was impractical to create a detailed index of an entire work; the workaround was to mine out specific search terms: title, subject, author, and important keywords in the text. Even into the early 2000s this made sense, because scanning and indexing an entire work was impractical for large bodies of work. That is no longer the case. A document of a hundred pages can be fully indexed in less time than it takes to write this sentence, and for documents that are reasonably well structured, it is even feasible to identify the author and title automatically.

And yet we persist in managing document metadata as if we were still beholden to a physical card catalog. Why?

It would be better to toss out, or drastically strip down, standards like Dublin Core and replace them with a more comprehensive approach that assumes the entire contents of a work will be fully indexed and electronically searchable (which is the current reality, whether we want to recognize it or not). Continuing to treat document metadata as something unique is the source of endless problems.

In reality, digital data only comes in one of two forms: as a binary object (a picture, a Word document, etc.) or as tuples. All the text inside a document can be treated as one or more tuples for purposes of managing data. Even if it is stored as a unit (i.e., a file containing text), tuples that describe that body of text can be easily mined out of it.

Instead of managing a card catalog standard, we should be focused on managing tuples.
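
To make that concrete, here is a minimal sketch in Python. Everything in it is illustrative: the file name, the title heuristic, and the simple (term, position) tuples are assumptions for the example, not part of any standard. The point is only that both the catalog-style metadata and the full-text index can be mined out of the content itself.

    # Illustrative sketch: mine tuples out of a plain-text document.
    # Assumptions: the document is plain text, and its first non-blank
    # line is a reasonable guess at the title.
    import re
    from pathlib import Path

    def mine_tuples(path):
        """Return a title guess plus (term, position) tuples for a full-text index."""
        text = Path(path).read_text(encoding="utf-8", errors="ignore")

        # Catalog-style metadata derived from the content itself.
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        title_guess = lines[0] if lines else ""

        # Full-text index: one tuple per token in the document.
        tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
        postings = [(term, position) for position, term in enumerate(tokens)]

        return title_guess, postings

    title, postings = mine_tuples("report.txt")  # "report.txt" is a hypothetical file
    print(title, len(postings), "tuples")

A real pipeline would use a proper search engine and better heuristics, but the shape of the work is the same: the tuples come from the document itself, not from a committee arguing over tags.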

Architecture is Ephemeral

Before I begin, I must confess that this is not my original idea. That distinction goes to Gerhard Beck, a friend and colleague whose disregard of pieties in the enterprise architecture field has provoked a great deal of thought and argument (in the best sense of that word).

The 19th century Prussian military theorist Helmuth von Moltke observed that no plan survives first contact with the enemy. By the same token, no architecture survives first contact with the developers. Events intervene: requirements change, components do not work as anticipated, new capabilities come to market. All of these things conspire to make the as-built system vary from the as-designed system to one degree or another. And that does not even take into account how rarely developers actually consult the architecture while constructing a system, or the fact that no one goes back to update the architecture document to reflect the as-built system. That kind of update costs money and is not seen as a value-added activity by program managers who are trying to keep costs and schedules under control.

So systems never reflect the architecture. The moment development begins, the architecture is a historical artifact. The only architecture that matters is the one on the network. And that one is not static; it is always changing.

Traditional architecture frameworks do not recognize this fundamental fact. All of them derive from the original Zachman framework, conceived at a time when system development projects were major, expensive undertakings, and it no longer suits modern architecture needs as well as it once did. It is a fine framework for new-start systems, but it is not responsive enough to the rapidly changing technology and business needs of the modern IT ecosystem. In an age of Agile and DevOps, traditional architecture frameworks are at a disadvantage.

So, if traditional architecture frameworks are not helpful in understanding the operation of a modern, dynamic enterprise, what is?

Most large enterprises have deployed host-based monitoring agents as part of their cybersecurity strategy. These agents gather a wealth of important architectural data and forward it to a central server, where it is devoutly ignored by the cybersecurity experts who are trying to protect the enterprise. Do not get me wrong: that is not a criticism of the cyberwarriors out there; I am merely pointing out that they have other fish to fry. The point is that a wealth of data about the real, operational architecture already exists and can be reused to understand what is actually happening in the enterprise. Understanding that ecosystem is where enterprise architects need to focus their efforts.
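
As an illustration only, suppose each agent reports its host's network connections as JSON records; the field names below are hypothetical, not the output format of any particular monitoring product. A few lines of Python are enough to turn that feed into a dependency map of the architecture that is actually running:

    # Illustrative sketch: derive the as-operated architecture from
    # agent telemetry. The record format (src_host, dst_host, dst_port)
    # is an assumption for the example.
    import json
    from collections import defaultdict

    def dependency_map(record_lines):
        """Map each source host to the (destination, port) pairs it talks to."""
        deps = defaultdict(set)
        for line in record_lines:
            record = json.loads(line)
            deps[record["src_host"]].add((record["dst_host"], record["dst_port"]))
        return deps

    sample = [
        '{"src_host": "web01", "dst_host": "app01", "dst_port": 8443}',
        '{"src_host": "app01", "dst_host": "db01", "dst_port": 5432}',
    ]
    for src, targets in sorted(dependency_map(sample).items()):
        print(src, "->", sorted(targets))

Run continuously against the live feed, that map is the architecture on the network: always current, always as-built, and far more useful to an enterprise architect than a diagram drawn before development began.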