Understanding how information flows through a solution is critical to understanding its architecture. An Information Flow diagram is a simple approach for depicting the data movement aspect of any architecture.
Capturing the flow of information is a popular and effective approach for understanding a solution architecture, so various notations have been used over the years. Before I dig into our favorite, I present to you a rogue's gallery of information flows!
- Data Flow Diagram (DFD). The DFD notation draws on graph theory, originally used in operational research to model workflow in organizations. DFD originated from the Activity Diagram used in the structured analysis and design technique methodology in the late 1970s. It is one of the more venerable notations used for information flows. It combines data flows (inputs/outputs) with the processing done on the data. As mentioned above, I prefer to separate concerns, so I use an Activity Diagram to capture processes and keep the Information Flow scoped to, well, the information flow.
- Information Flow Diagrams (IFD). The Nijssen Information Analysis Method (NIAM), first developed in 1975 and described in “The NIAM Information Analysis Method: Theory and Practice (2012),” introduced the IFD as a diagram that shows how information is communicated (or “flows”) from a source to a receiver or target through some medium. The initial “medium” was radio, but the conceptual model remains relevant for a broad spectrum of sources, receivers, targets, and mediums!
- IDEF0. This functional modeling notation for manufacturing, introduced in 1981, includes the flow of information as a core construct in the notation. It is a well-designed notation but overloads many perspectives (functions, data, objects, controls, and more) into a single diagram. I think separating concerns into fit-for-purpose views results in diagrams that are easier to create correctly and simpler to read.
- UML Information Flow. The Information Flow is not its own UML diagram type but a construct that can be used on many UML diagrams to show the flow of information between two objects. Spoiler alert: since we use UML for most of our recommended architecture views, we typically use a simplified version of this notation to capture an information flow.
- Whatever. Capturing the flow of information is not rocket science, and it is often a “go-to” when anyone is thinking through an architecture, so we tend to see many different formats for these “in the wild.” Still, I highly recommend standardizing on one notation for these!
Information Flow Diagram Notation
As mentioned above, we use a notation based on UML to capture information flows, but it is a very basic use of “boxes and lines.”
You label each box with the name of a component, show the flow of information using a dotted line with an arrow, and label the information near the arrow. Seem simple? Perfect!
Information Flow Diagram Anatomy
Each rectangular box is called a component. In the UML notation, a flow can be identified between any two items (components, classes, objects, nodes, etc.), but we keep things simple and focus on architecturally significant components. In this case, we use it to identify any places in the architecture where information “rests.” It may be the system of record for the information or a temporary storage location. See the caveats below about information hubs!
A dotted line with an arrow indicates the direction of information flow between any two components.
The label on the line provides a high-level description of the information that is flowing. In formal UML, the line would also have a “<<flow>>” stereotype on the label. Since we are *only* showing flows on the diagram, we leave that off to keep it simple. The label should not describe technical details of the information transfer (we have other diagrams for that), but the nature of the information, e.g. “accounts.”
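As a rough illustration of the anatomy above, here is a minimal sketch that models flows as plain data and renders them as Graphviz DOT text. The `Flow` class, the `to_dot` helper, and the component names are hypothetical and exist only for this example. Graphviz's `dashed` edge style is used as an approximation of the dotted line described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str  # component the information flows from
    target: str  # component the information flows to
    label: str   # nature of the information, e.g. "accounts"


def to_dot(flows: list[Flow]) -> str:
    """Render flows as DOT text: boxes for components, dashed arrows for flows."""
    lines = ["digraph information_flow {", "  node [shape=box];"]
    for f in flows:
        lines.append(
            f'  "{f.source}" -> "{f.target}" [style=dashed, label="{f.label}"];'
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical components, for illustration only.
flows = [
    Flow("CRM", "Data Warehouse", "accounts"),
    Flow("Billing", "Data Warehouse", "invoices"),
]
print(to_dot(flows))
```

Components are never declared separately in this sketch; they appear simply because a flow names them, which keeps the focus on the flows.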
Adding Additional Information
The following are all ways to add information to an information flow diagram, but use them sparingly. Less is always more when diagramming: anything on the page that does not contribute to communicating your message distracts from the message.
You can use a UML Note during drafting to capture open questions and when publishing to provide clarifications. Use notes sparingly on completed diagrams: too many notes with additional information are often a sign that you are trying to communicate something that you could depict better using a different notation.
You can use bounding boxes, swimlanes (tiers), or even placement on the page to add an additional dimension to the components. For example, a subset of components could be within a bounding box labeled “partner systems.” Remember to draw bounding boxes with dotted borders, to differentiate them from nested components.
Nested components can come in handy when some of the information flows from a component (like a data warehouse) and other information flows from a nested component *within* that component (like a reference master), and it is important to your architecture to highlight that fact. Both nested components and bounding boxes are, in essence, boxes inside boxes, with two primary differences: (1) you depict bounding boxes with a dotted border and nested components with a solid border, and (2) arrows cannot be connected to bounding boxes.
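The second difference is easy to check mechanically. The sketch below assumes flows are stored as `(source, target, label)` tuples and that bounding boxes are tracked by name; the function name and all component names are hypothetical.

```python
def flows_touching_bounding_boxes(flows, bounding_boxes):
    """Return flows whose source or target is a bounding box, not a component."""
    boxes = set(bounding_boxes)
    return [f for f in flows if f[0] in boxes or f[1] in boxes]


# Hypothetical diagram content, for illustration only.
flows = [
    ("Reference Master", "CRM", "product codes"),      # nested component: allowed
    ("Data Warehouse", "Reporting", "sales history"),  # parent component: allowed
    ("Partner Systems", "CRM", "leads"),               # bounding box: not allowed
]
bounding_boxes = ["Partner Systems"]

for source, target, label in flows_touching_bounding_boxes(flows, bounding_boxes):
    print(f"Invalid flow: {source} -> {target} ({label})")
```

A flagged flow should be redrawn so that it connects to a specific component inside the bounding box.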
Parentheticals on labels are sometimes helpful for calling out differences. A batch information flow diagram can have a note at the top that reads “all transfers are daily unless otherwise noted.” You then label specific flows with “monthly” or “hourly” in parentheses after the description of the flow.
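A minimal sketch of that labeling convention, assuming the diagram-level default is "daily". The `flow_label` helper and the example descriptions are hypothetical.

```python
DEFAULT_FREQUENCY = "daily"  # mirrors the note at the top of the diagram


def flow_label(description: str, frequency: str = DEFAULT_FREQUENCY) -> str:
    """Append the frequency in parentheses only when it differs from the default."""
    if frequency == DEFAULT_FREQUENCY:
        return description
    return f"{description} ({frequency})"


print(flow_label("accounts"))               # default frequency, no parenthetical
print(flow_label("statements", "monthly"))  # exception is called out on the label
```

Only the exceptions carry extra text, so most labels stay short.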
While you can use fonts, colors, and line styles to add additional information to the diagram, be careful. It is a very short trip from additional helpful information to an eyesore peppered with distractions.
We keep this diagram pretty simple by design, so we don't have too many advanced maneuvers, but…
If you are depicting a sequence of information flows where the order of the flows is essential, you can number them. However, needing to do this is usually a good indicator that you would be better off creating a UML sequence diagram.
If you have a lot of flows in both directions between systems, you can connect them with a dotted line and then include each flow's direction on the label. This is the style of most of the information flows we create!
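One way to think about that second maneuver is as a grouping step: collect every flow between the same two components onto one line, and move the direction into the label text. The sketch below assumes `(source, target, label)` tuples; the label format and system names are hypothetical choices for this example.

```python
from collections import defaultdict


def merge_bidirectional(flows):
    """Group flows by unordered component pair; each label records its direction."""
    merged = defaultdict(list)
    for source, target, label in flows:
        pair = tuple(sorted((source, target)))  # same key for both directions
        merged[pair].append(f"{source} -> {target}: {label}")
    return dict(merged)


# Hypothetical flows, for illustration only.
flows = [
    ("CRM", "ERP", "orders"),
    ("ERP", "CRM", "invoices"),
    ("ERP", "Data Warehouse", "ledger entries"),
]
for pair, labels in merge_bidirectional(flows).items():
    print(pair, labels)
```

Each resulting pair corresponds to one line on the diagram, with its list of labels stacked next to it.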
The Challenge of Information Hubs
By “information hub,” I refer to any component that acts as a nexus for the information – e.g., a data hub, an ETL hub, service hub, or file management hub. It serves as a “FedEx” shipping hub of information: all the information flows through it, in from sources and out to destinations. The problem is that by including it on the diagram, you lose sight of where the information is actually flowing because it all flows in and out of the hub.
Example Omitting a Hub
Example Including a Hub
By including the information hub, we hide where the data is actually flowing. The example shows an ETL hub, but the same holds for API gateways, Enterprise Service Buses, and File Management hubs. The issue becomes worse as you add more systems.
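To make the loss of information concrete, here is a small sketch under stated assumptions: flows are `(source, target, label)` tuples, the hub is a single component named "ETL Hub", and two flows happen to share a label. The function and system names are hypothetical. Two different sets of direct flows collapse into the same hub-centric picture.

```python
def route_through_hub(flows, hub="ETL Hub"):
    """Replace each direct flow with a source-to-hub and a hub-to-target flow."""
    routed = set()
    for source, target, label in flows:
        routed.add((source, hub, label))
        routed.add((hub, target, label))
    return routed


# Two different architectures (hypothetical systems).
diagram_a = [("CRM", "Data Warehouse", "customers"), ("ERP", "Reporting", "customers")]
diagram_b = [("CRM", "Reporting", "customers"), ("ERP", "Data Warehouse", "customers")]

# Once drawn through the hub, the two diagrams are indistinguishable.
print(route_through_hub(diagram_a) == route_through_hub(diagram_b))
```

More specific labels would reduce the ambiguity in this toy case, but the reader would still have to trace every flow through the hub to recover what the direct arrows show at a glance.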
Handling Information Hubs on Information Flow Diagrams
There are some strategies for handling information hubs.
1. Leave them off!
If it is not material to the information flow, leave it off. If an organization always uses an integration hub to move data, then there is no clarity lost by leaving it off. If needed, add it as a “global” note to the diagram (near the title), e.g., “all data moved using enterprise ETL hub.”
2. Use a bounding box
Add a bounding box around components using each information hub and label accordingly.
3. Use Notes
The only time it is appropriate to include an information hub as a component on an information flow is if you are doing a very abstracted view of the architecture and all the components are abstractions. This usually only occurs if you create an abstract view of the entire enterprise's data flow or diagram the hub architecture itself.
Some Information Flow Diagram Guidelines
Here are some tips to get you on your way. I am going to skip general diagramming best practices and focus on some specific to the information flow.
- Focus on the end objective of the flow. Many data exchanges include some back and forth “conversation” as part of the information flow. Ignore the chatter and only model the primary objective of the interaction. If that includes information traveling in both directions, then depict each as a separate flow.
- Write the flow descriptions on the labels in language that a wide audience can understand.
- Keep the flow descriptions as high level as possible. Only get specific when the architecture demands it. An example of this might be a key field that must be present or else the architecture fails. In this case, you can still keep the rest high level, e.g., “customer info incl taxid.”
- Make the flow descriptions mutually exclusive using adjectives. For example, don't have “customers” flowing from two different systems, but use adjectives to differentiate them: “retail customers” and “business customers.”
- Depict batch/bulk information flows separately from online/transactional. The information flow is much more practical for batch/bulk information flows; we often pivot to UML sequence diagrams for online/transactional flows.
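The mutual-exclusivity guideline above lends itself to a quick automated check. This sketch assumes flows are `(source, target, label)` tuples and flags any label used by more than one source component; the function name and the systems are hypothetical.

```python
from collections import defaultdict


def ambiguous_labels(flows):
    """Return labels that appear on flows from more than one source component."""
    sources_by_label = defaultdict(set)
    for source, _target, label in flows:
        sources_by_label[label].add(source)
    return {
        label: sorted(sources)
        for label, sources in sources_by_label.items()
        if len(sources) > 1
    }


# Hypothetical flows, for illustration only.
flows = [
    ("Retail CRM", "Data Warehouse", "customers"),
    ("Business CRM", "Data Warehouse", "customers"),
    ("Billing", "Data Warehouse", "invoices"),
]
print(ambiguous_labels(flows))
```

Here "customers" would be flagged, prompting a rename to something like "retail customers" and "business customers."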
What can I do with Information Flow Diagrams?
So much! They are the best solution architecture tool in an architect's toolbox, next to the solution user diagram, because most people intuitively understand a diagram showing information flowing between systems. Since we keep the solution user diagram firmly focused on the people, the information flow is the solution user diagram's sibling, focused on the systems! Some typical uses are:
Enterprise Information Flows. It is excellent for enterprise-level views of the information lifeblood of your organization (although those with tons of systems can be a real challenge to create).
Integration Architectures. “Black-boxing” a technology solution in the middle of the page, then capturing all the information flows in and out, is a great way to inventory all the integrations for a new (or existing) solution. Not only do we use these for solution architecture design, but often as a quick and visual way to inventory all the required integrations during requirement facilitation.
They really can be used effectively to understand architecture at almost any level of abstraction. However, the lower you get, the more likely it is that a UML sequence diagram would bring more value.
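The integration inventory described above can be sketched as a simple filter over the flows. This assumes `(source, target, label)` tuples and a single black-boxed solution; the `integration_inventory` helper and the system names are hypothetical.

```python
def integration_inventory(flows, solution):
    """List the flows into and out of one black-boxed solution."""
    inbound = [(source, label) for source, target, label in flows if target == solution]
    outbound = [(target, label) for source, target, label in flows if source == solution]
    return {"inbound": inbound, "outbound": outbound}


# Hypothetical flows, for illustration only.
flows = [
    ("CRM", "Order Platform", "customers"),
    ("Order Platform", "Billing", "orders"),
    ("CRM", "Billing", "accounts"),  # does not touch the solution, so it is ignored
]
inventory = integration_inventory(flows, "Order Platform")
print(inventory["inbound"])
print(inventory["outbound"])
```

Each entry in the two lists is a candidate integration to walk through during requirements facilitation.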
Go with the Flow!
There you have it. Let the information flow diagrams begin! You will find them highly effective for designing, communicating, and untangling architectures of all shapes and sizes. If you'd like any help with any of that (we love the untangling) or training you or your team to use visual design more effectively, drop us a line!