Data Integration vs Application Integration

Author: Karol Skrzymowski, Enterprise/Integration Architect
Published: 3/10/2024
Last update: 3/10/2024

The issue of “Integration”

Working in any industry brings with it all sorts of terms and abbreviations that are specific to that industry, or even to a narrow field within it. During our work day we use job-specific jargon more often than we think, and when that jargon is used outside its industry context it can cause a lot of misunderstandings or even serious problems.

That is exactly the case with Data Integration and Application Integration. Both terms contain the word "integration", and specialists in each field commonly drop the qualifying word and say simply "integration". Why is that a problem? People outside the field mostly hear the word "integration" instead of the full term, and because of that they tend to think the two are one and the same thing. To be honest, even specialists in these fields make that mistake when they are not aware that the other field exists in IT. An application integration specialist might think: "Oh, data integration, yes, we move data from place to place, hence we integrate it! That is the same thing!".

This creates a bit of chaos and misunderstanding that may later result in organizational problems, like a lower budget or no willingness to invest in technology ("we already invested so much in this area" - when in fact the investment went into the other area, because one was mistaken for the other), and so on.

What is Data Integration?

So what actually is Data Integration and how does it differ from Application Integration? While researching the topic to give a complete answer, we stumbled upon a very descriptive analogy:

“Consider a room where different puzzle pieces are scattered all around, each with a picture on it. Now, what do you do if you want to see the complete picture? You bring all those pieces together, connect them, and complete the puzzle, right?”

Data integration is the process of combining and harmonizing data from multiple sources (multiple pieces from a puzzle) to provide a unified view (the complete puzzle). At least this is the short and simple definition. But what does that actually mean?

The focus of data integration is to make data, which is scattered among various systems, accessible, transformed, and deliverable in a consistent and meaningful manner across the whole organization for Business Intelligence (BI), data analysis, or use by other applications and business processes. This typically involves data cleansing, transformation and synchronization. Data integration is a crucial component in both traditional Data Warehouses and modern cloud-native solutions like Lakehouses, facilitating seamless access and utilization of data across the organization.
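
To make the above more tangible, here is a minimal sketch of such a step in code. It assumes two hypothetical source extracts (a CRM export and an ERP export as CSV files) and uses pandas to cleanse, harmonize and merge them into a single unified customer view; the file names and column names are made up purely for illustration.

```python
import pandas as pd

# Hypothetical extracts from two source systems (file and column names are illustrative).
crm = pd.read_csv("crm_customers.csv")   # e.g. columns: cust_id, full_name, e_mail
erp = pd.read_csv("erp_customers.csv")   # e.g. columns: customer_no, name, email, revenue

# Cleansing: normalize e-mail addresses and drop records without an identifier.
crm["e_mail"] = crm["e_mail"].str.strip().str.lower()
crm = crm.dropna(subset=["cust_id"])
erp["email"] = erp["email"].str.strip().str.lower()

# Transformation: align both schemas to a common, agreed naming.
crm = crm.rename(columns={"cust_id": "customer_id", "full_name": "name", "e_mail": "email"})
erp = erp.rename(columns={"customer_no": "customer_id"})

# Harmonization: merge the sources into a single, unified customer view.
unified = crm.merge(erp[["customer_id", "revenue"]], on="customer_id", how="left")

# Deliver the unified view, e.g. to a warehouse staging area.
unified.to_csv("unified_customers.csv", index=False)
```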

This means that the creators of data integration tools also provide certain capabilities to move data from sources to consumers, as this is often a requirement from customers who do not have their own application integration capabilities. It typically involves setting up API connections, both hosting and calling them, and directly accessing databases via ODBC or JDBC connections.
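
As a rough illustration of that kind of transport capability, the sketch below reads records directly from a source database over an ODBC connection and delivers them to a consumer's REST endpoint. The connection string, query, and endpoint URL are placeholders, and pyodbc plus requests is just one possible choice of libraries.

```python
import pyodbc
import requests

# Placeholder ODBC connection string and target endpoint - replace with real values.
SOURCE_DSN = "DSN=SourceWarehouse;UID=reader;PWD=secret"
TARGET_URL = "https://consumer.example.com/api/orders"

conn = pyodbc.connect(SOURCE_DSN)
try:
    cursor = conn.cursor()
    # Pull new records straight from the source database.
    cursor.execute(
        "SELECT order_id, customer_id, CAST(amount AS FLOAT) AS amount "
        "FROM orders WHERE status = 'NEW'"
    )
    columns = [col[0] for col in cursor.description]

    for row in cursor.fetchall():
        payload = dict(zip(columns, row))
        # Deliver each record to the consuming application's API.
        response = requests.post(TARGET_URL, json=payload, timeout=10)
        response.raise_for_status()
finally:
    conn.close()
```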

What is Application Integration then?

As we already defined in the article "What is Application Integration?", it is the effort to connect different applications into a larger ecosystem that functions as one. By that we mean the capability to provide the right data to the right place at the right time, with Application Integration defining how that data transfer should happen within those parameters.
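
To keep the distinction concrete, here is a small, hypothetical sketch of that "right data, right place, right time" idea: an integration flow that receives an event from a source system, maps it to the format the target application expects, and delivers it right away. The endpoint, field names, and the choice of Flask and requests are all assumptions made for illustration.

```python
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

# Placeholder endpoint of the consuming application.
CRM_ENDPOINT = "https://crm.example.com/api/contacts"

@app.route("/events/customer-created", methods=["POST"])
def handle_customer_created():
    event = request.get_json()

    # Light, integration-layer transformation: map the source format
    # to the format the target application expects (no business logic).
    contact = {
        "externalId": event["customer_id"],
        "displayName": event["name"],
        "email": event["email"],
    }

    # Deliver the data to the right place, right away.
    response = requests.post(CRM_ENDPOINT, json=contact, timeout=10)
    response.raise_for_status()
    return jsonify({"status": "forwarded"}), 202

if __name__ == "__main__":
    app.run(port=8080)
```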

Application integration tools, despite their main focus on connecting systems, are also capable of sophisticated data integration beyond simple formatting or structuring. While these capabilities are usually very handy for solving specific problems, our advice is simple - please do not do data integration in the application integration layer! Putting business logic onto application integration platforms is usually not a good idea. Unless you are working closely with a good data architect and you can document all of the transformations properly, with the documentation being easily searchable and accessible, this becomes a problem, as application integration platforms are not considered to be business systems. Unfortunately, when data integration is done on application integration tools, it is usually done as a workaround to a problem that needs to be fixed later. As our experience shows, it is almost never fixed and remains in place, causing all sorts of maintenance issues, additional operational costs, further development problems, and serious production failures, because new releases do not treat this additional logic as a dependency to be tested.

The trade-offs

Taking the above at face value, there is an overlap between data integration and application integration. To a certain extent that is true: many data integration tools provide application integration capabilities that enable the access and delivery of data, while application integration technologies often include specific data integration capabilities of their own. In that way they contribute to additional confusion about the scope of each field.

When it comes to using data integration tools to facilitate application integration, as with all architecture questions, the answer is "it depends". Several factors can be decisive in whether or not the tool should also handle application integration:

  • Organization size - if the organization is small, most of the time there is no budget to build a robust enterprise architecture with fancy features and a proper division of responsibilities. In those cases "done is better than perfect", as implementing a highly sophisticated architecture may not contribute to company growth and would be too costly at this point
  • IT maturity - low maturity may add complexity and make maintaining the IT landscape troublesome. In large environments, adding another component may seem harmless, but it further contributes to unstructured, hard-to-maintain overall complexity, also called "spaghetti architecture" or a "big ball of mud", which makes the landscape stiff and costly to change. In those cases, a clear division of responsibilities is beneficial
  • Preexisting application integration platforms and capabilities - if there are preexisting application integration tools available, it needs to be validated whether they provide the transport capabilities required by the data integration use cases (batch, on-demand, event-driven). If the capabilities are sufficient, they should be used rather than the ones provided by the data integration tools
  • Use cases of the data integration tools - depending on the use case, whether it is an operational data hub, a business intelligence tool, or ad hoc usage by business applications and processes, the required capabilities may differ between tools; in some cases simple transformations, like formatting or structuring, can be done in the application integration layer and will be sufficient

Architectural Oversight

While it is true that every company has a different structure, different roles and, especially, different names for them, that does not really matter in terms of the scope of responsibilities. These can be divided into the responsibilities of application integration and data integration, which in a perfect world would fall to an integration architect and a data architect respectively. Just as with tools, a company may have both, one, or neither of these architectural roles; either way, there are a few helpful guidelines for establishing proper oversight. These can be summarized by asking four key questions: what, where, when and how.

What?

There needs to be a common understanding of data objects and how they are supposed to be used, such as what the edge cases are for specific data points and which rules filter them into specific scopes.

  • What data objects are we working with?
  • What is the business meaning of those data objects?
  • What is the context of their use in a business process?
  • What is the common meaning of these data objects in a specific business domain?
  • What is the scope of the data object actually consumed?

Where?

The data objects in a complex environment need to have a clear origin and there needs to be a clear understanding as to who (what system) is the consumer of these data objects.

  • Where is the data created and stored?
  • Where (what system) should the data be used?
  • Is there a data multi-mastery conflict?
  • Is the data split over several systems, and does it need to be merged?

When?

Knowing when the data is used, along with what and where, gives a full understanding of its place in the business landscape, enabling the organization to provide proper data transports in a timely or even real-time manner based on actual needs.

  • When is the data used by the business?
  • Is the data freshness an important feature?
  • When should the data be moved and what triggers transport?
  • When does it add value to the respective business domain?

How?

This is the underlying technical process of transporting data between systems, which involves data transformation (simple formatting in the application integration layer, or heavy changes in a data integration tool), orchestration, and enrichment. This is when tools, protocols and formats should be discussed (see the sketch after the questions below).

  • Can the data be classified as an event?
  • Is it pulled on demand?
  • Should it be a batch transfer?
  • What orchestration and enrichment are needed?
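
One way to keep the answers to these four questions explicit, sketched below purely as an illustration, is to record them for every interface as a small structured specification that both the integration architect and the data architect can review; the fields and example values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class TransportMode(Enum):
    EVENT = "event"          # pushed when something happens in the source
    ON_DEMAND = "on_demand"  # pulled by the consumer when needed
    BATCH = "batch"          # moved in bulk on a schedule

@dataclass
class IntegrationSpec:
    data_object: str          # What? e.g. "Customer"
    business_meaning: str     # What does it mean in this business domain?
    source_system: str        # Where is it created and mastered?
    target_system: str        # Where is it consumed?
    trigger: str              # When should it move?
    transport: TransportMode  # How is it transported?
    transformation: str       # How is it transformed or enriched?

# Example entry (values are made up for illustration):
customer_sync = IntegrationSpec(
    data_object="Customer",
    business_meaning="A paying customer as defined by the Sales domain",
    source_system="CRM",
    target_system="Billing",
    trigger="Customer record created or updated",
    transport=TransportMode.EVENT,
    transformation="Format mapping only; enrichment done in the data platform",
)
```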

So what is the difference?

As stated, the difference between the fields of Data Integration and Application Integration can be easily distinguished at a logical level. The key problem lies in the fact that the tools used for both compensate in some way for the lack of capabilities of the other field. This makes it very difficult to clearly divide those capabilities in a live environment, as it requires a specific architectural skill set, as well as a high level of maturity. As the company and its IT landscape grows, it is worth understanding this logical division and onboarding proper tools for each field before the capability gap becomes a burden for the business.