Loosely Coupled - Data Mesh vs Application Integration
Karol:
Good morning, good afternoon, and good evening, everybody, and welcome to another episode from Loosely Coupled, brought to you by Bridging the Gap.
Karol:
My name is Karol Skrzymowski, and I'll be your host tonight.
Karol:
Today, we have an interesting topic, at least from my perspective, because it's a topic I don't know quite a lot about.
Karol:
I have just hints; it's not my space of engineering.
Karol:
Today, we're talking about data mesh versus application integration, the latter being my area of engineering.
Karol:
But this is a very interesting topic that I stumbled into this year by attending Domain-Driven Design Europe, which is a very lovely conference held this year in Antwerp in Belgium.
Karol:
I don't do that much data in terms of working with data.
Karol:
I just pass it along as any integration engineer.
Karol:
And this is, at times, a bit of a problem, because what I found doing a workshop this year at DDD Europe was that we do have a bit of a divide between the world of data and the world of application integration, especially, I think, in the language.
Karol:
While the title is a bit of a pun, a hook to lure you in, I don't really think it's a versus.
Karol:
But, that said, I'm joined today by two wonderful architects, engineers, starting with Rachel Barton, who is a principal and domain architect with 20 years of experience in working with data.
Karol:
Quite amazing, right?
Karol:
Now, Rachel worked both on the reporting and operational side of data, which means that this is basically two sides of the same coin.
Karol:
I mostly work on the operational side, really.
Karol:
So, for me, that reporting part is not something I'm familiar with.
Karol:
So, I'm very keen to hear what's going on there.
Karol:
And Rachel very early jumped on the bandwagon of data meshes.
Karol:
So, I'll be very curious to hear her experiences on the topic.
Karol:
Also joining us today is Andrew Jones, a principal engineer and an author with 15 years of experience in the industry, first working as a software engineer, then becoming a data platform data engineering expert.
Karol:
And since 2023, Andrew is also the author of a very, very lovely book, Driving Data Quality with Data Contracts.
Karol:
I haven't read it yet.
Karol:
Sorry, Andrew.
Karol:
But I hear it's a very good read.
Karol:
We'll be going back to the book later in this stream.
Karol:
So, without further ado, Rachel, Andrew, welcome to the Loosely Coupled stream.
Rachel:
Thank you very much.
Rachel:
Good afternoon to you.
Karol:
Thank you.
Karol:
Good afternoon.
Karol:
How are you guys doing today?
Karol:
How's the weather in the UK?
Rachel:
It's beautiful and sunny up here in Scotland.
Rachel:
How is it down south, Andrew?
Andrew:
Been less sunny, but dry, which is the main thing, because it's the summer holidays, so it's never perfect weather.
Andrew:
As long as it's dry, we can do something with kids outside.
Rachel:
Always perfect in Scotland.
Andrew:
Of course it is.
Rachel:
Anything you hear otherwise is propaganda.
Karol:
I guess nothing like two people from the UK talking about the weather, right?
Rachel:
You want to move on to something more exciting?
Rachel:
We can entertain you with data mesh, if you like.
Karol:
Yeah, definitely.
Karol:
More exciting than the weather, always happy to, especially if that's a tech-related topic.
Karol:
And today, we're having a tech-related topic.
Karol:
Unlike the two other streams we've had within Loosely Coupled, which touched upon recruitment or cognitive load, a little bit more on psychology there.
Karol:
But yes, data meshes today.
Karol:
And that's a topic that I'm not very familiar with.
Karol:
I mean, Rachel, we had our conversation about that.
Karol:
I think, personally, there's a little bit of bad blood between the data architects and integration architects, and also, well, not a little bit, quite a few misunderstandings in terms of language and conversations.
Karol:
So, I think today, we'll be clearing those up a little, hopefully.
Rachel:
Absolutely, yes.
Karol:
Unless we get into a fight.
Rachel:
There's no fighting.
Rachel:
Data is all about peace, love, and collaboration.
Karol:
Ah, perfect.
Karol:
Okay.
Karol:
That's what I like to hear.
Karol:
All right.
Karol:
So, shall we?
Rachel:
Do you want me to jump in?
Karol:
Yeah, go ahead.
Karol:
If you can define data mesh for us, just like waltz in.
Karol:
Why not?
Rachel:
So, I'll kind of start off: in terms of data mesh, the message that we were hearing from Zhamak, and why we adopted it early on, was the bringing together of the operational and the reporting/analytical data into the same domains, so that they can be presided over by the business SMEs who understand that domain.
Rachel:
So, as opposed to my past, when I worked on the reporting team, you would get the operational teams bringing on all that lovely transactional data.
Rachel:
Great, job's done.
Rachel:
A month later, oh, we need to get that into a report.
Rachel:
Oh, well, hang on a second.
Rachel:
It's not in the right format.
Rachel:
We don't know what the quality of it or the rest of it is.
Rachel:
So, this is the architectural paradigm that brings that together and which allows us to then have this much more constructive conversation about the operational side, how application integration becomes really important for moving the data around as well.
Rachel:
I'll pass over to Andrew if there's anything else you want to add in from your side.
Andrew:
No, I think that's a really great introduction to data mesh, a great summary of what we're trying to do here.
Andrew:
I guess the reason why we're trying to do that is that the way we're using data in the analytical plane, the analytical domains area, has changed: we're using it for a lot more important things than we were before.
Andrew:
You could have argued that it's just reporting, it's not very important.
Andrew:
I don't think that's ever a fair argument, but some people might have argued that.
Andrew:
But these days, reporting is critical for business, for decision making, for revenue reporting, for regulations, and for various other operational processes in the company.
Andrew:
And then, of course, people are feeding data, the feedback they're getting from analytics, back into the operational plane for certain features.
Andrew:
I'm trying to avoid mentioning AI, but AI is obviously part of that as well.
Andrew:
So I think the growing importance of this analytical data is the reason why we are trying to treat it a bit more like we do operational data.
Andrew:
We're trying to bridge the gap, I suppose.
Karol:
Yeah, good one there.
Karol:
What I've seen so far in terms of how people approach reporting is that quite often it becomes more real-time than it used to be.
Karol:
People have still this notion about reporting being done quarterly, once a month, yearly, something like that, right?
Karol:
Well, I'm starting to see more and more trends to have this reporting done real-time, to have those data-driven decisions enabled in real-time rather than post-factum.
Karol:
And I think that's something that some businesses are already going into really hard.
Karol:
Others are kind of trying to catch up.
Karol:
And then there's the public sector, which is something else entirely.
Karol:
So it's interesting from that perspective, because what I understand from what you said, Rachel, is basically that data mesh would be the area of expertise that looks at data from a more holistic perspective, a more productised perspective of how we can use data, instead of just putting it into labels and boxes like operational data, analytical data or reporting data.
Karol:
It's data.
Rachel:
Well, yeah, it's picking up on that product term.
Rachel:
So where product thinking was probably already being picked up around the systems side, it's the same thing.
Rachel:
It's bringing together that product thinking and that design thinking.
Rachel:
So you've got the product you're building is for a consumer, and therefore you're focussing on that consumer's needs, whether it is for a BI report that needs to be produced daily, monthly, whether it is the operational side where you need it a little bit more quickly, or if you are looking at the analytics and what you're setting up for your AI models as well.
Rachel:
So the product side of it, making sure that we're building for consumer needs and then the design thinking to make sure that it is actually usable by people is one of the key elements of that.
Karol:
OK, so talking products, what then is a data product?
Rachel:
It's a beautiful question.
Rachel:
It's probably where the biggest interpretation, or evolution of understanding, has been.
Rachel:
So it'll be interesting to see if my answer would be the same as Andrew's on this.
Rachel:
So the data product is technology agnostic, right, to a certain extent, because what you're looking to do is fulfil the consumer requirement, have that product that a consumer can pick up and take off the shelf.
Rachel:
So it can come in a number of formats.
Rachel:
It can be a Power BI report, but not all Power BI reports are a data product.
Rachel:
It depends on how widely used that is.
Rachel:
It could be a collection of information available through Snowflake or it could be a machine learning model, etc.
Rachel:
Again, technology agnostic.
Rachel:
But the important part around it is that it's not just the data.
Rachel:
It is everything that is required to encapsulate and provide that data in the way the consumer needs it.
Rachel:
So you're talking about the data, the code that is required to transform it into the format that's required, any governance policy around it, and also the metadata.
Rachel:
So it's more than what you would normally be asking somebody to develop and produce relating to data.
Rachel:
The reason for that is because of the characteristics of a data product.
Rachel:
There are a number of them.
Rachel:
One of the most important parts of that is the discoverability.
Rachel:
If you produce something, it's being built to fulfil a value proposition and you want people to be able to find it and use it.
Rachel:
That's really what the metadata supports.
Rachel:
It's also obviously really important in order to support AI and the use of data products around that.
Rachel:
Andrew, would you agree or is there anything else that you'd add?
Andrew:
No, once again, I definitely agree with all of that.
Andrew:
I think exactly what the product is in your organisation is up to you to decide.
Andrew:
You might decide it's going to be tables in the database or data warehouse, or you might decide it's going to be reports.
Andrew:
People argue about that all the time.
Andrew:
I mean, it doesn't really matter as long as you've got a clear definition inside your organisation.
Andrew:
But I think the most important thing really is applying that product mindset to whatever it is you're producing.
Andrew:
And it's not new to data.
Andrew:
People have been applying a product mindset to internal tooling, for example, for quite a long time now with platform engineering and the like.
Andrew:
And obviously to the actual product features that you're building in your technology company.
Andrew:
So this idea of a product mindset is not new, it's not unique to data, but it's relatively recently been applied to data, largely because of data mesh and it being one of the core parts of data mesh.
Rachel:
Yeah, I'd also say that it's also the differing roles in data.
Rachel:
It's no longer just developers who are building them.
Rachel:
So where the product mindset isn't new in relation to data, it might be new to the new roles that are coming in and the people that are now looking to build them.
Rachel:
And that's an important part to think about in terms of when you're implementing this as an architect and thinking about the training and the support that the people in the business need to understand it and adopt it.
Karol:
So what kind of roles are those then?
Karol:
If we're putting aside software engineers who are building things, what are the roles that are related to the data mesh topic?
Rachel:
Yeah, relating to data mesh and kind of broader in terms of those data analysts, those data scientists, but also you've got it in relation to systems as well.
Rachel:
If you think about what we would have called shadow IT in the past, and we're now calling citizen developers, right?
Rachel:
You've got the citizen developers of data.
Rachel:
So again, it's a similar concept, but the extra challenge you have on the data side, particularly if you're working in a regulated industry, is around the governance of the data that goes with that.
Rachel:
So that's where one of the data mesh concepts of the automated federated governance is really important.
Karol:
Okay.
Karol:
So this brings to my mind a specific question, because I'm an integrator.
Karol:
For me, I'm moving things, not looking at things that much.
Karol:
Of course, we're looking at things for data quality purposes, vetting if the requests, responses, and events are properly formatted and whatnot, but this is not to the extent that data engineers work with data.
Karol:
But what I learned over my career is, well, I learned about coupling.
Karol:
That's, I think, the most basic term we're talking about in anything related to any contracts, anything.
Karol:
But the one thing that is a very wanted type of coupling is the semantic coupling, which basically brings us to the understanding of said data.
Karol:
And then the question is, how is data mesh then addressing that semantic aspect of the data product?
Karol:
Is it even?
Rachel:
Andrew, do you want to go first?
Rachel:
I feel this kind of leads into you nicely.
Andrew:
Yeah.
Andrew:
I think data mesh isn't really prescribing exactly how you do that, but I think you do need an approach to how you're going to have a semantic view of data, and that makes sense to different consumers.
Andrew:
Because I think with data mesh, you're leaning more into a decentralised architecture.
Andrew:
So therefore, the data producers might be different in different parts of the organisation, different teams, different tooling, and you're supporting them through things like the platform you're providing and through automated governance, so they can produce data properly without each having to become experts in regulation and data management.
Andrew:
So you're supporting them in how to do that, but you still have the problem of: what does the data actually mean?
Andrew:
How can I join it together later?
Andrew:
And that there are various ways you could solve that.
Andrew:
I don't think there's a perfect way yet.
Andrew:
What we try and do is ensure that alongside the data, there's metadata that describes the data.
Andrew:
And so at least that data then becomes usable and consumable by anyone as long as they can understand that metadata.
Andrew:
And if you can standardise metadata across different data products, then for any data product you consume, you know it's got associated metadata: what data is it?
Andrew:
How is it described?
Andrew:
Who's the owner?
Andrew:
How can I use it?
Andrew:
What are the limitations around it?
Andrew:
What's the retention policy for it?
Andrew:
All those kind of things.
Andrew:
We'll define a standard way so that as a user of data, I can consume it and I can understand how to use it.
Andrew:
But also at that platform level, you can start doing automated governance that Rachel was talking about and all of that.
Andrew:
So this really feeds into metadata that describes the data.
Andrew:
And we capture that in a contract.
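To make that concrete, here's a minimal sketch in Python of the kind of metadata a data contract might carry alongside the data. The field names and values are hypothetical illustrations, not taken from any particular contract standard or from Andrew's book.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DataContract:
    """Hypothetical contract metadata travelling with a data product."""
    name: str                      # what data is it?
    description: str               # how is it described?
    owner: str                     # who's the owner?
    schema: Dict[str, str]         # field name -> type, so consumers know the shape
    usage: str                     # how can I use it?
    limitations: List[str] = field(default_factory=list)
    retention_days: int = 365      # what's the retention policy?

contract = DataContract(
    name="orders.completed",
    description="One record per completed customer order",
    owner="orders-team@example.com",
    schema={"order_id": "string", "total": "decimal", "completed_at": "timestamp"},
    usage="Join on order_id; refreshed daily",
    limitations=["contains PII in customer fields"],
)

# Because every contract has the same standard shape, a platform can
# check each one mechanically -- the basis for automated governance.
assert contract.owner and contract.schema
```

Standardising this shape across products is what would let a platform layer run the automated governance checks Rachel mentioned, rather than each team inventing its own metadata.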
Karol:
So that basically feeds into the topic of data lineage and data traceability: understanding where that data comes from and what the context of its creation was, and then working on the semantic understanding of it.
Andrew:
To a degree. I mean, with data lineage you're basically creating a graph, a lineage graph, of where data is moving.
Andrew:
So it could be like a dependency graph that you might have in software systems.
Andrew:
And that might come from metadata.
Andrew:
You can draw that.
Andrew:
But you might also generate it from looking at things like audit logs and other tracing capabilities you might have in your applications and your data applications.
Andrew:
So exactly how that's generated varies, and you'd probably use a mix of both.
Andrew:
That's how you'd try and get this lineage view.
Andrew:
But I mean, that's a useful thing to have data lineage.
Andrew:
But it's not necessarily part of semantic layer as such.
Andrew:
It's more useful for understanding how data is being used.
Andrew:
So again, if you're treating data as a product, you want to make a change to the data.
Andrew:
But who's using this?
Andrew:
How can I contact them in advance so that I can have a migration plan for them, rather than just breaking it and having them fix it the next day, spending the rest of the week fixing it, with all the reports or the actual applications broken?
Andrew:
So it's useful for that.
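As a toy illustration of that impact-analysis use of lineage: a dependency graph built from hypothetical metadata listing each product's sources, and a walk to find everyone downstream of a change. The product names are invented.

```python
from typing import Dict, List, Set

# Hypothetical lineage metadata: each data product and the products it reads from.
SOURCES: Dict[str, List[str]] = {
    "orders_raw": [],
    "orders_clean": ["orders_raw"],
    "revenue_report": ["orders_clean"],
    "fraud_model": ["orders_clean"],
}

def downstream(product: str) -> Set[str]:
    """Everything that directly or transitively consumes `product`,
    i.e. the owners to contact before making a breaking change."""
    consumers: Set[str] = set()
    for name, sources in SOURCES.items():
        if product in sources:
            consumers.add(name)
            consumers |= downstream(name)
    return consumers

# A breaking change to orders_raw would ripple into everything built on it:
print(sorted(downstream("orders_raw")))  # ['fraud_model', 'orders_clean', 'revenue_report']
```

In practice the same graph could be assembled from contract metadata, audit logs, or both, as Andrew describes; the traversal is the easy part.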
Andrew:
But when you talk about semantics, I think that's where you start thinking about the discoverability area of data, so things like data catalogues, how to discover data.
Andrew:
And then once you've found the data you want to use, how to actually use it, how to plug it into my reporting tool, how to plug it into my notebook, wherever it might be.
Andrew:
That's more on the semantic side.
Rachel:
I would just kind of jump in and say I think the domain-based approach really helps with the semantic understanding of the data, because it allows you to have that very high-level data model that you might not have had previously, bringing that operational and that reporting data together within a larger encapsulation of the domain, and having the ownership of that and the metadata around it really helps people to get the visibility of what exists and what it means.
Rachel:
I would also look at that contract between your operational system and the reporting data product as we kind of get into that integration space, because every data product probably has, whether directly if it's a source-aligned or indirectly if it's a consumer-aligned data product, will have that operational system underpinning it.
Rachel:
That's where the data is going to be coming from.
Rachel:
But the language you might be using around that data might be different because your audience is different, your usage of it is different, so your contract and your metadata allows you to be able to trace that you are talking about the same thing and where it is sourced from as well, so it allows us to bring that together.
Karol:
And then on top of that, the domain partitioning helps us understand the proper context of said data from which it originates, should it be a customer domain, order domain, product domain, whatever other domains we may have in the company, and that gives us even more metadata to work with for that semantic understanding.
Rachel:
Absolutely, yeah.
Karol:
Before we move further, we already have questions from the audience, so starting with Bas Bartelink.
Karol:
Hello, Bas.
Karol:
Is democratising data extended to everyone as well, for example in citizen-led development?
Karol:
And that's an interesting question there because personally I'm not a fan of citizen-led development because I've seen how it wrecks IT environments, but for data that might be a little bit different.
Rachel:
Well, again, it comes down to the governance side, right?
Rachel:
So in some ways it might be slightly more scary, but it depends on your organisation, the industry, as to how far you go with that.
Rachel:
Certainly for us, the way we look at it is across the ownership structure.
Rachel:
Every data product should be owned by the business.
Rachel:
That does not necessarily mean that they are building it.
Rachel:
It will depend on what it is, what the technology is.
Rachel:
The further you get towards the consumer side, where you're able to use things like Power BI (as you said, it is possible for a Power BI report to be a data product), then they should be absolutely capable of building that themselves.
Rachel:
If it's something that's a little bit more technical, that might come depending on the maturity of the data platform.
Rachel:
I would say it's the overall target and vision, but everyone's on a different path to get there and a different stage on that path.
Andrew:
Yeah, similar, I agree with that.
Andrew:
The way we did it is slightly different, I think, a slightly different approach.
Andrew:
We approached it differently because we come from a different angle, really.
Andrew:
So we wanted to really look at the barriers between the production and consumption of data, so people could produce data of the right quality, properly governed and managed, and it could be consumed by people who could rely on it and build on it with confidence.
Andrew:
So we did a lot at the platform layer to enable people to produce data, without needing to become experts in data management, by just producing it in a standard way, and then it's available to everyone in the company, and similarly, if you're consuming data, you can find data and consume it quite easily.
Andrew:
So in a way, that's democratisation.
Andrew:
I think it's also quite similar to being citizen-led in a way, but it's more engineer-led; still, anyone in the business can create their own data and manage their own data.
Andrew:
That's not an onerous task anymore, because we've built capabilities to allow that.
Andrew:
So anyone could do that without having to go through a central data team to be the ones responsible for creating quality data, creating quality data products, managing quality data products, and becoming the bottleneck for data in business.
Andrew:
So yeah, similar idea of democratising data, but more focused on engineers building and consuming data.
Karol:
I think this is another question that is tied into this particular part of the conversation, and a question to you, Rachel, specifically from Stefan.
Karol:
Are you an advocate of having one single source of truth, in terms of aggregated data residing, for example, in a data lake?
Rachel:
Yep.
Rachel:
So our overall strategy is absolutely to have a single source of truth, and that's again one of the reasons that we jumped on data mesh early on, because having that domain-based approach makes it really clear where something should be living and who owns it.
Rachel:
The interesting thing becomes when you are aggregating the data together, because you have got that cross-domain kind of concept, and we've talked a lot at the moment about those source-aligned data products which are sourced directly from an operational system.
Rachel:
So looking at the domains that don't necessarily have an operational system in, but it's the aggregation of data from other data products.
Rachel:
So it's still the central source of data, it will take the master data and it will turn that into something new and consumable by whoever it is that needs it.
Rachel:
Does that make sense?
Karol:
Again, I'm not a data architect, mind you.
Rachel:
The stunned silence is always good.
Karol:
It makes some sense to me because, well, I'm coming from an operational world, right?
Karol:
I don't deal much with aggregation of data unless these are logs, really.
Karol:
So for me, the source of data is basically the system owning that data.
Rachel:
It's a similar kind of thing though, right?
Rachel:
Because if you're looking at it from a system point of view, an operational system, they will often be taking data from multiple sources in, even if those other sources are primarily reference data.
Rachel:
It's a similar kind of consideration.
Rachel:
You're still talking about who owns that reference data, how am I getting it to where I want it to be?
Rachel:
You still want that single source of truth that you can go to, that you can trust the data and make sure that anything you're doing operationally on top of it that's going to support those business decisions are absolutely high quality and accurate.
Rachel:
So again, fairly similar considerations across the board.
Karol:
All right.
Karol:
So just granulated into business processes or domains then?
Rachel:
Yeah, absolutely.
Karol:
Yeah.
Karol:
Okay.
Karol:
Then that's understandable to me because I've seen already situations where there was multi-mastery of data spread across two systems or even worse, two companies cooperating through some sort of integration with one another.
Karol:
The number of problems you get with inconsistencies, and then trying to figure out what happened and why it happened... So as long as we have that single source of truth, also in the operational world, it's lovely.
Karol:
Otherwise it's going to be a nightmare to handle anyways.
Karol:
And not only from the perspective of the inconsistencies, but from contractual perspective as well, because it's not easy to have multi-mastery.
Karol:
It's not a solvable situation if we want to persist the multi-mastery, because the solution to it is to have a single master of the data, obviously.
Karol:
So yeah.
Rachel:
Yeah.
Rachel:
It's a lot of the same considerations, I would say.
Karol:
Only different granularity.
Rachel:
Exactly.
Andrew:
Different audiences maybe, but yeah.
Andrew:
Okay.
Karol:
Interesting.
Karol:
Wow.
Karol:
Apparently the topic is very popular, because I've never had so many questions this early in the stream.
Karol:
So it's like, whoa, we're moving.
Karol:
I guess, all right.
Karol:
If we're still defining data mesh, then this is a very valid question here, coming from Sneha, I suppose.
Karol:
Sorry, I'm butchering Indian names.
Karol:
Sorry for that.
Karol:
So what kind of metrics dashboards do you use to measure success of a data mesh implementation?
Karol:
I think that's a very valid question as we're defining what data mesh is: so we have a data mesh.
Karol:
What's the success story?
Karol:
Where do we say that our implementation was a success, and that we're moving on to further augment and improve it?
Rachel:
Yeah.
Rachel:
So, I don't know, a lot of people, if you've worked with data before, will be familiar with the good old data maturity assessment.
Rachel:
So from our point of view, we've taken a similar approach, but used it much more from a metrics perspective.
Rachel:
Because what we're looking to do is a lot of this is about providing greater access to people.
Rachel:
And again, it's consumer driven.
Rachel:
So making sure that we have the metrics around the data products per domain.
Rachel:
Do you have data products available?
Rachel:
How many people are accessing them?
Rachel:
How many times are people accessing them?
Rachel:
But the other side of your data product is you should be looking to proactively get feedback and evolve them to meet the ever-changing needs of the business.
Rachel:
So therefore, you want to look at how and if those data products have evolved as well.
Rachel:
So you're looking at getting that full story across the existence, usage, life cycle of the data product.
Rachel:
And then obviously the definition of success will vary from company to company.
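As a rough sketch of the kind of usage metrics Rachel describes, assuming a hypothetical access log the platform might collect (the record shape and the names in it are invented):

```python
from collections import defaultdict

# Hypothetical access log: one (domain, data_product, user) tuple per access event.
ACCESSES = [
    ("orders", "orders_clean", "ana"),
    ("orders", "orders_clean", "ben"),
    ("orders", "revenue_report", "ana"),
    ("customers", "customer_360", "ana"),
]

products_per_domain = defaultdict(set)   # do you have data products available per domain?
users_per_product = defaultdict(set)     # how many people are accessing them?
hits_per_product = defaultdict(int)      # how many times are people accessing them?

for domain, product, user in ACCESSES:
    products_per_domain[domain].add(product)
    users_per_product[product].add(user)
    hits_per_product[product] += 1

print({d: len(p) for d, p in products_per_domain.items()})  # {'orders': 2, 'customers': 1}
```

Tracking these over time, alongside how often products evolve in response to feedback, gives the existence/usage/life-cycle story; the threshold for "success" stays company-specific, as Rachel says.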
Karol:
Would putting some measures on data contracts also be an applicable measure of the success of the data mesh, in terms of the volatility of the data contracts, the scope of the evolution of said contracts, or the utilisation of the APIs, interfaces, whatever the means are to consume those data products?
Andrew:
Yeah, I think they're all good measures to help you understand how useful what you're building has been, how far along your migration is, or whatever your maturity is.
Andrew:
They're all quite useful metrics for that.
Andrew:
They're more metrics for you to track your own success with whatever you're trying to achieve there.
Andrew:
Are you trying to put more things into data contracts?
Andrew:
Are you trying to get more use of that product?
Andrew:
It can be useful for you to understand, but none of those are reasons you're doing a data mesh, really.
Andrew:
You're doing data mesh to solve some sort of business problem.
Andrew:
When it comes to the KPIs you're aligning to, that should be something that's more meaningful to your business.
Andrew:
So for example, the reason why we started with data contracts and then eventually started moving towards data mesh was to solve a particular business problem where we wanted to use our data, our analytics data, to build product features that we then sell to our customers, and our data wasn't reliable enough and good enough quality to do that.
Andrew:
Think of reliability: our data pipelines were failing very often, and again, you could argue that's okay for reporting.
Andrew:
I don't think that's right, because reporting is valuable or we wouldn't have been doing it, but it certainly wasn't okay for products we want to sell to our customers to be breaking weekly.
Andrew:
So we focus on reliability and data contracts was one of our solutions to that.
Andrew:
Quality as well, in terms of how long it takes to actually build these products you want to sell.
Andrew:
A lot of that was slow due to quality of data, how can I find it, how can I use it.
Andrew:
So that's another good measure, aligned to the business value of trying to build these products, which were one of the key strategic aims of the company.
Andrew:
So yeah, the metrics we spoke about are good for you to measure your own success in your management of data products, or how you're building things, or how you're managing data, whether that's improving, which will reduce some risk.
Andrew:
But the reason you're doing this is, ideally, to solve some sort of business need, and you should tie your KPIs to that and track it through all of this.
Karol:
So in that sense, data contracts may not exactly be the best KPI, well, things around data contracts may not exactly be the best KPI for measuring the success of a data mesh.
Andrew:
Yeah, I'll give you an example of a time we used it as a measurement, but it wasn't a good idea, and we learned the hard way.
Andrew:
So we started with data contracts to solve this reliability problem, and it was quite successful; more people started adopting it. It was a bottom-up approach, so we were solving problems locally in teams and building platform capabilities, with more and more different teams adopting them, and eventually people higher up started taking notice, the CTO, the CPO, saying this is a good idea, why don't we just have everything on data contracts?
Andrew:
We thought, well, yeah, I guess we could, that sounds exciting, this is what we've been talking about for a while, this is our ultimate end goal, we need to have that, so if you're going to back us on that, let's do it.
Andrew:
So then we had this KPI of how much of our data was on data contracts, as a percentage, and we started trying to get every team to move their data onto data contracts. But that wasn't a great metric, it turns out. The issue was that we started tracking this metric per team, and a team would say, by the end of this quarter I'll get to 10%, 20%, whatever it might be, and in the first quarter they'd just move all the easy things onto data contracts: data that wasn't really being used very often, not particularly valuable data. Well, you've moved your data onto contracts, you've hit your metric, but the value to your business has been very low, because you've chosen it for being easier for you.
Andrew:
And when it gets to the harder ones, where the business value is higher, the effort to move them is also higher, but by that time people started to think, well, I've already spent a quarter doing this, I don't want to spend another quarter, and another quarter. There are arguments about prioritisation and effort, all the kinds of arguments you always get when trying to get work done across teams, but again, we still weren't really hitting the most important data. The actual reason teams were doing this was only to hit the metric; it wasn't really to solve, you know, we need this data to create this product we're going to sell to our merchants. It wasn't really that data, it was just the next set of easy ones.
Andrew:
So we started getting loads and loads of pushback, because it wasn't a metric that aligned with anything the business wanted to achieve. It was just an artificial metric for a migration that we didn't need to do overnight; we didn't even need to do it over a few quarters, we could have done it over years and it would have been fine, as long as we started by getting the most important things onto data contracts.
Andrew:
The argument was that if we got everything onto data contracts we could decommission the old stuff, and that would be useful to do, but the effort we were spending maintaining that old stuff wasn't that high either, so it wasn't a great argument for getting the whole business, all the software engineers, working towards moving to data contracts.
Andrew:
So that's an example of when we chose a bad metric, and what we ended up doing was a bad idea, and we went back to our more use-case-driven approach: we need this data on data contracts because it's driving key business processes, around fraud detection for example, and if those processes fail, we are exposed, the exposure rate goes up, and we want to reduce that.
Andrew:
So when it's aligned with a key business process like that, or a key business metric like that, it's easier to get different teams aligned to move onto data contracts and to improve the quality of the data the engineers need.
Andrew:
But when it's just to move an artificial metric, that doesn't stand up to much scrutiny, and, you know, that's a bad metric to use for anything like this.
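A tiny sketch may make Andrew's point concrete. The dataset names and value figures below are entirely hypothetical: a plain "percent of data on contracts" KPI can look healthy even when the data that actually protects the business is untouched, while a value-weighted view exposes the gap.

```python
# Hypothetical datasets: (name, already on a data contract?, business value,
# e.g. a rough score for the revenue at risk if this data breaks).
datasets = [
    ("legacy_audit_log",  True,   1),
    ("old_click_archive", True,   1),
    ("marketing_misc",    True,   2),
    ("fraud_detection",   False, 50),  # key business process, not migrated
    ("merchant_product",  False, 46),  # key business process, not migrated
]

def naive_coverage(ds):
    """Count-based KPI: share of datasets on contracts, regardless of value."""
    return sum(on for _, on, _ in ds) / len(ds)

def value_weighted_coverage(ds):
    """Weight each dataset by the business value it protects."""
    total = sum(value for _, _, value in ds)
    covered = sum(value for _, on, value in ds if on)
    return covered / total

print(f"naive coverage:          {naive_coverage(datasets):.0%}")
print(f"value-weighted coverage: {value_weighted_coverage(datasets):.0%}")
```

With these made-up numbers the naive KPI reports 60% done after migrating only the easy, low-value datasets, while the value-weighted view shows just 4% of the business value is actually protected, which is exactly the gaming effect Andrew describes.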
Andrew:
Yeah, we get things wrong sometimes too.
Karol:
Yeah, there are ideas and there are ideas, right? And some of them are just destined to fail, but we don't know it yet.
Andrew:
Yeah, we got a bit carried away with it, really.
Andrew:
Yeah, exactly, it's easily done, but a lesson learned and hopefully useful lesson for everyone listening to try to avoid.
Karol:
Good point.
Karol:
Now, about lessons learned: as I hinted already, the idea to talk about data mesh on this stream came from the workshop I ran at DDD Europe this year, and I received feedback that, given the sister conference being Data Mesh Live, it was a missed opportunity not to talk about batch processing, ETL and ELT, during my workshop.
Karol:
Now, my workshop was about ecosystem architectural styles and utilising domain-driven design to design integrations, and one of those architectural styles, the broker architecture, is basically the most common one in ETL/ELT batch processing, because it's a single abstraction layer that just moves data from point to point, usually triggered by a cron scheduler or triggered manually as a one-time go, right?
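As a rough illustration of the broker style Karol describes (all system names and data shapes here are invented), the broker is a single stateless layer that extracts, lightly reshapes, and loads on each scheduled run, without persisting anything itself:

```python
# Minimal sketch of a broker-style batch move. In production this run
# function would be triggered by cron (e.g. "0 2 * * *") or invoked
# manually as a one-time go, rather than called in-process like here.

def extract(source):
    """Pretend source system: hand back its current rows."""
    return list(source)

def transform(rows):
    """Light reshaping only; the broker adds no business logic of its own."""
    return [{"id": r["id"], "amount": r["amount"]} for r in rows]

def load(rows, target):
    """Append the reshaped rows to the target system."""
    target.extend(rows)

def run_batch(source, target):
    """One scheduled (or manual) run: extract, transform, load, keep no state."""
    rows = transform(extract(source))
    load(rows, target)
    return len(rows)  # rows moved in this run

# Toy demonstration with in-memory "systems":
crm = [{"id": 1, "amount": 10, "note": "x"}, {"id": 2, "amount": 20, "note": "y"}]
warehouse = []
moved = run_batch(crm, warehouse)
print(moved, warehouse)
```

The point of the sketch is the shape, not the code: the broker is pure point-to-point movement, which is why it sits so naturally under a cron trigger and keeps no data of its own between runs.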
Karol:
And I looked at that comment several times, trying to put myself in the shoes of the person who wrote it, and I was like, where is the disconnect?
Karol:
Where is the problem?
Karol:
What happened here?
Karol:
Because from my perspective, moving data from place to place is a large part of the job in data architecture, and with that comes data lineage, with that comes data traceability, and in general productising the data and having that data mesh.
Karol:
And this is why the title of this stream was versus, because from the perspective of that feedback, it seemed like we're in completely different worlds.
Karol:
Now I would postulate, for the sake of this discussion, that there is an inherent link between data architecture and integration architecture, and basically one cannot go without the other.
Karol:
Now Rachel is already popping her head.
Rachel:
Oh yes, well if you're talking about moving data, right, that's the whole point of your integration architecture, you need data architecture in place.
Rachel:
And what data mesh really brings is that decentralisation, you're not bringing everything into one warehouse anymore, you're looking at bringing from your operational system into a data product.
Rachel:
You can think relatively naively about that data product being a data system in its own right.
Rachel:
So you've got the same kind of integration challenges, but what you now have is both the integration between the operational system and the data product, from the data products to another data product.
Rachel:
There are a number of different sources, rather than just that simple point of view of, do you know what, we're going from an operational system to a data product.
Rachel:
And depending on the technologies you use, again, if you're working in quite a federated organisation, they might have different technologies that they're using from one product to the other, because that was one of the first points, it's technology agnostic.
Rachel:
You still have the same integration challenges as you would do in your profession, in your day-to-day life, Karol.
Rachel:
I'm sorry.
Karol:
I would argue that if they have different technologies underneath those different products, I would actually have even more problems and work challenges to handle there than with a singular technology stack or approach.
Karol:
I would jokingly argue that I can do application integration without a data architect.
Rachel:
You can't, you just can't do it well.
Karol:
Exactly.
Karol:
That's my point exactly.
Karol:
I wouldn't be doing it well, because that would be just pure and utter chaos, basically an operational nightmare, where we're moving things that are later inconsistent, incoherent, not usable, the quality is poor, and we're basically stuck in a limbo of technical debt.
Rachel:
And I'm sure that a lot of people are living in that world, because I know that we have an element of it in our own organisation, just to make anyone feel better if they're feeling criticised at this moment.
Karol:
Oh yes, definitely, there's a lot of people in that world; they just haven't gotten to the realisation yet that they need an integration architect and a data architect cooperating with each other to clear up that mess, that incohesion; to understand where the data is, what that data is, where it is needed, when it is needed, and then, with the integration architect, how do I get it from A to B?
Rachel:
Absolutely. Can I also just point out one other challenge that we in the data world like to bring in, around using the word integration with data?
Karol:
I think I know the challenge,
Rachel:
So we have different ways that we refer to it. If you go strictly, data integration is about the joining of multiple sources together to provide a unified view, not necessarily the actual movement of the data, which is an interesting challenge. So when you start trying to have a conversation with somebody, particularly maybe a professional consultant, you can get really caught up in that language barrier, even when you're two people that are talking about data. I don't know if you've ever experienced that, Andrew?
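To make the distinction concrete, here is a toy sketch (the source systems, keys, and fields are all invented): data integration in Rachel's strict sense joins sources into one unified view, a golden record, while application integration would be the movement of records between systems.

```python
# Two hypothetical source systems holding partial views of the same customer.
crm = {"cust-42": {"name": "Ada Lovelace", "email": "ada@example.com"}}
billing = {"cust-42": {"email": "ada@example.com", "plan": "pro"}}

def unified_view(cust_id):
    """Data integration (strict sense): join per-source records on the shared
    key into a single golden record. Nothing is moved anywhere."""
    merged = {}
    for source in (crm, billing):
        merged.update(source.get(cust_id, {}))
    return merged

print(unified_view("cust-42"))
# Application integration, by contrast, would transport this record from one
# system to another, ideally without persisting it along the way.
```

Same word, two different jobs: one produces a semantically unified record, the other is transport, which is exactly the language barrier being described.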
Andrew:
Yeah, no, I think that's certainly right. Again, different language, different people, and you've got to be careful what language you're using when you're talking to different audiences and different people, and what they expect. But like you were saying, it's very much an integration problem we're talking about here, with data mesh and with application integration. When I was talking earlier about the reason why we started adopting data contracts and moving towards data mesh, for a lot of the data that was really about the integration between source systems and our analytics stack; it really was an integration problem that was there.
Rachel:
Yeah, but it's quite fun when you start going, you need to have an integration architect and data people will be like, but I've only got one source of data, what are you talking about?
Rachel:
So it's where it also ties into the domain driven approach and making sure you have that common language across the operational and data side, the good old ubiquitous language.
Karol:
You know what, this is actually a fun thing. This is one of the first articles I wrote at Bridging the Gap (here's the QR code for the article, if you will). It's called Data Integration versus Application Integration, and it was basically a bit of a statement piece on language.
Karol:
I've been in organisations where the issue was that people dumbed down the terms to just the word integration, and nobody ever knew which type of integration you were talking about: are you talking about integrating data to create something cohesive, like a golden record or something that has substantial semantic meaning, or are you talking about moving data from point A to point B and then point C?
Karol:
Nobody knew.
Karol:
We're all at fault. Quoting Andrew Harmel-Law from his talk at DDD Europe this year, variability is the second hardest problem in software architecture; by the end of the talk somebody actually asked him what the first hardest problem is, and he pointed at the whole audience: you all are!
Karol:
Yes, we all are the problem in terms of this misinterpretation of names.
Karol:
And that's not only because we talk different languages, it's not only because we come from different fields, it's also supported by the technology we use, because if you use integration platforms, integration platforms do have data integration capabilities.
Karol:
Should we implement data integration processes within an integration platform?
Karol:
Probably not a great idea, but the capabilities are there.
Karol:
Now, if you go to different data technologies, MDM and so on, do they have application integration capabilities?
Karol:
Well, yes, yes, they have, because you need to get that data into them somehow, so the creators of those technologies figured, okay, we're going to slap on some application integration capabilities so we can actually get that data ingested.
Karol:
Yeah, okay, but these are just small capabilities; they're not the tools for the job.
Karol:
So, to give you an example, I've actually seen a team that was supposed to do application integration, composed of former DBAs and database developers, that built an integration platform using an MDM solution over a NoSQL database.
Karol:
In application integration, we should not persist anything, or at most persist temporarily, while they built an integration platform that was permanently persisting everything.
Karol:
Talk about the misconception about understanding terms, right?
Karol:
So, instead of doing application integration, they actually did data integration, and they called it, oh, this is our application integration platform.
Karol:
I was like, where did we go wrong here? And that's a crucial problem.
Rachel:
This happens everywhere, right?
Rachel:
And this is one of the things around data mesh, because the tooling always jumps on the language of whoever comes up with it first and puts their own interpretation on it.
Rachel:
So, it's shaped a number of interesting talks and conversations around different organisations about, you're doing what?
Rachel:
Oh, yeah, well, that's the way the tooling drove us to do it.
Rachel:
And it said that it supported data mesh, whereas the messaging that's come from Zhamak through her ThoughtWorks days has always been that the tooling isn't actually there to support it, and that's why she's gone in and built it.
Rachel:
This is not an advert for her tool.
Rachel:
But it's more kind of around that adoption of it.
Rachel:
And there was a wonderful, wonderful piece the other day, I'm trying to remember who wrote it, that was very much about how the architect is the person that's read the whole article.
Rachel:
So, it's not the person who's jumped on that language, but the person who actually understands what they're talking about and can cut through the nonsense.
Rachel:
So, it's really important that we do that and don't just take a word or phrase at face value.
Karol:
I'm watching over LinkedIn and several other media, several companies that produce application technologies, products to move data.
Karol:
And at times, I'm looking at them and it's like, their marketing is a pure nightmare of just conjoined little monstrosities of terms that make no sense from a perspective of software architecture, enterprise architecture, or anything else.
Rachel:
On the surface, they sound good, right?
Karol:
They sound good.
Karol:
They sound great.
Karol:
I mean, looking at one of the companies with, I'll not name for the sake of it, last week they coined a term, composable architecture.
Karol:
That's a very nice term.
Karol:
It sounds great.
Karol:
But it was just marketing fluff, because their product architecture had supported composability as an architectural characteristic for years.
Karol:
It was there already, the composability, the reusability coming with that composability.
Karol:
It was existent.
Karol:
But they figured, oh, we're going to change our licensing model and we're going to market this as, oh, we're now doing composable architecture.
Karol:
They literally shot themselves in the foot.
Karol:
They dropped in Gartner rankings because of that over the next half year.
Karol:
It was quite a show, just because of that marketing lingo and trying to coin terms that make no sense given the current situation.
Karol:
Then their licensing practice actually made it so that the product had to be utilised in a different way to keep it feasible in terms of budget, limiting that composability aspect further.
Karol:
So that was like, we're doing composable architecture, but you're pricing us so that we have to drop that characteristic.
Karol:
That's interesting.
Karol:
Again, language.
Rachel:
A lot of people won't realise until they sign the contract.
Karol:
Exactly, because they caught on to the marketing spiel and, sorry, it's not exactly the truth.
Karol:
But I think that kind of maybe ties into the Dunning-Kruger effect of things.
Karol:
We're very confident about things when we don't know a lot about them and then we're very careful when we actually start to gain a proper understanding of terms and what we're actually doing instead of just going blindly in the fog.
Rachel:
Absolutely.
Rachel:
That probably kind of leads on to one of the adoption challenges around this as well.
Rachel:
Because one of the interesting conversations that was happening at data mesh, I know both myself and Andrew kind of got involved in it a little bit, was around whether you drive data mesh from the top down or the bottom up and how you can best make that happen.
Rachel:
Do you need to have the C-suite buy-in, or can you implement it by stealth, by working up and attacking that middle management who are really going to be the domain owners and data product owners able to drive it?
Karol:
Let me guess, the only true answer to that question would be it depends.
Rachel:
Absolutely, we're architects.
Rachel:
That's the only answer to any question.
Karol:
I'm sure I'm trained in classical software architecture.
Karol:
All right, tell us.
Rachel:
So from our side, the adoption and the drive was very much from the bottom up, I would say, and it was the noise that it generated in terms of the value that was being perceived that then flagged it up for senior sponsors to be able to come in and drive it onwards.
Rachel:
With that senior sponsorship, it can happen a lot quicker, obviously, but it's not a barrier for you to be able to adopt it and drive it successfully from the bottom up.
Rachel:
It's certainly my experience.
Rachel:
So either is possible.
Karol:
I think it would be best to have it both ways, right?
Karol:
Meet in the middle, to grind that middle manager from both sides.
Rachel:
A little bit.
Rachel:
I mean, obviously, one of the problems you have when you have a senior sponsor saying, do this, then people feel like they're having it done to them.
Rachel:
So I think that's really the advantage that we had: we were able to introduce them to the concept, get them engaged, and have them say, actually, I want this and I want to help drive it.
Rachel:
And then the message coming saying this is really important, we're going to do it, allowed it to really kind of flourish because you already had the buy in by that point.
Rachel:
You just had the freedom, and people got the space to put in the time that was needed.
Rachel:
So I'd almost say, rather than meeting directly in the middle, it's like build it and they will come.
Rachel:
Because that engagement is absolutely important because ownership is essential throughout this and the business being able to drive it.
Karol:
So we first create that interest, that engagement, accountability, ownership, and then put on top of that the priority from the higher ups.
Karol:
And that makes the dream work.
Rachel:
That's certainly my experience.
Rachel:
I don't know if yours was different, Andrew.
Andrew:
No, very similar, much the same really.
Andrew:
We started bottom up, solving problems locally, aligned to business goals.
Andrew:
And our problem is really we're trying to, I can't stop saying that, kind of bridge a gap between software engineering and the data teams.
Andrew:
And the main initial drive was around data reliability.
Andrew:
And what we started doing is we started creating incidents when data pipelines failed, obviously when they were caused by a change in the software engineering systems.
Andrew:
And then we'll get everyone together in a room and start talking about it, really trying to get everyone speaking about problems, learning the same language, learning what the other side were doing and what they knew and what they didn't know.
Andrew:
And getting software engineers involved in trying to work out a solution for this.
Andrew:
Like: I made a change to the database, and that's what I should be doing, but I didn't know it impacted everything downstream.
Andrew:
So we started explaining to them why that was impacting everything downstream.
Andrew:
Started getting them involved in building design solutions.
Andrew:
We were still doing it, we kind of knew where we wanted to go, but really giving them that sense of ownership of the problem as well as the solution.
Andrew:
And then do some small proof of concept to prove it out.
Andrew:
And then you can demonstrate something that works before you go and try and get that accepted buy-in.
Andrew:
And not only that, not only have you now got some proof, you've also got people already interested, already engaged, already being the voices in the room when you're not there, talking to their peers in maybe the software engineering part of the business, or the web part of the business, trying to convince them.
Andrew:
And they feel like they've been part of the discovery of this problem and solution, not something thrust upon them: you are now creating data products, and you are now using this platform that you had no say in.
Andrew:
Because although that does work, it can work, if you've got enough pressure from above, you have to do it, you don't get a choice.
Andrew:
But similar to that thing I spoke about earlier, when we chose the metric: the pressure from above was there to do it, but the people doing it were just doing the bare minimum to get it done, to get people off their backs, so they weren't really bought into it.
Andrew:
They didn't really know why they were doing it; they were just trying to tick the box.
Andrew:
Whereas if you can get people working more closely together at all levels, including the lower levels, I won't say bottom, the lower levels where the actual work's being done, then you're going to get a much better result, because everyone's involved, everyone's engaged.
Rachel:
Gregor Hohpe's Architect Elevator, right?
Andrew:
Yeah, I've been re-reading that, actually.
Andrew:
So, actually, the quote you mentioned earlier, about the architect being the person who reads the whole article, that's from his book.
Andrew:
So, yeah, if anyone's not familiar with him, it's well worth reading all his books; he's great on this kind of thing.
Karol:
And that kind of top-down pressure, without the accountability, I think usually leads to quite a lot of damage in quality.
Andrew:
Yeah, and it's documented everywhere that these top-down, big IT projects always fail, or regularly fail.
Andrew:
They're always more expensive than they were supposed to be.
Andrew:
You know, you go away for two or three years, deliver something, within that time you deliver nothing, and then your project gets cut, and you just waste loads of money and time and effort.
Andrew:
So, I'm not saying every top-down project is done like that, like after two or three years, like that timescale, with no deliveries, but many are, and many fail because of that.
Andrew:
Whereas if you start small, solve locally, solve initial problems, deliver value, then incrementally build things up towards the data mesh; maybe that's your ultimate goal, but maybe that's not what you talk about initially.
Andrew:
Like, for example, where I previously worked, I rarely spoke about data mesh, I rarely used the term, although I knew that's where it was going, because most people in business didn't know what it is, didn't care what it was, and if they did know what it is, it was even worse, because they were scared of being made to adopt this big, scary idea, in their view.
Andrew:
So, we just didn't talk about it. We were saying, okay, well, the next part of our approach is to start treating data a bit more like products, and the next part is to move away from tightly coupling ourselves to the databases and move to more events and apply data contracts, and just gradually phase towards, you know, self-ownership and self-governance.
Andrew:
We were essentially doing data mesh, but we didn't tell people about it, and that didn't matter; we were doing it, solving the problems we had, but with a long-term view in mind.
Rachel:
We don't use the term data mesh with anyone outside the actual data function, because there's no need to.
Rachel:
The business just see the value that they're getting from it.
Rachel:
I think part of the problem, what you're saying in terms of big IT projects failing, is because we don't run them like business transformations, we don't give that consideration to the business readiness, we just try and jump straight to adoption.
Andrew:
And technology factors as well, usually.
Karol:
This strikes me from a psychology perspective.
Karol:
Now, I'm not a psychologist, but I dabble in the concepts in my spare time.
Karol:
It's very heavily about, first of all, what you mentioned: basically dismantling a very, very huge, scary thing called data mesh, without even naming it data mesh or naming it as that huge thing, into small pieces, and making them actually actionable.
Karol:
So there's somebody with a vision, not telling about the grandeur of the vision, but just issuing need-to-know basis, right?
Karol:
Small, contained, actionable pieces of, let's do this, go this direction, and see.
Karol:
And then there's this approach: if we look at it from a psychology perspective, changes for most people are very scary.
Karol:
Changes very much hit into their sense of psychological safety.
Karol:
If people's psychological safety is compromised, they will not be performing.
Karol:
They will not be delivering quality, and they will not be taking accountability and ownership.
Karol:
So if we put it in terms that are containable, that the human brain can wrap around, and simplify it into a model understanding that is sufficient and doesn't leave a lot of questions, then we cater to the psychological safety of those people, and that drives the adoption further.
Rachel:
Absolutely.
Rachel:
So again, it's kind of going back to Gregor Hohpe's Architect Elevator: looking at every level and explaining it in terms of the business problem it solves.
Rachel:
Because whilst people are resistant to change, they're not resistant to progress.
Rachel:
So I have the wonderful joy in my job of talking to an awful lot of people about the domain architecture, the part the data plays in it, etc.
Rachel:
And within the organisation I've been talking to everyone from partners to interns, and it's just about framing it in a different way.
Rachel:
So if a partner comes down to talk to me, the conversation will be very much about strategic prioritisation and oversight, because that's what they want and what they can't see at the moment.
Rachel:
If one of the heads of department, the domain owners come along, then we talk about ownership, empowerment to drive their area forwards.
Rachel:
They play a really strong part in the culture of the organisation, driving the change themselves in the way that they see it making progress.
Rachel:
And then if you get someone more on the operational implementation side, we're talking more about the value that they're adding to the business, and the efficiency that is coming from making sure we have this single place to build things, building them once, building it consistently, and having the building blocks to do that efficiently, and the value that brings the business.
Rachel:
So it's a huge change, but it's being delivered in a very different way to make sure that people engage and go, this is what we need.
Rachel:
The number of times I've been told that over the last year has been absolutely phenomenal.
Karol:
Yeah, and here Stefan is popping in comments in terms of the resistance to the idea, resistance to change.
Karol:
Especially if we already have failed projects in the past, there's resistance to change. Again, not to progress, but the change is going to be there, and that results in less adoption.
Karol:
Apparently, and I didn't know the term, it's called change fatigue, which I can understand, because having all those changes, especially if they're not successful, for one thing damages that psychological safety.
Karol:
So what's going to happen to me?
Karol:
Where am I going to be?
Karol:
Am I going to be replaced or not?
Karol:
Am I going to be fired?
Karol:
But then the ingestion of information, especially if we were presented with some grand design, a big one, it lands straight in the problem of having extraneous cognitive load, which we need to address, but maybe we should just limit it instead.
Karol:
Then, again, going back to those actionable, stackable elements, we're literally lowering that extraneous cognitive load that way, which is something very interesting.
Karol:
I'm connecting the dots right now, because just two weeks ago there was a stream about cognitive load, so I'm now a little bit smarter on the topic, and Radek did a great job explaining the different qualities of cognitive load on that stream.
Karol:
If you haven't seen that, then just go into YouTube and look at the recording.
Karol:
It's a really, really interesting talk, and Radek was drawing in his tool, showing what the concept is; that ties into the adoption aspect here very, very nicely, and it's interesting to connect those dots.
Rachel:
I think one of the great things about data mesh in some ways is that if you look at it as one thing, it's a massive change.
Rachel:
It is, or can be, very much a series of smaller changes, but ...
Rachel:
That can obviously lead to change fatigue if it's done the wrong way, because it's the definition of, oh my God, what do you want me to change now, or what's happening now?
Rachel:
I've just learned this thing.
Rachel:
Actually, if you're looking to drive it in the right way, you give people enough information that they go, yeah, this is what I want, but in order to do this, I need this.
Rachel:
They drive the ideas and direction of the next change, or at least you use the sessions you're having to influence them to think that they are driving the direction of the change.
Rachel:
They constantly see it as steps of progression rather than steps of change.
Rachel:
That's obviously not going to work across the board.
Rachel:
There's always going to be some people who fall by the wayside, but for the majority of the organisation, it's working so far.
Rachel:
Okay.
Karol:
I mean, probably, with people like me, that wouldn't work.
Karol:
I have a constant problem with hierarchy and with being conformist, so it would be interesting to see that in action. But absolutely, it's kind of a bit like an illusion, yet it does support progression.
Rachel:
Absolutely.
Rachel:
It's one of the hardest things.
Rachel:
We neglect the people side.
Rachel:
We were having a bit of a joke earlier on, as we built out our enterprise architecture team, starting from two people and growing into these different disciplines.
Rachel:
We're going, we're missing the people architect.
Rachel:
Well, it's not a role as such, but it's fundamentally about building in those social skills, the empathy, the individualisation, and making sure that we can work across the levels to bring everybody on the journey, and not just sound like we're preaching from an ivory tower.
Karol:
I think that's a very underrated skill overall for all architects, is the empathy and the people skill, to be able to connect with them, understand them, and convince them that something might be actually good for them.
Karol:
We tend to look at architecture, well, I don't do that any more, but the general notion is that this is a technical field.
Karol:
It's very much a technical field as much as a people field, and depending on where you're positioned in your organisation, if you're a technical architect, a solutions architect, an enterprise architect, integration architect, data architect, the proportion of people skill to technical skill, it differs at times, but it differs ever so slightly.
Karol:
It's not that large of a shift, in my opinion.
Karol:
It's more of like, you need to be more of an SME here because this is a very technical area, but you're still going to be working with people.
Karol:
I still remember what people were saying back when I was a junior: that they didn't go into computer science and work in IT to work with people. And I think it's such a false premise to go into this field and work in IT if you're not going to be working with people, because, in the end, we're here to support other people and drive their businesses.
Rachel:
Yes, absolutely.
Rachel:
It's a service function, so you need to be able to understand the people that you're working with.
Rachel:
Going back to probably exactly the same sources you've already mentioned a couple of times, there is a wonderful quote that the architect isn't the smartest person in the room.
Rachel:
They're the person that's best at pulling the information out of the people in the room, and I think that's something that's certainly helped me, because I'm never the smartest person in any room, and I'm totally all right with that.
Karol:
I tend to differentiate that in terms of being smart and being wise, because, from my understanding of the words, knowledge and wisdom are two different things.
Karol:
I might be very knowledgeable, but I might not be the wisest, and there's always, in any business, especially that I work as a consultant, not in-house at a company, in every business, there definitely will be somebody way wiser than me in the domain space or problem space that I'm trying to understand and solve.
Karol:
And being able to go into that and weave out the knowledge and understanding from those persons, that's a very, very, very crucial skill for an architect.
Karol:
But people learn it the hard way, really.
Karol:
Then comes this question here in terms of exactly pulling out knowledge out of people.
Karol:
Should I identify my data consumers first when deciding whether to adopt the mesh architecture, or what should I do in terms of that?
Karol:
Because this is, again, a problem domain, I suppose, and understanding your people.
Rachel:
It's more about understanding your business and what the problem is that you're trying to solve, or you're expecting the data mesh architecture to solve.
Rachel:
It was actually quite, for us, I've already mentioned slightly earlier on about bringing the operational reporting together, because there was a big disconnect before.
Rachel:
One of the really interesting comments, I thought, in one of the training sessions I did with Zhamak right back at the start, when she wasn't so busy and was actually doing them in person, one of the comments that she made, and I actually disagreed with, was that data mesh wasn't really for smaller organisations.
Rachel:
Now, I would call ourselves a smaller organisation.
Rachel:
We're a midsize investment management firm, but we're primarily located in one location, which actually makes data mesh a really good implementation for us, because we don't have the challenges of kind of cross-continent, cross-border considerations.
Rachel:
Everybody's in the same time zone.
Rachel:
Everybody knows each other.
Rachel:
Everybody knows what their business is about and can understand it, which gives you a nice way in to that collaboration with fewer challenges.
Rachel:
So, there are a number of things to consider before adopting it, but primarily, what's the problem you're trying to solve?
Rachel:
It's probably not identifying your data consumers first.
Rachel:
They will be different for different products.
Rachel:
So, you'd probably get down into quite an interesting rabbit hole if you started to do that.
Karol:
That comes as well with just products, not only data products, but software as a product.
Karol:
I remember a situation back when I was working at T-Mobile, T-Mobile Poland is owned by Deutsche Telekom from Germany.
Karol:
At that time, Deutsche Telekom had a lot of say in terms of what the group does, and they figured they were going to introduce one centralised mobile app to manage your account and your phone subscription, and they figured, okay, we're going to put that in.
Karol:
We're going to put postpaid, prepaid, perfect, make some country-specific views, and that's going to be just fine.
Karol:
And they came to Poland and introduced the concept.
Karol:
They started workshopping it, going into that, and then they realised that, well, it's not going to solve any problems in Poland, because we not only have prepaid and postpaid, we have a mix, which was a prepaid-postpaid hybrid.
Karol:
So, basically, you pay a fixed price monthly, like the equivalent of five pounds, and you could charge your account afterwards like you did a prepaid, to get more minutes, to get more SMS. And then the whole idea went out of the door, because it just didn't fit, because they didn't understand the problem they were solving.
Karol:
They tried to globalise the problem, but it turns out that certain territories have a different problem to solve, and I think that translates exactly to the productisation of data, or later the productisation of the APIs delivering those data products: you need to understand the problem you're solving in that sense, right?
Rachel:
Absolutely.
Rachel:
It's another example of wherever you're working across the data pipeline, whether it's in the functionality, whether it's providing the data, it's a lot of the same capabilities and considerations.
Rachel:
Yeah, integration, observability, all of it, it all comes together.
Karol:
Now, you said the keyword here, observability, I'm looking at my lovely list of topics and questions.
Karol:
Should we jump into observability, and here already tying into the question in chat from Stefan about data privacy and data governance, because these are inseparable.
Karol:
So, data lineage, observability, data privacy, data governance: how do we get to address that with data mesh, especially with sensitive data, and also with sensitive geolocations when we're spread around the world, like, let's say, China or Russia? These are a little bit of a burning topic in terms of sensitivity.
Rachel:
I'm going to just jump on that question directly from Stefan, because it's absolutely a consideration that we have.
Rachel:
The way that we are looking to do that is to set up the data product as appropriate for the consumer.
Rachel:
It comes back to that side of things.
Rachel:
So, within the same domain, you can provide the same data, or a filtered view of it, through different data products.
Rachel:
Each of the data products not only has the data contracts as to the data moving into or out of it, but the product itself has the data sharing agreement around it.
Rachel:
And what that stipulates is who can use the data for what.
Rachel:
So, you can have one data product that is set up saying this data is appropriate for clients across UK and Europe, this data product is appropriate for use in China, this data product is appropriate for use in New York.
Rachel:
And you can set them up to be based off the same sources, so you have the consistency of data, but providing only the data that is appropriate for the consumer or the consumer group.
Rachel:
Does that make sense?
Karol:
So, we're basically going back to data contracts and metadata with those contracts to align what contract is for whom in that sense.
Rachel:
So, yeah, in terms of the product you're building, fitting the consumer requirement and having the metadata and the security structure around that.
Rachel:
So, it depends on what technology you're using and how you're going to actually implement that.
Rachel:
But for us, for example, you can have a separate database for each data product and say, do you know what, we are now able to be confident that we are able to provide this data to the right audience and we are not going to be hit with any kind of regulatory mishaps.
Karol:
Right, because the data privacy aspect, especially, that's a bit of a nightmare to govern.
Karol:
One of the learning moments that I had over the years in application integration was looking at the anaemic events pattern.
Karol:
For those unfamiliar, anaemic events are basically an event-driven architecture of events that are emitted from a source system but they don't contain business data, they contain only an identifier of a record that has changed.
Karol:
Sometimes with large datasets, this is accompanied also by a schema of which part of the dataset has changed.
Karol:
So, if we have, let's say, a record that has 200 different attributes, these kind of things happen, unfortunately, then the schema says which part of that record has changed.
Karol:
So, it's grouped into specific schemas and they compose one record.
Karol:
Then, basically, we don't know what data was there, we don't know what data has changed from what to what, we need to pull it.
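The anaemic event pattern described above can be sketched roughly like this; the event type, field names, and API paths are all illustrative assumptions, not from any real system:

```python
# An anaemic event carries only the identifier of the changed record and,
# optionally, which schema sections changed -- never the business data itself.
anaemic_event = {
    "event_type": "employee.updated",
    "record_id": "emp-10042",
    # which parts of the large (say, 200-attribute) record changed
    "changed_sections": ["contact_details", "compensation"],
}

def paths_to_pull(event: dict) -> list[str]:
    """Consumer side: derive the per-section API calls needed to pull the data.
    Each API applies the calling consumer's contractual filtering, so two
    consumers pulling the same record can receive different data sets."""
    return [f"/employees/{event['record_id']}/{section}"
            for section in event["changed_sections"]]

print(paths_to_pull(anaemic_event))
# ['/employees/emp-10042/contact_details', '/employees/emp-10042/compensation']
```

The consumer then calls each path with its own credentials, which is what lets the governance sit in the API rather than in the event stream.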
Karol:
I always considered anaemic events before encountering them in the wild, I considered them kind of an anti-pattern, over-complicating things.
Karol:
But then I saw them utilised in a proper way, in a situation where exactly that governance aspect was needed.
Karol:
And that tied down, basically, to the API contracts, exposing that data product.
Karol:
In that terms, that was specifically employee data, employee record.
Karol:
And that was a very interesting thing, because when the event was emitted, the consumer that needed it knew that this was the event, and then: I need to go to the specific API for a specific schema to pull the changes.
Karol:
Now, the absolutely amazing part was that for each API, each of those consumers had a specific contractual obligation, with metadata, with security, recognising that this was that consumer.
Karol:
Which basically meant that each of them, for that one record, would each time get a different data set, governed by the specific contractual rules implemented into that API, about the delivery of that specific product from that centralised MDM solution.
Karol:
It's a hassle.
Karol:
But if we're considering working with sensitive data, where security is our driving characteristic, not performance, not availability, not speed, not time to market, then that was the only sensible implementation to do, to divide it that way and set it up centralised to be able to govern this kind of thing.
Karol:
And this is not about data architecture per se, this is about the data movement, so application integration, and the specific architectural pattern to facilitate that requirement.
Karol:
The security part was down to the data architecture that was governing that.
Rachel:
Well, and the security architecture, we need the security architect as well.
Rachel:
But that probably starts going into answering Stefan's kind of follow-up question that he's asked there, around whether there are geolocation restrictions on the way data can be shared.
Rachel:
Your design sets up the data product and the rules around it, and then, again, it's kind of down to you, the tech stack that you use, and how mature you are in that space, as to how you are going to implement that.
Rachel:
Karol's just kind of given one example that can be used for that, so there are definitely ways that it can be implemented.
Rachel:
But the key part is understanding those restrictions and designing for them.
Karol:
Andrew, I think this is where you can shed some light, because the way I understand contracts, and I'm usually using the term integration contract, but that doesn't really differ from a data contract per se.
Karol:
My understanding of a contract in IT in terms of moving data and using data is that it begins with a conversation about the business process, and then we go down to implementation after a long discussion along the way about architecture, patterns, etc.
Karol:
And looking at that, for example, with the context of Stefan's question, how would you tackle that from a data contract perspective?
Andrew:
Yeah, we've tackled similar things like this with data contracts.
Andrew:
So, a data contract starts off with capturing what's been agreed through conversations, and codifying that in a human- and machine-readable format.
Andrew:
And once it's in a machine-readable format, and, again, it doesn't matter what format that is, you can then start using it to automate and build various tooling around how to manage this data.
Andrew:
So, for example, in a data contract, we specify a schema, and within that schema, we specify, we categorise our data.
Andrew:
Because I work for a payments company, so financial data is really important, we need to be careful with that.
Andrew:
And we categorise, you know, is it about payment, is it about customer, is it sensitive, or confidential, or public data, all those kind of things.
Andrew:
And that's all you have to do as a data owner, is just categorise your data, which you should be able to do, because it's your data.
Andrew:
And the tooling takes care of making sure the right people in the right roles have the right access to the right data.
Andrew:
It's all kind of automated for you.
Andrew:
And that's just one example.
Andrew:
We do loads of things with tooling, backups, various other governance tasks, data retention policies, all that sort of stuff is driven by the platform.
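A minimal sketch of how that categorisation-driven access management might work. The contract fragment and role policy below are hypothetical illustrations, not ODCS or any specific company's format:

```python
# Hypothetical data contract fragment: the data owner only categorises
# fields; the platform tooling derives who may see what.
contract = {
    "dataset": "payments",
    "owner": "payments-team",
    "schema": {
        "payment_id":  {"type": "string", "category": "public"},
        "amount":      {"type": "float",  "category": "confidential"},
        "card_number": {"type": "string", "category": "sensitive"},
    },
}

# Platform-wide policy: which categories each role may read.
ROLE_POLICY = {
    "analyst": {"public", "confidential"},
    "support": {"public"},
    "auditor": {"public", "confidential", "sensitive"},
}

def visible_fields(contract: dict, role: str) -> set[str]:
    """Derive the fields a role may access purely from the categories
    declared in the contract -- no per-dataset access rules needed."""
    allowed = ROLE_POLICY[role]
    return {name for name, spec in contract["schema"].items()
            if spec["category"] in allowed}

print(visible_fields(contract, "support"))  # {'payment_id'}
```

The design point is the one Andrew makes: the owner declares metadata once, and the same declaration can drive access, retention, and other governance tasks automatically.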
Andrew:
So, I guess, as Rachel was saying, it kind of depends on how mature your technology stack is, in terms of how you might want to implement something like what Stefan was talking about.
Andrew:
But, you know, given enough maturity, enough time, money, and effort, you can build something that's very automated, that satisfies these kinds of requirements.
Andrew:
So you're not having to have every individual data owner or data producer decide how to produce data that works for different geos, or for different people in different roles, or how data moves across different parts of the world.
Andrew:
We were going quite far with this.
Andrew:
We were, at some point, looking at how we might make data available in different countries, with data locality, and meeting the regulations there.
Andrew:
And again, we couldn't see any reason why we couldn't automate that through the platform.
Andrew:
So, we were just saying, well, these particular records relate to the EU, these particular records relate to the US, and they shouldn't be transferred back and forth between the two.
Andrew:
And access to them should only be available through very similar tooling that's based in the right country, or the right geo.
Andrew:
So, yeah, there's no reason why you can't automate this, once you describe the data in enough detail that your tooling can work with it.
Andrew:
So that's the importance of metadata, again.
Andrew:
A contract just captures that metadata, makes it available to humans to fill in and to manage, but also to machines, to automate the management of that data based on the contract.
Karol:
So, let's shoot some examples what metadata would look like, because from my perspective, when I'm moving data from point A to B, I'm looking at some standardised HTTP headers, some IP of origin, this kind of thing.
Karol:
But on top of that, a must would be a timestamp, a correlation ID, message ID.
Karol:
If it's formatted as a ULID, then, well, I can drop the timestamp, because the timestamp is baked into it.
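Karol's point about the timestamp being baked into a ULID can be shown directly: by the ULID spec, the first 10 characters are a Crockford base32 encoding of a 48-bit millisecond timestamp. A dependency-free sketch (the encoder is included only to round-trip a known value):

```python
# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the millisecond timestamp from a ULID's first 10 characters."""
    ms = 0
    for ch in ulid[:10].upper():
        ms = ms * 32 + CROCKFORD.index(ch)
    return ms

def encode_ms(ms: int) -> str:
    """Encode a millisecond timestamp as the 10-char ULID time component."""
    chars = []
    for _ in range(10):
        ms, rem = divmod(ms, 32)
        chars.append(CROCKFORD[rem])
    return "".join(reversed(chars))

# Round-trip a known timestamp; the remaining 16 chars of a real ULID
# are randomness, so zeros stand in for them here.
ts = 1_700_000_000_000  # an epoch-milliseconds value
assert ulid_timestamp_ms(encode_ms(ts) + "0" * 16) == ts
```

This is why a ULID can double as both a message ID and a timestamp in integration metadata.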
Karol:
But these are like the base of all metadata that is a must for me as an integrator.
Karol:
But looking at the data from the perspective of a, I guess, business process or business in general, I believe that there would be a lot more to it than just these operational metadata.
Andrew:
Yeah, exactly.
Andrew:
I mean, really, you can put whatever you want in that contract that helps you manage data.
Andrew:
And I would say the first thing to do is try and make sure you put things in there that are going to be useful.
Andrew:
That's the reason why someone's completing that metadata, filling it out, managing it, keeping it up to date.
Andrew:
If you're not going to use it, it just becomes a wasted task.
Andrew:
And, again, we talked earlier about trying to convince people to use the systems and get their buy-in, but you need to understand why it's important to categorise data, for example.
Andrew:
It's important because we're going to use it for data retention policies, for access management, for various other things.
Andrew:
So it's clear use of that data.
Andrew:
But really, I think the minimum you want in a data contract is probably the owner, a version number, because data evolves, and a schema, so you know what structure the data is.
Andrew:
That's probably the minimum.
Andrew:
But you can put various other things in: SLOs, data categorisation, like I talked about.
Andrew:
You can put retention policies, backup policies.
Andrew:
You can put a number of things, really.
Andrew:
And if you want to find out the kind of things that people are putting in data contracts, there's a data contract standard under the Linux Foundation, called ODCS, the Open Data Contract Standard.
Andrew:
That lists the kind of things people are putting in data contracts in practice.
Andrew:
But the main things we put in there were data categorisation and ownership, and these are all useful things, right?
Andrew:
For example, ownership: if there was an issue with the data, say it wasn't matching the schema, or wasn't matching the data quality rules that were in there,
Andrew:
we use the ownership to route the alerts back to the data owner, so they can then go and investigate and fix the data.
Andrew:
Version number, we put that in there to implement change management around the data.
Andrew:
I mentioned earlier that one of the issues was, the main issue we were trying to solve initially was the reliability of data.
Andrew:
That's because data had been changed regularly, but there was no change management part of it.
Andrew:
So, we started, again, similar to what you do with an API: you have an API version; making minor changes is fine, but a breaking change you wouldn't just deploy, you would have a migration path from one version to the other.
Andrew:
So, the data contract has a version number in it, and every time you make a change to a contract, we run some checks, and if it's a non-breaking change, fine, go ahead and deploy it, no problems there.
Andrew:
Breaking change, you can't do that, follow the change management process, talk to consumers, find out where they are, work out how you're going to migrate everyone over to it.
Andrew:
We didn't exactly say how you're going to migrate; we said, you are the data owners, you have the autonomy to decide how to migrate your consumers over to the next version of the product.
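The automated compatibility check Andrew describes might look like the following sketch, assuming a flat field-name-to-type schema (real tooling would compare much richer schemas):

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Return the reasons the new schema would break existing consumers.
    Removing a field or changing its type is breaking; adding a field is not."""
    problems = []
    for field, ftype in old.items():
        if field not in new:
            problems.append(f"field removed: {field}")
        elif new[field] != ftype:
            problems.append(f"type changed: {field} ({ftype} -> {new[field]})")
    return problems

old = {"payment_id": "string", "amount": "float"}
new = {"payment_id": "string", "amount": "string", "currency": "string"}

# An empty list means a non-breaking change: deploy straight away.
# Anything else blocks the deploy until change management has been followed.
print(breaking_changes(old, new))  # ['type changed: amount (float -> string)']
```

Gating deploys on this kind of check is what turns the version number in the contract into actual change management rather than documentation.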
Andrew:
So, autonomy was quite a strong theme as well.
Andrew:
So, yeah, put anything you want in a data contract, but make sure it's useful for something.
Andrew:
So, whatever problem you have, with data management, with change management, with access management, make sure you capture it in the data contract and use that information to, ideally, automate, or at least support data owners in managing their data.
Rachel:
The encapsulation of both your functional and your non-functional requirements gives you the opportunity to think about your not-so-happy paths as well.
Rachel:
So, there'll be expectations around data, certainly, you know, I need this data by 7am, otherwise I won't be able to do whatever I need for that.
Rachel:
What do you want to happen if it's not available?
Rachel:
So, making sure that we do that.
Rachel:
So, by having that standard format, it means that that template is there for everybody who comes to create a data contract.
Rachel:
They don't have to reinvent the wheel, and they don't have to think about what questions they need to ask around it.
Rachel:
So, it becomes a really, really useful tool around the organisation for both the business and the technical implementation team.
Karol:
The question that came to my mind while you were explaining the various metadata for a data contract, probably half of it does not bear that much impact on my work as an integrator.
Karol:
They bear more impact on the overall end-to-end solution, but purely looking at data movement, it's like, that doesn't really resonate.
Karol:
So, if we would be looking at the data contract, at the metadata around the contract, how much do you think actually translates to the instantiated object of a data being moved from point A to point B?
Karol:
Because I think there's a different scope, a smaller scope, definitely, of those metadata being attached to that specific instance and moved alongside that instance.
Andrew:
Yeah, I think that's a really good point, actually, because the examples I gave were a lot about how you manage data when it's stored at rest, which you're not necessarily doing when you're integrating applications or you're stateless.
Andrew:
But with data integration, I don't know if you want to use that term, but when you're integrating data, at some point you're working with data in a stateful environment, and you need to manage that data correctly.
Andrew:
But when you're moving data around, within each particular instance, there are things in there that you also want to capture.
Andrew:
So, things like SLOs: for example, this data is expected to be present by 7am, and what percentage of the time it actually arrived by then.
Andrew:
That's the SLO that you report on.
Andrew:
Things like SLOs would be useful.
Andrew:
Schema, to make sure that systems can decode that data, if it's been encoded in some format.
Andrew:
Or just to configure the platform to ensure that the data conforms to the schema.
Andrew:
And then fire alerts, or route the data to a dead-letter queue, or whatever you want to do with data that's invalid.
Andrew:
You also want to go into things like data quality rules.
Andrew:
So, this record should be between 0 and 1.
Andrew:
It should be a float between 0 and 1.
Andrew:
That's beyond what a schema can do, which is just say it's a float.
Andrew:
You're now looking at the actual record itself, investigating it, detecting records outside those bounds, and then sending an alert.
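A data quality rule like this, beyond what a schema can express, might be checked as follows; the field name and bounds are illustrative:

```python
def out_of_bounds(records: list[dict], field: str,
                  lo: float = 0.0, hi: float = 1.0) -> list[dict]:
    """Beyond the schema check ("it's a float"): the value must also fall
    within [lo, hi]. Returns the offending records, so the platform can
    send an alert or route them away from valid data."""
    bad = []
    for rec in records:
        value = rec.get(field)
        if not isinstance(value, float) or not (lo <= value <= hi):
            bad.append(rec)
    return bad

records = [{"score": 0.3}, {"score": 1.7}, {"score": 0.9}]
print(out_of_bounds(records, "score"))  # [{'score': 1.7}]
```

Declaring the rule in the contract and running a check like this in the platform is what makes the rule enforceable rather than just documented.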
Andrew:
So, that's something you might do as you receive each record.
Andrew:
So, yeah, these are more things.
Andrew:
And a lot of these things you can do, for example, in OpenAPI.
Andrew:
So a lot of these ideas do apply to APIs, and to people creating integrations via APIs.
Andrew:
And that's a big inspiration for data contracts.
Andrew:
So, a lot of it came from there.
Andrew:
But the kind of things you have in OpenAPI, which can do schema checks and can do validation, are the kind of things you would attach to data, like you said.
Andrew:
As data's moving around, you might use this metadata to run checks and alerts, route data to places, whatever it is you want to do with data as it flows between systems.
Rachel:
I think it kind of comes into the testing side of things as well, and how that integration pipeline will have different stages to it, and you'll want to make sure you've got the quality gates right at each stage.
Rachel:
The details of that should be in your data contract as well.
Karol:
Yeah.
Karol:
But it's interesting, from my perspective as an application integration specialist, a lot of the metadata that you're mentioning, that you would attach to an instance of a data product, I don't see that in application integration at all.
Karol:
Because a lot of that metadata is actually created within the integration platform, not created from the source system per se, but from the integration platform.
Karol:
Especially if I have an integration platform that is in the API-led architecture, which means that every system that initiates communication towards the integration platform has its dedicated channel, and a dedicated API used only by that system and nobody else.
Karol:
Then at that point, when the request or the event hits that API, I create a large set of metadata, correlators, origin, lots of things, that I then basically transform into log data and push to an aggregator to consume.
Karol:
So I have correlation through the whole integration platform and all the applications that facilitate integration, and I can see that through whatever visualisation we're using. Let's say we're using the Elastic Stack, so I have a Kibana dashboard somewhere where I can track, via that correlation ID or via the specific name of the client system, what the communication is.
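The kind of metadata enrichment Karol describes at the platform edge could be sketched like this; the header name and the aggregator are assumptions for illustration:

```python
import json
import time
import uuid

def enrich_with_metadata(request_headers: dict, client_system: str) -> dict:
    """At the dedicated per-system API, attach operational metadata that the
    source system never has to produce itself."""
    return {
        # honour a caller-supplied correlation ID if present, otherwise
        # generate one so every hop can still be correlated
        "correlation_id": request_headers.get("X-Correlation-ID") or str(uuid.uuid4()),
        "message_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "origin_system": client_system,  # known from the dedicated channel
    }

# Serialised and pushed to a log aggregator (e.g. the Elastic Stack),
# so the flow can be traced by correlation_id in a Kibana dashboard.
log_line = json.dumps(enrich_with_metadata({}, "crm"))
print(log_line)
```

Generating the fallback correlation ID in the platform is also the mitigation for the pathology Karol mentions next, where a client system hard-codes a static value into the correlation field.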
Karol:
And a lot of that metadata does not need to be then produced by said system.
Karol:
But then, of course, there are pathologies to that, where we expose a field for the system to fill in the correlation ID, so that we have correlation end-to-end instead of only across the integration platform, for visibility, and then that system always inputs a static text into that field, static meaning hard-coded, not generated as a unique element.
Karol:
And that forces integration developers, knowing that, to create an additional correlation ID field that is then populated by the integration platform.
Karol:
So greater metadata, but sometimes the collaboration there lacks in that sense.
Karol:
But in general, if I look at the integration contracts that we're doing, we're describing the payload section with the actual data product.
Karol:
And that often contains a lot of the metadata you mentioned around the data product, but we treat it as business payload.
Karol:
We don't treat it as something that is metadata for us when we're doing application integration.
Karol:
That has no meaning.
Karol:
For us, that's like, yeah, that's payload.
Karol:
Move it along.
Karol:
We parse it as we use it.
Rachel:
The payload is one part of the data contract, absolutely.
Rachel:
But what Andrew has mostly been talking about is the bits that are set up by the business in the first instance.
Rachel:
So they essentially become the requirements for what then needs to be built.
Rachel:
And then it has to stay alive and be kept up-to-date constantly.
Rachel:
So some of that will be auto-generated, right?
Rachel:
It will be that payload piece.
Rachel:
So that plays a part in it, but it's probably not the interesting bit when we're looking at the design phase, because we're building a product, and we're driving from a consumer requirement, so that's what we're looking for somebody to input.
Rachel:
So therefore, the things we're talking about are, well, what data do you need?
Rachel:
Where is it coming from?
Rachel:
Where do you want to land it?
Rachel:
What format do you need to access it in?
Rachel:
If it's coming from a third party, okay, it's not available on Christmas Day, Boxing Day, New Year's Day, what do you want to happen?
Rachel:
So it's all the business requirements as well.
Rachel:
But that's where, if you look at Andrew's book, it's not a leaflet.
Rachel:
It's quite long, because the contracts incorporate quite a lot of stuff.
Rachel:
As you say, you can put anything in it and then look at how you're going to use it.
Rachel:
So I would say what you are talking about can be, and very much is, incorporated in the contracts for a lot of people.
Rachel:
But there's a different way of using it as well, in terms of the design, the setup, and the monitoring of what's actually going on, being built, and then ongoing.
Andrew:
Yeah, yeah, I think that's right.
Andrew:
Also, I think we're using the term metadata, it's quite a broad term.
Andrew:
So the stuff you're talking about, Karol: yeah, for example, we can monitor the latency of an event, and we generate a number about how old that event was; that is metadata.
Andrew:
And you might view that in a Kibana chart, or Prometheus, or whatever it might be.
Andrew:
So that is, arguably, that's metadata.
Andrew:
That is a bit of data that describes the event.
Andrew:
So metadata is quite a broad definition.
Andrew:
And that sort of stuff wouldn't be in a data contract, neither would it be in an integration contract.
Andrew:
It's just metadata that was created, updated, moved between systems.
Andrew:
So we still have all of that in a data platform, as you would in an integration platform, or at least we do, because we built that platform in-house and we have the ability to do that.
Andrew:
And that is metadata.
Andrew:
But I think when we talk about data contracts and metadata within that, like Rachel was saying, that's generally data that comes from people describing how they expect this data product to behave, or what should be in it, or how it should perform.
Andrew:
So it's generally from the business, like Rachel was saying.
Andrew:
So there's a difference between the definition of metadata, I think, in data contracts, and just all the metadata that might be associated with data moving around.
Karol:
I see this is where the disconnect here happened.
Karol:
We were on two different levels of abstractions there.
Karol:
And that metadata, what you were describing, it's nowhere near the level of abstraction I'm dealing with in terms of application integration.
Karol:
But if I jump to the business process level, then I would be seeing that metadata there, definitely.
Karol:
But for me as an integration architect, I'm very rarely at that level, or I'm very rarely invited to participate at that level of a conversation, which is a shame.
Rachel:
It goes back to you having the information that you need and the bit that interests you.
Rachel:
It doesn't mean that you shouldn't have the rest of the data.
Rachel:
The concept of data contracts is not specific just to data mesh, though, and can be used across the operational space.
Rachel:
So it should still be there.
Karol:
Should.
Rachel:
Go and ask people.
Rachel:
Can you see it?
Karol:
Like I said at the beginning, I can do application integration without the data architect.
Karol:
So same way I could do application integration without a business analyst.
Karol:
Would that be a good quality of an integration?
Karol:
Probably not.
Karol:
I would have to double down as a data architect and a business analyst in that context.
Karol:
Would I do a good job at that?
Karol:
Maybe, maybe not.
Karol:
I wouldn't be so sure about my skills in that regard because I'm not trained to do business analysis.
Karol:
I'm not trained to do data architecture.
Karol:
But this is a level of abstraction that would be good to have.
Karol:
This is what I love about domain-driven design.
Karol:
We're bringing everybody in the room, no matter what level of abstraction, and then we're laying out this in a ubiquitous language, and we're building that understanding.
Karol:
So then we're having an understanding of what metadata is on each of those levels.
Karol:
And then as we go down to implementation, then we have a more refined metadata of a more specific object or a more specific smaller piece of thing because an instance of data is really very specific.
Karol:
When we're talking about the business process, that's very vague in that sense.
Karol:
And again, different scope, different abstraction layer, different metadata in that sense.
Karol:
And maybe less relevant for me from an application integration perspective and designing that interoperability.
Karol:
But then again, something that I also learned recently about managing requirements: something that might be a requirement for me might, for other people, be a result of their working with requirements.
Karol:
And again, that's the jump through those abstraction layers.
Karol:
Every time I go down an abstraction layer, what was the result of the work on that abstraction layer becomes the requirements on this abstraction layer.
Karol:
And further down, the same thing happens with another result in this cascading effect.
Karol:
And I think this exactly happened here in this discussion about those metadata, that we have this cascading effect where what you say is metadata for me turns into functional and non-functional requirements towards interoperability design.
Karol:
That is not metadata anymore.
Karol:
That's something that I action in my work, in design, and then later in implementation as a realisation of that design.
Karol:
It's an interesting observation, like really living it: oh, we were supposed to talk about the disconnect in the language.
Karol:
We actually stumbled right into it.
Karol:
There we go.
Rachel:
There you go.
Rachel:
Live demonstrations, they don't usually work, but this one was teed up specially.
Andrew:
Yeah.
Andrew:
It's really interesting, isn't it, how we've pulled into that.
Andrew:
But I think it's also interesting how the data contract kind of runs from the top, capturing business requirements, all the way down to how it's really used, driving actions in the platform we're building.
Andrew:
Like I described, it's almost a contract-driven platform, where almost everything in the platform is driven by the contract.
Andrew:
So you describe the data, and the platform will automate governance, the platform will automate access management, the platform will automate the quality checks and the SLO reporting and things like that.
Andrew:
So it's interesting, yeah, how the data contract meets different needs for different people at different layers, right down to my day-to-day job, where I operate more like you do, actually moving things around, or building a platform that allows people to move things around.
Andrew:
And there, I'm very much like: what can this contract be used for to take action, to deploy the infrastructure they need, to set up backups on the cadence they've defined, or, if they didn't define one, give a sensible default, and to drive data retention so we can comply with our obligations and do things like that.
Andrew:
So we're driving it all from the actions, but also up to the top, where I operate more these days as well, as long as we make sure we spend a lot of time capturing the business requirements, making sure they're captured, making sure they're kept up to date as the data evolves and not forgotten about; hopefully they're stored with the data. And maybe there are things in there that aren't directly actionable, things like descriptions of the data, ownership, and business hours of support, which might not necessarily drive any action in the platform, but are very important for someone using the data to decide how they can use it: can they rely on it 24/7, or only 9-to-5? That makes a difference to how they use it, and that's a human decision, not something you do in the platform.
Andrew:
So yeah, interesting how this conversation led us here, and interesting how broadly the contract has been used up and down the stack.
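The contract-driven platform Andrew describes can be sketched roughly like this. This is a minimal illustration only; the field names, action strings, and the default backup cadence are all hypothetical, not taken from any specific platform or product:

```python
# A minimal sketch of a contract-driven platform: the data contract is
# plain data, and the platform derives its actions from it alone.
# All field names and action strings here are illustrative.

DEFAULT_BACKUP_CADENCE = "daily"  # sensible default when the producer defines none

contract = {
    "dataset": "orders",
    "owner": "payments-team",      # actionable: drives access management
    "support_hours": "9-5",        # informational: a human decision, not automated
    "backup_cadence": None,        # not defined by the producer
    "retention_days": 365,         # drives data retention / compliance
    "quality_checks": ["not_null:order_id", "unique:order_id"],
}

def provision(contract):
    """Derive the platform's actions purely from the contract."""
    actions = []
    # Access management is delegated to the declared owner.
    actions.append(f"grant-access-managed-by:{contract['owner']}")
    # Backups: use the declared cadence, or fall back to a sensible default.
    cadence = contract.get("backup_cadence") or DEFAULT_BACKUP_CADENCE
    actions.append(f"schedule-backup:{cadence}")
    # Retention drives compliance tooling downstream.
    actions.append(f"set-retention:{contract['retention_days']}d")
    # Each declared quality check becomes a deployed check in the platform.
    for check in contract.get("quality_checks", []):
        actions.append(f"deploy-quality-check:{check}")
    return actions

print(provision(contract))
```

Note that `support_hours` is deliberately never turned into an action: as Andrew says, whether a consumer can rely on the data 24/7 or only 9-to-5 is a human decision made by reading the contract, not something the platform automates.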
Karol:
And now looking at the metadata being ownership, for example, right?
Karol:
Again, if you're looking at ownership from a perspective of a data contract on the business level, you have a business unit, or somebody in the business, like literally a person, right?
Karol:
Being the owner of said data.
Karol:
And then when I come down to the architecture level or the implementation level, is that the same owner?
Rachel:
We got it in again!
Karol:
Yeah, but you may have the same metadata type, owner, or ownership, right?
Karol:
But is it the same value there?
Karol:
No, because in architecture, I'm coming down to the relevant systems that are communicating, that hold that data, that store that data, and the owner is basically somebody that's taking care of that particular system.
Karol:
But then if I go down to...
Rachel:
Again, it depends on what you're going to be building, so absolutely, but having that ownership hierarchy is really, really important.
Rachel:
I believe there was a really good talk at Data Mesh Live last year about it.
Rachel:
It wasn't recorded.
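The ownership hierarchy Karol and Rachel are circling can be sketched as the same metadata key, "owner", resolving to a different value at each abstraction layer. The layer names and owners below are purely illustrative:

```python
# Same metadata type ("owner"), different value per abstraction layer.
# All names here are hypothetical, for illustration only.

ownership = {
    "business":       "Head of Payments",       # owns the data's meaning
    "architecture":   "orders-service team",    # owns the system holding the data
    "implementation": "data-platform on-call",  # owns the pipeline moving the data
}

def owner_at(layer):
    """Resolve the accountable owner for a given abstraction layer."""
    return ownership[layer]

print(owner_at("business"))
print(owner_at("implementation"))
```

A single flat "owner" field on a contract can't capture this; keeping the hierarchy explicit is what lets each consumer find the owner relevant to their layer.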
Karol:
By the way, Data Mesh Live next year, I've been politely asked by organisers to just pop in a lovely QR here.
Karol:
Fantastic conference.
Karol:
I haven't attended many of the talks at Data Mesh Live, as I was next door doing Domain-Driven Design, but there were wonderful people and wonderful conversations alongside, outside of the specific talks.
Karol:
So, Belgium, Antwerp next year in June.
Karol:
I'll probably be there, probably again attending DDD Europe instead, but maybe this time I'll get the all-in ticket.
Rachel:
We're getting those crossovers.
Rachel:
This is the great thing about these two conferences being co-located.
Rachel:
It allows us to start having these conversations and bringing these cross-cutting concerns together rather than having it siloed.
Karol:
I mean, who knows?
Karol:
Maybe we'll turn this into a talk for DDD Europe or Data Mesh Live, just as a crossover.
Rachel:
It's not a bad idea, guys.
Karol:
That would definitely be something, because just stumbling on stage into those language problems and semantic understanding, for example of metadata, or of what data contracts are and how somebody else understands them, might just show the community how problematic this actually is, and help build that common, ubiquitous language and an understanding of what we're doing at what abstraction level.
Rachel:
There is a huge crossover as well.
Rachel:
Obviously, coming more from the operational, the software engineering side, I had the experience of DDD, so it's interesting stepping back and looking at that.
Rachel:
We used a lot of the techniques, the collaborative modelling techniques in order to establish what our domains are, what our data products are.
Rachel:
It's the same kind of approaches.
Rachel:
There's not that much difference.
Rachel:
We should be bringing it all together.
Karol:
From a perspective of, let's say, event-storming and then cutting out bounded context out of the event-storming session, I mean, does that really differ that much?
Karol:
I don't think it differs methodically.
Karol:
The focus is just on the data instead of on the process.
Rachel:
Event-storming is where I started with all of this and our Data Mesh journey started with a lot of event-storming sessions with the business.
Karol:
Alberto, come to our live stream.
Karol:
Maybe he'll be watching someday.
Karol:
I don't think he's watching right now, but it would be fun to talk to Alberto about event-storming.
Karol:
That's a completely different topic.
Karol:
These are analogous, because if we talk about operational systems and we cut out the bounded contexts and the subdomains and domains of those systems, then this maps most of the time one-to-one to the data architecture and the domain partitioning of data.
Karol:
That is a very relevant exercise to just cross over from Data Mesh to DDD and back again to take those learnings and apply them in that field.
Karol:
I know that in application integration I'm taking a lot from DDD, because we can also apply domain partitioning within data movements for a change.
Karol:
We don't have to persist anything to apply domain boundaries and bounded contexts.
Karol:
Oh, and context mapping, don't get me started there; I spent over three hours explaining how that works in integration platforms to Philip, my co-host at the workshop.
Karol:
He couldn't believe that this is actually a realistic thing that happens in integration platforms, but there we are.
Karol:
Took DDD, applied it to integration platforms.
Karol:
It's like, wow, a whole new world out there.
Karol:
Crossing those boundaries and especially when three conferences at the same time are in the same space, these are some amazing conversations to be had there with people from Infocentric, from Data Mesh, from DDD.
Karol:
There's a really interesting mix there.
Karol:
Very much worthwhile attending.
Karol:
We've all been there.
Karol:
It's quite an interesting thing, especially in the evening when you just sit down in the lobby and talk with people over your experiences and the differences in your experiences, and there are plenty.
Rachel:
And the similarities.
Karol:
And similarities as well, yes.
Karol:
But then, talking about differences, I think this is maybe the last topic for today, given that we've been talking for nearly two hours now.
Karol:
Organisational barriers.
Karol:
Rachel, I know that you're quite opinionated on that.
Karol:
You had a very strong statement about technology being a barrier or not in an organisation.
Karol:
Hit us with these last pieces of wisdom before we end the stream.
Karol:
Let's dive into it.
Rachel:
It kind of ties in probably a little bit with what we were talking about earlier on.
Rachel:
So, one of the great things about being an architect and working with people on both the technical side and the business side, and across multiple different projects, is that you get some interesting opinions expressed.
Rachel:
And one of those kind of common opinions is, do you know what, what's the technology going to allow us to do?
Rachel:
And there was a beautiful moment where I wrote on a whiteboard, because, hey, I love whiteboards, I'm an architect.
Rachel:
Technology is not a blocker, and I put my name to it.
Rachel:
They took a picture, and it gets shown to me every time an incident happens.
Rachel:
And I'm like, yeah, but is technology the blocker?
Rachel:
Or is it something or someone in the organisation?
Rachel:
Is it the lack of design thinking that's gone into something?
Rachel:
Is it the lack of product thinking?
Rachel:
Is it the inability to be able to evolve with modern techniques?
Rachel:
Every time it has been something on the people in the organisation side, rather than the technology.
Rachel:
And that's also one of the great things about data mesh being technology agnostic.
Rachel:
You need good technology to implement it well, but you can absolutely have product and design thinking and put the structure and the metadata around data product without having any amazing technology in place.
Rachel:
So, it's an interesting challenge to manage expectations on the business side and to drive things with the understanding that technology is not the blocker.
Karol:
I mean, I can definitely get behind that.
Rachel:
Yeah, that's a great statement that Stefan's just put up there.
Rachel:
It's the lack of accountability and the willingness to take ownership.
Rachel:
It's also the lack of decisiveness and decision making that goes with that as well, I would say.
Karol:
Yeah.
Karol:
A lot of the times, yes.
Karol:
Or just the unwillingness to see the problem, or the sheer will to ignore it when everybody's flagging it all over.
Karol:
Again, tying that back to Andrew Harmel-Law's talk at DDD Europe, we're dealing with the second biggest problem in software architecture.
Karol:
Yeah, we can deal with it because we can solve it by logic, by design, by working around it and implementing that in a very specific way.
Karol:
Are we going to solve the first problem?
Rachel:
Not until we get people architects.
Karol:
Not until we get people architects, yeah.
Karol:
And that problem is basically everywhere.
Karol:
The organisation, the people are the problem, in essence, because we're different.
Karol:
We have different opinions, different agendas, different things to do.
Rachel:
And we're bringing different languages, which makes communication hard, understanding hard.
Karol:
Different dictionaries, or different opinions over what a specific role should facilitate.
Karol:
Like, for example, I heard that an architect should not advise on organisational changes or the changes in the skill sets of people in teams.
Karol:
I mean, I can choose not to do it, but would that be of benefit?
Karol:
And, again, that's not a technical issue.
Karol:
Tech is not the problem.
Karol:
It's the people, the organisation, the processes, often without consideration for what we're trying to achieve.
Rachel:
If you think of one of the big common reasons for people to change their data architecture, whether it's data mesh or anything else they want to pick up, it's about providing access to data.
Rachel:
There is no point in providing access to data if you don't have people who understand what to do with it and what they're looking at when it's provided.
Rachel:
So as an architect, I certainly have a very keen interest in raising the data literacy in our organisation, and I would be very offended if people said that wasn't part of my role or part of my interest, because there's no point in me doing my job if it's not providing business value at the end of it, and there's no business value if people can't use the data.
Karol:
Kind of feel the same with literacy over moving data and utilising that in a manner that is aligned with the business.
Karol:
Same thing, different scope of field.
Karol:
Andrew?
Andrew:
I agree with all that.
Andrew:
Sometimes people will say it's politics, and try to use that as a term that dismisses all of this: it's politics, I can't do anything about that. But politics is just what happens when people get together with different views, different backgrounds, different experiences. And maybe everyone's a people architect; all architects are people architects first, I think, and the subject matter is almost secondary.
Andrew:
You have to think about data mesh that way as well; it's always a socio-technical thing, and it's always about the communication first, really.
Andrew:
Even for machine applications, it's always about why we're doing this, what the point is; so communication with humans is the most important thing. And actually, that's the way I've grown over the last five or six years I've been doing this. I was an engineer leading the data platform team, and eventually I started getting involved in architecture, and the hardest part of the idea I had was the communication. The platform was really easy, trivial almost; the hardest part was the communication, getting people into this idea of doing data differently. I was lucky I had a really great product manager at the time, and she helped me a lot to learn new skills; if I hadn't learned those skills, I wouldn't be in the position I'm in now. It's the most important part; you can't get any of this done without it. It's the hardest part and the most important part.
Karol:
I think if I hadn't worked on my people skills, I would still be sitting in a cubicle somewhere, and we wouldn't be having this conversation today!
Andrew:
That's fine, some people want to just do the code, but if you want to become a senior, you have to improve your people skills. And if you want to have greater impact than just the code you can write in a day, you have to work with people to have that impact. The more you can do that, the greater impact you can have, and the higher up you get in your organisation, if that's your goal.
Karol:
So, parting words then: if you want to do anything, invest in being a people architect. All the technical domain and technical architecture stuff is fun, but people architect, that's the way to go; try that and be the people architect.
Karol:
All right, before we finish, before we go, I'm going to switch to one short slide for a moment to change the feed.
Karol:
So, coming up next from Loosely Coupled in September, after a little bit of a break, as next week there is no live stream, we're having federated API management, another technical topic this time around. We're going to have Adib Tahir from APIWIS, which is an API management federation solution, and we're going to have a bit of a technical discussion of what that problem is. I think that's going to be quite an interesting one.
Karol:
Now, if you managed to survive with us, or you're watching the recording on YouTube or LinkedIn, feel free to scan the QR code and visit Bridging the Gap to read more. In the menu there's a tab called Upcoming Events, where you'll be able to see all the events, past and future, from Bridging the Gap, including Loosely Coupled and all the visits to conferences and meetups.
Karol:
Subscribe to our Substack, where we publish articles; they're fairly technical, while we take a bit of a looser approach here on the stream. And subscribe to the YouTube channel; it would be fun to have you visit the live streams as they happen, and you'll get the notifications that those live streams are coming up. Or just follow me on LinkedIn, where you'll also get information about the live streams, because all of them are on LinkedIn, of course.
Karol:
That said, Rachel, Andrew, thank you for joining. It was a pleasure discussing and stumbling into our communication and language problems, stumbling into the exact problems that we're here to address; that was a fun mishap to be had. Great topic, and there's a lot more to explore, it seems, at least for me. I learned a lot; I learned that I, again, don't know everything and am not the wisest person in the room. Thank you for that, thank you for humbling me. And yeah, parting words?
Rachel:
Thank you for having us, really, I've imparted all the wisdom I have, if you can call it that, and I'm looking forward to tuning in to your next live stream.
Andrew:
Yeah, it's been really enjoyable, thank you for having us, I think really, yeah, really great discussion, thank you.
Karol:
All right, all right, thank you, and good night, everybody, or, well, whatever the time of the day is that you're watching this.