Loosely Coupled - EDA to Power Warehouse Systems

Karol:

Good morning, good afternoon, good evening, everybody.

Karol:

Welcome to Loosely Coupled, brought to you by Bridging the Gap.

Karol:

My name is Karol Skrzymowski and I'll be your host tonight.

Karol:

And today we're having quite an interesting topic to tackle.

Karol:

We're going to be talking about event-driven architecture, but in a very different edition than the usual, which means that we're going to be talking about an architecture that actually touches the physical world.

Karol:

Normally, we don't really get to see that.

Karol:

We see just, you know, pixels on the screen, etc.

Karol:

Today, we're having a very interesting instalment of having event-driven architecture in a warehouse.

Karol:

And to tell you all about that, we're going to be having Valeriia Filimonova, a staff engineer from Picnic Technologies, an FMCG company in the Netherlands.

Karol:

Now, Valeria is a person passionate about software architecture, clean code, and distributed systems.

Karol:

Also, the expert in the company on RabbitMQ.

Karol:

And she loves designing maintainable systems with a focus on EDA.

Karol:

So, that's exactly what we're talking about today.

Karol:

So, Valeria, welcome to the stream.

Valeri:

Hey, hello, hello.

Valeri:

Good morning, good evening, good afternoon.

Karol:

I see you're adopting my welcome message.

Valeri:

Yeah, it also makes me feel a bit better, because behind my window, it's quite dark, and I was thinking that someone has a morning and probably sunshine.

Valeri:

That's a nice thought.

Karol:

We sometimes get people watching from the US.

Karol:

So, you know, East Coast, that's a six-hour shift, so they're in the middle of the day.

Karol:

And West Coast, it would be, what, 10 a.m. on that side of the world.

Karol:

So, sometimes it's morning.

Karol:

Like they say, there's always noon somewhere, right, if somebody's, well, enjoying other drinks than coffee at this hour.

Karol:

Valeria, tell us a little bit about yourself.

Karol:

What do you do?

Karol:

How did you get to the place you are?

Karol:

How did you get to do event-driven architecture, etcetera?

Valeri:

Yeah, well, I'm very willing to give you a big story worth a full book, but it's a pretty boring one, like all the way from university in software engineering till these days.

Valeri:

So, I'll just skip to the part how I ended up being in the Netherlands, because I have a non-Dutch name, so for people who might know.

Valeri:

Like five years ago, I moved here, exactly at the point in time when Picnic Technologies, the company I'm working at, was starting a new automated warehouse.

Valeri:

So, I originally joined to build the software that's going to follow this huge, fancy thing.

Valeri:

And I loved it so much, exactly because of the reason you mentioned on the opening, that you write a line of code, and it's not just pixels, it's a physical thing, you can see things move, and you're like, whoa, that's something new.

Valeri:

Because before I've been working on microservices, there's also events, but for lawyers, that's not that exciting, to be honest.

Valeri:

So, and now having those fancy things around, moving, doing stuff, like, yeah.

Valeri:

So, I basically stuck with the thing, and seeing the baby growing from that very first early stage to a bigger, more complex system now, it's just irresistible.

Karol:

Okay.

Karol:

I mean, I remember my times at university, where I spent hours in the robotics lab.

Karol:

We had several sets of Lego Mindstorms, and we used pseudocode, or NXC, so not exactly C, to programme those, and that was, like, the most fun thing I could do after study hours, when I just sat in the lab, took Lego blocks, the Lego Technics blocks, and assembled them, and then programmed the controller to make a move and do things.

Karol:

That was, like, so satisfying.

Karol:

So, I guess that this kind of thrill of satisfaction kind of translates here, on a completely different scale.

Valeri:

I honestly didn't know that I would end up with such a cool project. My first and only experience with the real technical and physical world was also, like, at the university, during my master's, when I was writing my thesis, and it just felt fun: I squished a full event-driven microservices setup into a small controller to make a robot that you can extend and talk to, because why not, right?

Valeri:

Sure.

Karol:

Yeah.

Karol:

Okay.

Karol:

So, background, university, from university, EDA, microservices, events, all that since university?

Valeri:

Yeah, pretty much.

Valeri:

Well, in that regard, I was happy, because, like, I'm lucky, in a sense, because the first company I joined during the studies was also pretty modern already, and I've learned a lot of good stuff, but mostly how not to do microservices.

Valeri:

So now I'm trying not to repeat what I already know.

Valeri:

Yeah, and what I do at Picnic: I'm a staff engineer, so strictly speaking I no longer belong to warehouse systems alone, so now I work across the company on any kind of cool project.

Karol:

Okay.

Valeri:

So, always a lot of fun things to do.

Karol:

So, because titles differ from company to company, what does, in the Picnic hierarchy, what does this entail?

Karol:

Staff engineer, because we have different titles everywhere.

Karol:

I sometimes see that staff engineer is, like, one below principal engineer, whatever that means in general, and, yeah.

Valeri:

Yeah, the staff engineer title is somewhat new in the company.

Valeri:

We kicked it off, actually, at the beginning of this year, like, officially, and the best way to explain it: we do have levels, pretty classical, from junior to senior, but then when you reach a certain seniority level and you want to grow, you have three paths to grow.

Valeri:

Tech leadership is one of them, like, you run your team, blah, blah, nothing new.

Valeri:

Then you can simply skill up as an individual contributor more and more, or you can go staff engineering way, meaning that you don't belong to a product, to a team, so you're basically a unit of acceleration within the company that can do whatever.

Valeri:

If someone needs help with architecture, here's staff engineer for you.

Valeri:

If you have problem with, I don't know, migration or durability, AI, whatever, here's staff engineer for you.

Karol:

Right, so that's more of a broader role outside of specific domain areas and specific projects.

Karol:

Okay, that kind of approach.

Karol:

All right, and that differs from the staff engineers I used to be exposed to because these were usually specialised in an area, so that's like, again, different companies, different definitions of such role, same with architecture.

Valeri:

Definitely, but you still inevitably have some speciality for people, right?

Valeri:

Even though we are supposed to be trained, like, in breadth of knowledge, you still gravitate towards certain things, and for me, like, event-driven microservices, I inevitably go that way.

Karol:

Just out of curiosity, do you have, like, an integration platform also in Picnic?

Karol:

No, so event-driven all the way, or point-to-point between systems?

Karol:

Yeah.

Karol:

Okay, okay.

Karol:

Just out of curiosity, I don't know what the scale of the systems are in Picnic in general, in the ecosystem, so it may work, may not, and you may have different problems or not known problems at all.

Karol:

That's always interesting to ask.

Karol:

All right, but today's topic, I'm quite curious.

Karol:

Event-driven in a warehouse system.

Karol:

I mean, I've seen a warehouse once in my lifetime, in my career, when I was working back at T-Mobile, but that was a completely different day and age, and we didn't have microservices yet.

Karol:

That was so long ago.

Karol:

I mean, we didn't even have cloud at that point yet.

Karol:

Yeah, that was that long ago, yes.

Karol:

So, all the things that we did was in on-premise server rooms.

Karol:

We didn't even have proper virtualisation at the time for servers, so some of the servers were, like, still physical machines that we just connected to over the network instead of just virtual machines, so that was a long time ago.

Karol:

Probably from that time, quite a bit has changed, so what's going on there?

Karol:

What's the deal with having the EDA in a warehouse?

Karol:

Because I think that, for some viewers, that might be something very interesting and specific, and like you said and I said, more than pixels on the screen, right?

Karol:

For most of us, when we deal with architecture, when we deal with coding, that's pixels on the screen.

Karol:

We get a front-end, we do something at the front-end, etc.

Karol:

What does it entail to have an EDA-like system in a warehouse?

Valeri:

Well, first of all, it's quite natural because it's, for instance, a domain where an event is not an abstract thing but an actual physical thing happening in the physical world. The way you would see it is those long, long stretches of conveyors, very physical, boxes moving around, actually we call them crates, and then at a distance from each other, there are scanners. Whenever the crate passes by, the barcode is scanned, and this is a natural event that you would get from the physical machinery, and now you know, hey, this particular crate, this number, is at this point in your physical warehouse. That's one type of event, and the other is, of course, you can say, hey, I want this particular crate, like, five kilometres from here in a different part of the system.
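
The two kinds of messages described here, a scan coming up from the hardware and a transport request going down to it, could be modelled roughly as below. This is only a sketch: the names `CrateScanned`, `TransportRequested` and `CratePositions`, and all identifiers such as `scanner-A`, are hypothetical and not taken from Picnic's systems.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class CrateScanned:
    """Event from the hardware: a scanner read a crate's barcode (hypothetical shape)."""
    crate_id: str
    scanner_id: str
    scanned_at: datetime


@dataclass(frozen=True)
class TransportRequested:
    """Command towards the hardware: bring this crate to a destination (hypothetical shape)."""
    crate_id: str
    destination: str


class CratePositions:
    """Keeps the last known scanner position per crate."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, CrateScanned] = {}

    def on_scan(self, event: CrateScanned) -> None:
        # Events may arrive out of order, so only the newest scan per crate is kept.
        current = self._last_seen.get(event.crate_id)
        if current is None or event.scanned_at >= current.scanned_at:
            self._last_seen[event.crate_id] = event

    def where_is(self, crate_id: str) -> Optional[str]:
        event = self._last_seen.get(crate_id)
        return event.scanner_id if event else None


positions = CratePositions()
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
positions.on_scan(CrateScanned("crate-42", "scanner-A", t0))
positions.on_scan(CrateScanned("crate-42", "scanner-B", t0.replace(minute=1)))
request = TransportRequested("crate-42", "picking-station-3")
```

After the two scans, `positions.where_is("crate-42")` reports the most recent scanner, which is the "this crate is at this point" knowledge the scan events provide.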

Karol:

So, is the warehouse that big that you can get it in kilometres?

Valeri:

Yes, you do have kilometres of conveyors, but of course they are not just stretched out, that would be an inefficient use of space, they are stacked on top of each other, right?

Karol:

So, you can say, hey, yeah, so if you would estimate how many kilometres of conveyors you have in a warehouse at this point?

Karol:

Educated guess.

Valeri:

Yeah, I don't want to lie, I think one of the biggest we have in Germany should have at least 20 kilometres, yes, and we are at three by now, so multiply, and the fourth one is coming, so in total you already have quite some.

Karol:

And that's condensed into, of course, a 3D space, so they're not outlined in a 2D like flat surface, but they're like stacked on top of each other.

Valeri:

Yes, like in general, the fully automated warehouse is at least three storeys high.

Karol:

Okay, that's a lot.

Karol:

So, in that sense, we have a warehouse, we have conveyor belts, we have scanners on those conveyor belts, the natural event is the position of the crate, the box on the conveyor belt.

Valeri:

Yeah.

Karol:

Okay, the next, let's say, event is like telling where you want that box, and what does that entail from this process perspective?

Valeri:

Well, what's good to understand here is that the system is layered, so the bottom layer is basically what controls the conveyors themselves, it's like the lowest possible machinery level.

Valeri:

Then on top of that, the hardware providers give you a bit of a better abstraction, which has different names, but we refer to it as a transport system.

Valeri:

So, that's what we normally would talk to from the business perspective, because that already gives you an abstraction of exactly that: my crate is here, or if I give a command to get it delivered there, this is how it executes it.

Karol:

So, is it like that the belts actually fork and join at certain times?

Valeri:

Yes.

Valeri:

Oh, wow.

Karol:

Okay, that would be fun.

Karol:

My brain is opening now Factorio, you know, the game with lots of conveyor belts in it, and I'm looking at this and my brain is like, oh, the conveyor is going, joining, splitting, whatnot.

Karol:

Okay, and from continuing, sorry.

Valeri:

Yeah, well, but actually that's a good transition, because it's quite simple just to think that, oh, I have a scan and here is my crate, or I give a command to send it there, it's delivered, but it's never that good in real life, it's way more complex.

Karol:

Okay.

Valeri:

If you have a fork, it means that at any point of time machinery can fail, and either won't be able to scan, barcode can be damaged, or malfunction of a scanner, or actually the crate can misdivert and go the other way around.

Valeri:

You want it left, and it went right, and you'll have to deal with that.
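
One way to picture the failure cases just mentioned is to compare each incoming scan with the route the crate was supposed to follow. The sketch below assumes a planned route is simply an ordered list of scanner ids; the function `classify_scan` and the status strings are hypothetical.

```python
from typing import List, Tuple


def classify_scan(planned_route: List[str], last_index: int, scanner_id: str) -> Tuple[str, int]:
    """Compare a scan with the planned route.

    planned_route: scanner ids the crate is expected to pass, in order.
    last_index:    index of the last scanner on the route that saw the crate.
    Returns a (status, new_index) pair.
    """
    remaining = planned_route[last_index + 1:]
    if scanner_id not in remaining:
        # The crate showed up somewhere it should not be: it went right instead of left.
        return ("misdiverted", last_index)
    new_index = planned_route.index(scanner_id, last_index + 1)
    if new_index == last_index + 1:
        return ("on_route", new_index)
    # A scanner in between did not report, e.g. damaged barcode or scanner malfunction.
    return ("missed_scan", new_index)


route = ["s1", "s2", "s3", "s4"]
```

With this route and the crate last seen at `s1`, a scan at `s2` is on route, a scan at `s3` means the `s2` read was missed, and a scan at any scanner outside the route is treated as a misdivert that the system has to handle.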

Karol:

And reroute them, in that sense.

Valeri:

Yeah, reroutes are not always possible, because here we are actually working with different hardware providers, and even though the conveyor belts are produced by the same company, the transport system is proprietary to those companies, and one of them has a limitation.

Valeri:

The warehouse is so big that, from the transport system perspective, it is split into smaller units.

Valeri:

So basically, different microservices, deployables, whatever you want to call it, and the thing is that you can't send a request for a crate to breach the boundary, in the sense that if your crate is in the middle of one transport system, and you want it in the other one, you can't send a direct transport.

Valeri:

You have to go all the way to the border, then it will naturally cross, and only then it appears in the new system.

Valeri:

And that leads to limitations to how you can design your transportation, and you will have to handle a lot of those cases.
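
The boundary limitation can be sketched as a planning step that splits one logical move into legs, each of which stays inside a single transport system. Everything here is an assumption for illustration: the location names, the `ts-north` and `ts-south` systems, and the rule that only directly adjacent systems are handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Leg:
    system: str  # transport system that executes this leg
    start: str
    end: str


# Hypothetical layout: which transport system owns which location.
LOCATION_SYSTEM: Dict[str, str] = {
    "storage-1": "ts-north",
    "border-exit-n": "ts-north",
    "border-entry-s": "ts-south",
    "pick-7": "ts-south",
}

# Hypothetical handover points: (from_system, to_system) -> (exit location, entry location).
BORDERS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("ts-north", "ts-south"): ("border-exit-n", "border-entry-s"),
}


def plan_legs(source: str, destination: str) -> List[Leg]:
    """Split a move into legs that never cross a transport system boundary."""
    src_sys = LOCATION_SYSTEM[source]
    dst_sys = LOCATION_SYSTEM[destination]
    if src_sys == dst_sys:
        return [Leg(src_sys, source, destination)]
    if (src_sys, dst_sys) not in BORDERS:
        raise ValueError(f"no handover point from {src_sys} to {dst_sys}")
    exit_point, entry_point = BORDERS[(src_sys, dst_sys)]
    # First leg: drive to the border. The second leg can only be requested
    # once the crate has physically crossed and appeared in the other system.
    return [Leg(src_sys, source, exit_point), Leg(dst_sys, entry_point, destination)]


legs = plan_legs("storage-1", "pick-7")
```

In an event-driven setup the second leg would typically be requested only after a scan event shows the crate at the entry point of the new system, which is one of the cases the design has to handle.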

Karol:

So okay, let me put it like my brain tries to imagine the warehouse.

Karol:

So conveyor belts, scanners scanning for the position, so obviously you probably have a scanner just before a fork, for example, right, to identify, oh, this crate is now at the fork, and then based on that, you switch the fork towards one of the conveyors, right?

Karol:

And then that moves on, and then do you have traffic jams in those conveyor belts at times?

Karol:

Yes, you do.

Karol:

Oh, wow, okay.

Karol:

That sounds like one of the warehouses I visited back in Poland, in working for T-Mobile, that at times they also experienced traffic jams, but that was a lot less automation in that sense.

Karol:

While the conveyors were moved by electric engines, most of that was manually clicking and then moving the conveyor, right?

Karol:

A good question popped up in the meantime on YouTube.

Karol:

Do all warehouses have the same implementation, as in, actually the same systems and the same physical components, I guess, or do you like customise?

Valeri:

Yeah, honestly, I would have loved to say yes, but of course not.

Valeri:

So all the three warehouses that are actually running, automated ones, they are different.

Valeri:

The fourth one that's coming is also different.

Valeri:

First, our hardware providers are different, so this transport system layer works differently, and that actually bridges us towards what the requirements were and how we ended up with event-driven all the way through, just because we want interoperability.

Valeri:

So now you need to work with multiple ways to control the hardware, and also you want to plug in different pieces at different points of time.

Valeri:

What I mean is that a fully automated warehouse has a tonne of conveyors, various special stations, fully automated, and big storage, like imagine a bookshelf with thousands of places to put a crate into, but the neighbour does have conveyors and some types of stations, but doesn't have the storage, and has a lot of different, like, robot parts, and the third one has something else.

Valeri:

So the way we've been designing the system is so that any new warehouse coming in, you can simply choose the parts you need, plug them in together, and make them work.

Karol:

So it's like building with Lego blocks.

Valeri:

Exactly.

Karol:

Physical and digital Lego blocks.

Valeri:

Exactly.

Karol:

So you have different hardware suppliers for warehouses, so different physical implementation, and then you plug and apply to that, based on their specification of interfaces, let's say somewhat generic microservices doing the same things, controlling the environment with that minimal adaptation towards that specific interface?

Valeri:

Kind of.

Valeri:

So what happens, on our side, so our part of the deal, is that we split the microservices, abstractly speaking, into two buckets. There are those kinds of executors, basically those services that compartmentalise a specific part of the business flows and touch the hardware.

Valeri:

Picking is a process where crates are in front of you, one with product, one for the customer, and you make a simple motion of picking up the paper and moving it to another crate.

Karol:

Okay.

Valeri:

This specific one, and even this picking system already has three different implementations, because of the hardware providers.

Valeri:

There are also manual ones, where people manually walk, where there are no conveyors.

Valeri:

So there are a bunch of those executors, and you can basically say that for this warehouse, we do have picking, we do have automated gel packing, which is a process where you need to put in a gel pack, something frozen to keep your cool product cool all the way to the customer, etc., etc.

Valeri:

There are a lot of them.

Valeri:

But then you have also an even higher level, another bucket of services, which are orchestrators, and those are invariably basically the business flows and the controls over the executors, saying how to route things, what the stages of the order processing should be. Because from the moment you get your order from a customer, depending on what kind of order it is, you want to pick it, to gel pack, to dispatch, or maybe rework, and it's like a whole world of decision trees, and how you do it, depending on all the parameters.
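
The split into executors and orchestrators might look roughly like the sketch below. It is a deliberately tiny decision tree with assumed order attributes (`has_chilled_items`, `needs_rework`) and assumed stage names, and it calls executors directly for brevity, whereas in an event-driven system the orchestrator would send commands and react to completion events.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Order:
    order_id: str
    has_chilled_items: bool = False
    needs_rework: bool = False


def plan_stages(order: Order) -> List[str]:
    """Hypothetical decision tree: which stages does this order go through?"""
    stages = ["picking"]
    if order.has_chilled_items:
        stages.append("gel_packing")
    if order.needs_rework:
        stages.append("rework")
    stages.append("dispatch")
    return stages


class Orchestrator:
    """Owns the business flow; executors own one process and its hardware."""

    def __init__(self, executors: Dict[str, Callable[[Order], None]]) -> None:
        # Each warehouse plugs in only the executors it physically has.
        self._executors = executors

    def process(self, order: Order) -> List[str]:
        stages = plan_stages(order)
        missing = [stage for stage in stages if stage not in self._executors]
        if missing:
            raise LookupError(f"no executor installed for: {missing}")
        for stage in stages:
            self._executors[stage](order)
        return stages


log: List[str] = []
executors = {
    name: (lambda order, name=name: log.append(f"{name}:{order.order_id}"))
    for name in ("picking", "gel_packing", "dispatch")
}
orchestrator = Orchestrator(executors)
stages_run = orchestrator.process(Order("order-1", has_chilled_items=True))
```

A warehouse without a rework station simply does not register a `rework` executor, and the orchestrator refuses orders that would need it instead of silently skipping the stage.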

Karol:

So from what I heard from you, in terms of the names of specific microservices, you do not do choreography, you do orchestration, in terms of the topology of EDA.

Valeri:

Okay.

Karol:

You're trying to.

Karol:

But this then comes to mind that there is actually another actor in the play, other than the conveyor belts with scanners, it's actually the automation that moves around the products, right?

Karol:

So, like, some sort of robotic arms in that sense?

Valeri:

Yeah, like the majority of picking currently happens via a human, a real arm, but we're already installing a robotic arm in certain warehouses to try it out, because it definitely has its own benefits.

Valeri:

It doesn't need to be paid extra during weekends and out of office hours.

Valeri:

Okay, that's fair.

Valeri:

Yeah, but it also has limitations, obviously, not everything can be nicely and neatly picked by a robotic arm, and it's actually slower than it may be previous.

Karol:

Yeah, possibly.

Karol:

Yeah, that's possibly true.

Karol:

I remember from the days when I was still in my bachelor's studies, because I studied electrical engineering, I was fascinated in the robotics labs and also in general, working with automation, so PLC controls and these kind of things.

Karol:

There was a very known company back in the day that they actually designed robotic arms for FMCG, which those robotic arms, they showcased them, like those arms could literally go in and pick up an apple and move it without leaving any trace on the apple.

Karol:

So they were very sensitive in terms of the pressure, how much pressure they apply on the object.

Karol:

So that was already quite interesting back then.

Karol:

I have no idea how that industry developed, because I went into IT instead of robotics and automation.

Karol:

But I'm guessing this is the level of the automation in terms of robotic arms that you would be looking at in a warehouse so that you do not damage the goods.

Karol:

Exactly.

Karol:

Out of curiosity, in the meantime, I'll try to Google out the company and see if there are still some images of these kind of robotic arms so we can show it on screen.

Karol:

But the thing I'm interested in is, well, you've been at Picnic for about five years now, a bit more maybe.

Karol:

What was the original problem statement to start working on such a warehouse and working on an EDA system for warehouses?

Valeri:

Okay.

Valeri:

Backstory.

Valeri:

So for the first five years of Picnic, obviously, the company was only starting.

Valeri:

And all of...

Valeri:

Oh, well, first, let me introduce the company.

Valeri:

I think that not everybody from the Netherlands knows what Picnic is about.

Karol:

Some do and buy groceries from Picnic, so...

Valeri:

Just in case.

Valeri:

It's an online grocery store.

Valeri:

What that means is that there are no physical stores you can go to and pick things yourself.

Valeri:

So it's always delivered to your door.

Valeri:

And what that means is that you extra invest into supply chain and warehousing specifically, because you want to be extra efficient and productive on that side of things.

Valeri:

And we started small.

Valeri:

Most of the warehouses were manual.

Valeri:

People walking around, pushing the crates, moving things, heavy, hard, slow.

Valeri:

It's like insane numbers, how many kilometres per day a single person would walk through the building and facility.

Valeri:

So obviously, the first moment we got, let's say, time and capacity, it was: let's invest into automation, because that's the next very natural step.

Valeri:

And it's always been in the mind of the company and on the roadmap.

Valeri:

And that's exactly what happened five years ago.

Valeri:

That's when I was hired, exactly for the first automated warehouse to be built.

Valeri:

And the problem was simple, how we can be more efficient, how we can increase the UPH, basically, pick more products per unit of time.

Valeri:

As simple as that.

Valeri:

And of course, for the first place we've been building, going straight microservices would have been insane.

Valeri:

We didn't have experience whatsoever in the domain of robotics and automated processes.

Valeri:

So we decided to go small and just build a monolith side by side with the existing one.

Valeri:

So we had one for manual processes, one for automated.

Valeri:

But we already knew that it's not going to last for long, because there are roadmaps, there are business appetites, and we knew we would want to grow.

Valeri:

So the way we approached the monolith, it was already nicely decoupled.

Valeri:

And like picking, you already heard this word, like, gel packing, that's already like isolated.

Valeri:

And even within the monolith, the parts have been talking by means of events, to reach exactly this decoupling that in future should help us transition towards a more distributed, scalable system.
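
Modules inside one process talking through events can be sketched with a very small in-process bus. The class and event names below (`InProcessEventBus`, `PickingCompleted`) are hypothetical; the point is only that the picking code never calls the gel packing code directly, so the two can later be moved into separate services behind a real broker.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class InProcessEventBus:
    """Synchronous publish/subscribe inside a single process."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        # Copy the list so handlers may subscribe while we iterate.
        for handler in list(self._handlers.get(event_type, [])):
            handler(event)


class GelPackingModule:
    def __init__(self, bus: InProcessEventBus) -> None:
        self.queue: List[str] = []
        bus.subscribe("PickingCompleted", self._on_picking_completed)

    def _on_picking_completed(self, event: Event) -> None:
        self.queue.append(event["crate_id"])


class PickingModule:
    def __init__(self, bus: InProcessEventBus) -> None:
        self._bus = bus

    def complete(self, crate_id: str) -> None:
        # Picking only announces what happened; it does not know who listens.
        self._bus.publish("PickingCompleted", {"crate_id": crate_id})


bus = InProcessEventBus()
gel_packing = GelPackingModule(bus)
picking = PickingModule(bus)
picking.complete("crate-42")
```

After `picking.complete("crate-42")`, the crate id sits in `gel_packing.queue` even though the two modules share nothing but the event name and payload shape.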

Valeri:

And then with every new warehouse, we started splitting this big thing into smaller pieces.

Valeri:

And it was only natural that the communication between physical and our world is event driven.

Valeri:

Because it's physical event driven.

Valeri:

But then the communication between our services also went event driven way to decouple.

Valeri:

And also, the way we approach designing the system is that we invested a lot into the unhappy flows.

Valeri:

So basically, as we've been talking about already today, anything can go wrong: the crate can go right instead of left, it can be stuck in the middle of the shelf, you can have congestion, whatever.

Valeri:

So basically, how can we make this system robust enough to deal with all those things.

Valeri:

And on top of that, okay, different hardware providers, we want to create our warehouses from those Lego pieces, and event-driven is kind of a very natural way.

Karol:

Okay, so the problem statement was that actually the, I think one of the problems in the whole statement was that actually, because it's the physical world, a lot more things can go wrong very fast.

Karol:

And they're very specifically, so there's a lot more logic to be built around these kind of things.

Karol:

This pops out to me, because in general, when we, at least what I encounter when working with various types of architects, a lot of architects think, and business also, they think happy path only.

Karol:

So the design is happy path.

Karol:

And then when things go wrong, they start thinking about actual error handling.

Karol:

Whereas here, you can go wrong, and it can go wrong very badly, because it's physical.

Valeri:

Mm hmm.

Karol:

Right.

Karol:

So if you would give some examples, apart from, like, a crate being stuck or routed to the wrong conveyor belt, of what other, let's say, more spectacular wrongs there are: what can actually go very wrong in a warehouse environment?

Valeri:

Well, the things that go wrong, we still can handle, is for instance, the whole area, just like a single piece of hardware, just switching off, and the whole area becoming unavailable.

Valeri:

So suddenly, you have to deal with the fact that now you have like, several hundreds of orders unavailable, you have to do something with that, either start anew, or reroute, like basically, disable the whole part and deal without that until it's fixed.

Valeri:

And you're kind of lucky if maintenance people can fix it within like an hour or two, sometimes it's a day.

Valeri:

Of course, they work as fast as possible.

Valeri:

But even an hour is a big loss in terms of money and efficiency.

Valeri:

So that's one example.

Valeri:

The other one, that's not something we can actually easily deal with: once, because of a malfunction, all the freezers in the warehouse just went down.

Valeri:

So you can't deliver any frozen products, and it's all leaking.

Valeri:

But that's not something even any software can deal with, I think.

Valeri:

So you just deal with the fact that, okay, no frozen products today, and it is what it is.

Karol:

I mean, that must have been a horrendous sight, all freezers out and not working.

Karol:

And if you have like, because, okay, I imagined myself that you probably have like a main path going for all the conveyors to like the exit to the actual shipping.

Karol:

Correct me if I'm wrong, there are, like, those sections you mentioned, whole isolated sections, where a crate needs to reach the border to move elsewhere.

Karol:

So you can have an outage of a section, or you can have an outage of the probably complete thing, probably rarer due to this compartmentalisation of elements in the warehouse.

Karol:

So if you have that kind of outage of the section, let's say it takes an hour for maintenance to clear it up.

Karol:

Is there like a workaround that you put in people in to handle things to do the offloading of the workflow?

Karol:

Or is that like, just stop entirely?

Valeri:

Neither actually.

Valeri:

Well, in the first stages, there were times when people were fixing things: even, like, if you have a full conveyor that cannot move because there are too many crates, someone comes in and physically takes off the crates to start the movement again. That happens.

Valeri:

But what happens these days, we obviously build some automation around that.

Valeri:

Yeah, you just register, okay, this whole piece of equipment is not available and the system deals with that.

Valeri:

So this propagates through all the necessary components on our side.

Valeri:

So that if routing needs to happen, we will bypass, we exclude this area.

Valeri:

So now we can work without that.

Valeri:

Everything that's been locked in these areas is now deprioritised or prioritised, depending.

Valeri:

So for the most part, the largest part, you don't actually need human intervention apart from the maintenance crew who fix the actual hardware problem.
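
Routing around an area that has been registered as unavailable can be illustrated with a breadth-first search that simply skips excluded nodes. The graph below is an invented toy layout and `find_route` is a hypothetical helper, not the routing logic actually used in the warehouses.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical conveyor layout: area -> areas a crate can move to next.
CONVEYOR_GRAPH: Dict[str, List[str]] = {
    "inbound": ["area-a", "area-b"],
    "area-a": ["dispatch"],
    "area-b": ["area-c"],
    "area-c": ["dispatch"],
    "dispatch": [],
}


def find_route(
    graph: Dict[str, List[str]], start: str, goal: str, unavailable: Set[str]
) -> Optional[List[str]]:
    """Shortest route by number of hops that avoids every unavailable area."""
    if start in unavailable or goal in unavailable:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited and nxt not in unavailable:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # nothing reachable: such orders would have to wait or be replanned


normal = find_route(CONVEYOR_GRAPH, "inbound", "dispatch", set())
detour = find_route(CONVEYOR_GRAPH, "inbound", "dispatch", {"area-a"})
```

With `area-a` marked unavailable the route goes through `area-b` and `area-c` instead, and when no route exists the function returns `None`, which is the point where orders locked in that area would be reprioritised.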

Karol:

Okay.

Karol:

Just to give you folks an idea how a robotic arm looks like for handling food, it's not the image I remember.

Karol:

This is from some sort of an expo show, but basically it's something built like that.

Karol:

So it's not like the industrial arm that people often see, the heavy-duty arms, but this is more delicate equipment.

Karol:

It will be mostly a lot slower because of the construction and it doesn't really grip with grips that are solid, but they envelop the object like this.

Karol:

So this is what kind of equipment is used by the company Festo or designed by the company Festo.

Karol:

And this is what I saw back in the day when I was in university.

Karol:

So probably I'm guessing similar kind of equipment in the warehouse?

Valeri:

Actually, I don't want to say it looks more advanced now.

Valeri:

Yes, it is.

Valeri:

Unfortunately, I can't share those particular pictures.

Valeri:

We are testing robot arms now, and the provider of the robotic arms even enclosed them so that it's not shown yet exactly.

Karol:

But it looks pretty cool.

Karol:

Just to put it in the frame of reference, this video that I found is actually 13 years old, so it's more advanced.

Karol:

I mean, I saw it already a lot better over years that it was like a wireframe just moving a few pieces and that just goes like that.

Karol:

A good question around incidents again from YouTube.

Karol:

So how many incidents per year?

Karol:

That's an interesting one.

Karol:

How often do you actually get outages on certain sections or entire warehouses or what's the scope in that sense?

Karol:

Because from engineering perspective, it's an interesting question how things go wrong and how often things go wrong.

Karol:

So how often you need to compensate and then what kind of incidents are the most common ones?

Valeri:

Yeah, it's a good one.

Valeri:

I don't have a good overview, to be honest, but in the very abstract terms, like the big ones is like all freezers going down.

Valeri:

It's like hopefully one-time events, but of course not.

Valeri:

But that happened once so far, and this scale of incident you probably have once a year in the warehouse, because they also have the regular maintenance shifts for the replacement or tuning of different hardware parts.

Valeri:

So in that sense, the job is being done.

Valeri:

More often than not, it's us who breaks the system, not the hardware actually.

Karol:

So the IT breaks the system, okay.

Karol:

How does it happen though?

Karol:

It's like a faulty deployment of production?

Valeri:

Yeah, I mean, as I was saying, the story like five years ago, we started with monoliths, then we started breaking it apart, but you can imagine that three years is not enough to go completely from monoliths to microservices, such transition takes longer, especially when you need to support new warehouses, new features.

Valeri:

So we are still in the phase of a migration, and basically these kinds of migrations are quite challenging, especially on a site that already deals with thousands of orders per day and, like, serves half the country.

Karol:

Okay.

Valeri:

That's where it gets tricky to not break at all.

Valeri:

So we reach a kind of consensus with operations on when it is safe to deploy, especially if we know the changes are risky, because there are always low days, depending on the demand, and, like, Wednesdays, for instance, are normally low.

Valeri:

So that's the day we usually use to deploy to the system.

Valeri:

But of course, we think a lot of, okay, if we roll it out and something goes wrong, if we roll back, is it enough?

Valeri:

What should be the steps to make sure we recover as fast as possible?

Valeri:

The other way to break a system: actually, not so long ago, I almost brought down the RabbitMQ cluster, and you can imagine it's pretty central to the system.

Karol:

So it was like a beating heart of the system, basically.

Valeri:

Exactly.

Valeri:

So it was just a tiny bit away from a crash.

Valeri:

So it didn't end up badly, but it had every chance to.

Karol:

Okay.

Karol:

So first let me unpack a little on the maintenance and operations side.

Karol:

So basically I assume that it kind of works like an old-school on-premise setup. At least where we were working in T-Mobile, we always deployed over weekends, because T-Mobile as a telco company was heavily dependent on technology: without the IT systems, the business wouldn't work.

Karol:

That is the other way around with most companies because they're not relying on technology for the core of their business.

Karol:

Right.

Karol:

So here again, we have the same situation.

Karol:

You as in the warehouse setting, you rely on technology to actually facilitate the business, to do the business.

Karol:

Otherwise, if the technology is not working, you have a business stop.

Karol:

So when I was working for T-Mobile, it was like, we always deploy on a Friday evening and do all the checks and all the tests over a weekend because this is where our business is not working or on a Saturday evening, because this is where our business will not be working.

Karol:

And we're doing all the tests over a Saturday, Sunday, because these are the slow days.

Karol:

That was the specificity of the telco industry that Monday to Friday, no changes on production allowed because this is where the heavy duty work is in office hours or in the shop working hours.

Karol:

And evenings, yes, bug fixes, no problem.

Karol:

Weekends, deploy full releases and test them.

Karol:

I'm guessing it's a similar setup in the warehouse, but it's actually probably not related to a weekend.

Karol:

I'm guessing weekends are actually the busier parts.

Valeri:

Yeah.

Valeri:

The thing is that the warehouse operates almost 24 by 7.

Valeri:

So we don't have days when it's not working.

Karol:

Wow.

Karol:

Okay.

Karol:

So if 24 by 7, then I'm guessing the middle of the night is the slowest?

Valeri:

Yeah.

Valeri:

Well, actually, I think actual people stop working at around 11 and start the shift at 5 or so.

Valeri:

So you have this gap of five hours in the night, but IT doesn't work in those hours.

Valeri:

So pretty much a no-go.

Valeri:

So we always deploy in the middle of the operations, like when operations are running.

Valeri:

So for us, the system always works.

Karol:

Okay.

Karol:

So how does that happen in terms of the deployment?

Karol:

You get a bunch of microservices in the warehouse, right?

Valeri:

Yeah.

Karol:

You do it section by section.

Karol:

What's the logic there behind that?

Valeri:

Well, ideally, so let's put it this way.

Valeri:

So now we are still at the position where some of the hardware parts are deployed together.

Valeri:

They are part of this monolith unit.

Valeri:

So there you can roll out all together.

Valeri:

It's good.

Valeri:

But again, without the orchestration part that actually relies on some of those pieces, it's also pretty much useless.

Valeri:

So whenever we deploy, we deploy independently.

Valeri:

So each team has their own cycles, but everything is covered by Feature Flags.

Valeri:

So only when all the services are ready and deployed do you start turning them on and see the actual changes.

Karol:

Oh, through Feature Flags.

Karol:

Okay.

Karol:

So you're basically enabling on the fly, oh, we're ready and now we're going to enable everything.

Karol:

So you just do a switch on in that sense.

Valeri:

Yeah.

Valeri:

But maybe I'm confusing you.

Valeri:

So we never had so far a use case where you need to add an area on the go in the sense that, oh, now we have a new, I don't know.

Valeri:

Oh, well, no, we actually had these robot arms.

Valeri:

That's exactly what we had.

Valeri:

Like when you had to add this new picking system to your already existing landscape to try out how the robot's going to pick your products.

Valeri:

But again, it's pretty isolated.

Valeri:

You just deploy it.

Valeri:

And without this orchestration part that gives you the commands, it's just, okay, it gets the scans or whatever the events from the physical world are coming.

Valeri:

Okay, sure.

Valeri:

I see you, but I have no tasks, so I do nothing.

Valeri:

And only when this orchestration part comes in, then you, oh, now I know what to do with this thing.
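
The "I see you, but I have no tasks, so I do nothing" behaviour can be illustrated with a minimal Python sketch. The `Executor` class, its method names, and the crate and station ids are hypothetical and are not taken from Picnic's codebase.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Executor:
    """Hypothetical executor for one area; it reacts to scans only if it holds a task."""

    tasks: dict[str, str] = field(default_factory=dict)  # crate id -> destination

    def on_command(self, crate_id: str, destination: str) -> None:
        # Command from the orchestration layer: remember what to do with the crate.
        self.tasks[crate_id] = destination

    def on_scan(self, crate_id: str) -> Optional[str]:
        # Event from the physical world. Without a task there is nothing to do.
        destination = self.tasks.pop(crate_id, None)
        if destination is None:
            return None
        return f"transport {crate_id} to {destination}"


executor = Executor()
print(executor.on_scan("crate-1"))           # no task yet, so the scan is ignored
executor.on_command("crate-1", "station-7")  # orchestration comes in
print(executor.on_scan("crate-1"))           # now the scan leads to a transport request
```

Under this assumption, a newly deployed executor is harmless on its own: it only starts acting once commands arrive.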

Karol:

So the feature flags would be what in the orchestration units?

Valeri:

In every service.

Valeri:

So if you have to create a new flow that spans all the way from the orchestration services to these execution hardware parts, you'll obviously need the feature flags on all those sides.

Valeri:

But some features are purely hardware-related, so you'll have them only in a specific service.

Karol:

Oof, that is already quite complex in that sense.

Karol:

So in that sense, you don't have to coordinate between specific teams in terms of the actual deployment, but you coordinate only in terms of enabling that deployment to work, that part of the code.

Valeri:

Yes.

Valeri:

However, we also, I'm actually not sure which phase it's currently in, but there is this configuration service, of course, that allows you to basically have a single feature flag for all flows, but then separate services can rely on that.

Valeri:

So you don't need to coordinate the turn on.

Valeri:

You just do a single flip and you're happy, as long as all the deployments are in.
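
As a rough illustration of the "single flip" idea, here is a small Python sketch in which independently deployed services consult one shared flag. `FlagStore`, `Service`, and the flag name `robot-arm-picking` are assumptions made for the example; the real configuration service is not described in the conversation.

```python
class FlagStore:
    """Hypothetical stand-in for the central configuration service."""

    def __init__(self) -> None:
        self._flags: dict[str, bool] = {}

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so new code stays dark after deployment.
        return self._flags.get(name, False)


class Service:
    """Any service in the flow; each one is deployed on its own cycle."""

    def __init__(self, name: str, flags: FlagStore) -> None:
        self.name = name
        self.flags = flags

    def handle(self, crate_id: str) -> str:
        if self.flags.is_enabled("robot-arm-picking"):
            return f"{self.name}: new flow for {crate_id}"
        return f"{self.name}: existing flow for {crate_id}"


flags = FlagStore()
services = [Service("orchestrator", flags), Service("picking-executor", flags)]
print([s.handle("crate-1") for s in services])  # both still on the existing flow
flags.set("robot-arm-picking", True)            # the single flip
print([s.handle("crate-1") for s in services])  # both switch together
```

Because the default is off, the order in which teams deploy does not matter; only the flip has to wait until every deployment is in.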

Karol:

Okay.

Karol:

Now we have a very curious person in the YouTube chat asking questions, and I relate to that question because we've just been talking about deployment.

Karol:

What about testing the changes?

Karol:

So how do you test that code to work in production?

Karol:

Your production is physical.

Karol:

Is there a physical pre-prod or is there a simulation of it?

Karol:

How do you test this kind of environment?

Karol:

Because that's a unique setup, right?

Karol:

Because normally you just have a dev test environment, SIT, UAT, prod, right?

Karol:

You just move through those stages of testing.

Karol:

How do you test code for a physical environment?

Karol:

Because I believe it would be extremely expensive to do a pre-prod on a physical environment.

Valeri:

Yes.

Valeri:

Yes.

Valeri:

And that's a waste of resources for sure.

Valeri:

So yeah, you actually already mentioned how we do this. Two ways.

Valeri:

There is simulation and there is emulation.

Valeri:

Simulation is basically something we run ourselves.

Valeri:

It's the way we see transport system, basically exempt of the contract with hardware.

Valeri:

It's fast.

Valeri:

So we have tonnes of basically end-to-end tests or integration tests to test all those flows while kind of having the simulated hardware environment under the hood.

Valeri:

But you can imagine the risk here is that it's something self-written, and it can always differ from the actual environment.

Valeri:

So what we also have is very often hardware providers provide not only hardware, but the emulation of the hardware too.

Valeri:

It differs per provider.

Valeri:

The most fascinating one is from the hardware provider called TGW.

Valeri:

We work together for fully automated warehouses, those three-story big buildings.

Valeri:

That's basically 3D environment of boxes moving on rail conveyors.

Valeri:

And that's like a full representation of an actual site in models and responding and behaving the way an actual site would be.

Valeri:

And that's something that's basically provided as part of the contract.

Valeri:

And we do have access to such an environment.

Valeri:

So at any point of time, if our simulation team is behind the schedule and can't deliver you particular pieces of functionality, or you really want to see how fancy it is looking in 3D world, then you can do that.

Valeri:

Actually click around, see crates moving.
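
One way to picture the simulation approach is code that depends only on a contract, with a fast in-memory implementation plugged in for tests. The sketch below is an assumption-laden illustration in Python: the `TransportSystem` protocol, the `"ARRIVED"` status, and the `dispatch-exit` destination are invented for the example and do not reflect the actual contract with the hardware provider.

```python
from typing import Protocol


class TransportSystem(Protocol):
    """Hypothetical contract; the real one is agreed with the hardware provider."""

    def transport(self, crate_id: str, destination: str) -> str: ...


class SimulatedTransport:
    """Fast in-memory simulation: every crate arrives instantly."""

    def __init__(self) -> None:
        self.positions: dict[str, str] = {}

    def transport(self, crate_id: str, destination: str) -> str:
        self.positions[crate_id] = destination
        return "ARRIVED"


def dispatch(transport: TransportSystem, crate_id: str) -> bool:
    # The code under test depends only on the contract, so a simulation,
    # a vendor emulation, or the real system could be plugged in.
    return transport.transport(crate_id, "dispatch-exit") == "ARRIVED"


sim = SimulatedTransport()
print(dispatch(sim, "crate-1"))
print(sim.positions)
```

The trade-off mentioned in the conversation still applies: a self-written simulation is only as accurate as its authors' understanding of the contract.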

Karol:

So you would hook up your microservices, your orchestrators and executors to that emulated environment, which would expose something like mock-up services.

Karol:

But those mock-up services, instead of executing in a physical environment, they execute in that emulation of an environment, which is like a 3D.

Valeri:

Yeah, I can probably even quickly find you the video of that.

Valeri:

Yes, so it's exactly how you said that.

Valeri:

So remember these layers, I tried to explain that you have this PLC layer, that's basically hardware itself and transport system and us.

Valeri:

So we do have a dev environment, basically our services in dev, then transport system, real transport system, and only this PLC bottom part is emulated.

Karol:

You did prepare for us some diagrams, maybe we should actually visualise that with those diagrams in a moment, after you find that lovely video of the emulator.

Valeri:

Yes, let me find the video first.

Karol:

Because that's, okay, that's really interesting, because this is the kind of systems we don't really get to play with as in regular IT, so to speak.

Karol:

We don't get that manifestation, because of course, testing that is a problem.

Karol:

But also, how good are these emulators then?

Karol:

That's also the quality of those emulators will relate to the results in production, right?

Valeri:

Yes, and I can tell you that it's not an easy experience to work with those.

Valeri:

It's slow, not always reliable.

Valeri:

We actually once cherished an ambition to have performance full-scale testing in this environment, really have it on and running.

Valeri:

Well, that's how I got the videos, because I was part of the squad that's been developing that.

Valeri:

But we came to the realisation that it's not actually reliable enough to run the tests.

Karol:

Yeah, I think, well, to run in regular environments, to run performance tests or load tests, you have to have a replica of the production environment with emulated traffic of a production environment, right?

Karol:

So, it's already difficult to translate from a physical environment to a fully digital environment.

Karol:

That's a layer of complexity there.

Karol:

And then to emulate all the traffic, you'd have to have a very decent snapshot of what the daily traffic in the warehouse is.

Valeri:

Yes, but that's what we've been doing.

Valeri:

We had a set of scripts that would create the full scale, all those thousands of orders and those crates moving around.

Valeri:

Sorry, it's like there are synonyms for those things, and I go back and forth between them.

Karol:

I mean, I've dealt with automated test data generation before, but this is like a whole different level of automation to emulate such traffic in general.

Karol:

It's like, wow.

Karol:

You know, before hopping on the stream, I didn't even consider asking about testing this kind of scenario and what deployment looks like. At the same time, it's very analogous to traditional IT, meaning all digital on servers, and yet so different and exciting to hear about: the same principles, but executed completely differently.

Valeri:

Yeah, but it's legitimately a very good question, because we as engineers, we get used to delivering something that's actually been tested and working, and here you have an absolutely new problem.

Valeri:

How can you actually say that?

Valeri:

And the answer is that you're never 100% sure.

Valeri:

So you take risks, and it's not only that, it's also, it's an integration problem.

Valeri:

So if you've been told that the contract is X, but it's actually Y, and you're like, boom, and that's not something you can easily anticipate or deal with unless you roll out.

Karol:

Yeah, that's the usual problem.

Karol:

Also, in regular IT systems, we agree to have a contract in a specific way, and then we realise that the contract is nowhere near such a thing.

Karol:

And it is at times very challenging, depending on the maturity of the organisation, of course.

Karol:

But here you have a different level of complexity, because you deal with machines, and they are contracts that you do not really negotiate in a sense, because they're just supplied to you by the producer.

Karol:

I don't know, is there a level of customizability to those contracts, or they're just, they just drop you a piece of hardware with a piece of software, and the definition of a contract?

Valeri:

No, well, in that sense, the hardware providers we've been working with are flexible enough.

Valeri:

So the contracts are defined up front in the collaboration.

Valeri:

So the very first contract, we drafted it together with the TGW.

Valeri:

And then for the new hardware providers, we actually try to reuse the contracts as much as possible.

Valeri:

So it's also easier for us, one less layer of mapping, translation.

Valeri:

And so far, it's been working well.

Karol:

Okay, so the hardware providers are quite flexible in customising the solutions towards your needs, then?

Valeri:

Yeah.

Karol:

Okay.

Karol:

That lowers the complexity a bit.

Valeri:

Yeah, I found the video, finally.

Valeri:

It took me a bit of clicking.

Karol:

All right, let me put that on screen, then.

Valeri:

Yeah, it's not very exciting, whatever I found.

Valeri:

But do you see those small red dots?

Valeri:

It's like actual boxes moving in one of the areas.

Valeri:

Here's like a lift.

Valeri:

Oh, it's me.

Valeri:

I see.

Valeri:

That's my intelligence, because that was one of the run tests.

Valeri:

I'll put the pause on here.

Valeri:

So the different colours, that's how we denote a different state of crates, whether it's empty, whether it's supposed to have stock or already picked orders.

Valeri:

Behind this area, I'm not sure if you're seeing the cursor, but it's like tonnes of picking stations, and the crates are moving in there.

Valeri:

And there is an automation on our side that does the picks.

Valeri:

And here, for instance, is a big buffer, a lot of lanes stacked on top of each other, where crates can rest between the areas, so that we have room for breathing, basically, to do the optimisation, and which crate goes where and when for the maximum throughput of the system.

Karol:

Wow.

Karol:

And the tracks just go on and on and on in the background.

Karol:

I see there's several layers, and the spirals basically going up and down.

Valeri:

And it's actually only one-sixth of the whole warehouse, this particular one.

Valeri:

Remember, I was saying that there are multiple transport systems, so the site is split into those areas, so that's only one of them, and we have one model per area.

Valeri:

So if you want to test the whole setup, you need to open 10 of them.

Karol:

So okay, that's, wow, that's a, okay, that's impressive.

Karol:

I rarely get surprised by things at scale, but this is literally overwhelming my brain at this point.

Karol:

It's like, wow, it's something I have not seen in a while.

Karol:

Quite interesting.

Karol:

Wow.

Karol:

That's so different from what I do in my daily work as an integration specialist.

Karol:

It's like, I look at the screen, Miro boards, specs in the doc, and then code as, you know, integration flows.

Karol:

Looking at a simulation of a warehouse, it's like, I don't know where I would have to be to just encounter that in that sense.

Valeri:

Yeah, that's one of the factors why I stayed for all the five years in warehouse systems and didn't even breach the boundaries of the domain at first because it was so many interesting things to indulge in.

Valeri:

So like, okay, I still have a lot of fun.

Valeri:

Then at some point, I got curious in more things, but still, I, it's like, you know, it's like a baby that I've seen from the inception, and I was in its teenage era.

Karol:

Aw.

Karol:

But I do get that.

Karol:

I remember, like, working for a few years in a singular company, and when you see how your work progresses and changes how the company operates and how it influences the business, it's something really, really amazing, and the satisfaction coming from that is, wow.

Karol:

At a certain point, it's choosing either you want new experiences in the variety of experiences, or you want to experience that kind of sensation of, you know, accomplishment, that satisfaction.

Karol:

I don't know which is better, and probably it depends on where you are in life, which do you want to, like, expand your knowledge, or do you want to, like, dive into deep into that specific field?

Karol:

Both can be very quite satisfying, but I absolutely get how much satisfaction you get from actually seeing these things work.

Karol:

It's like, wow, I would love it, specifically.

Karol:

Do you want to show the complexity of such an endeavour in diagrams, because I see you already have them ready?

Karol:

All right, let's put it on screen.

Valeri:

So, okay.

Valeri:

This is the diagram built by us.

Valeri:

It's basically a representation of the physical world of one of the fully automated warehouses in Germany, and basically, every colourful box is a system, is a service, this executor service, and this graph actually shows you all the possible routings between systems, not within them, only between. And remember this thing I've been trying to explain on my fingers: you are in this blue area, which is the whole set of picking stations, and you finished your picking, so you basically want to dispatch the crate and deliver it all the way here.

Valeri:

You can't send a single command saying just go straight, even though there are a multitude of paths leading here.

Valeri:

You have to deliver to the exit of this blue system, then make it routed within the red one, etcetera, etcetera, and cross all those fancy colourful blocks in order to get here.

Valeri:

So, and that's, yeah, I think it's like 80-something in this particular site, all those execution systems, but then, of course, you add on top the orchestration part, and you get a bit more, and that's the scale and complexity of such system.
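
The routing idea described here, where a crate crosses several systems and each system only receives a command for its own hop, can be sketched as a graph search. The Python example below is a minimal sketch: the system names (taken loosely from the diagram's colours), the `LINKS` map, and the command wording are all hypothetical, and breadth-first search is just one plausible way to pick a path.

```python
from collections import deque

# Hypothetical links between execution systems, named after diagram colours.
LINKS = {
    "blue": ["red"],
    "red": ["green", "yellow"],
    "green": ["dispatch"],
    "yellow": ["dispatch"],
    "dispatch": [],
}


def route(start: str, goal: str) -> list[str]:
    """Breadth-first search for a shortest chain of systems from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in LINKS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return []  # no route between the two systems


def hop_commands(path: list[str]) -> list[str]:
    # One command per system: each system only moves the crate to its own exit.
    return [f"{src}: move crate to exit towards {dst}" for src, dst in zip(path, path[1:])]


path = route("blue", "dispatch")
print(path)
for command in hop_commands(path):
    print(command)
```

A real router would presumably also weigh congestion and throughput rather than hop count alone, which this sketch ignores.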

Valeri:

I think I also have, yeah, that's, for instance, a station, just to show some cool conveyors, so it also becomes more concrete and physical, and here, on the left, you will typically see one crate, on the right, the other, and a person standing in front of that, and putting toilet paper from one to another, for instance.

Karol:

So that's the not exactly automated warehouse, in that sense, that's with manual labour involved, right?

Valeri:

Okay, so how we do distinguish it at this point?

Valeri:

In every warehouse we have, there is manual labour.

Valeri:

The question is to what degree, and what we call automated means it's like the real minimum of what was possible at the point we were building. Now, of course, with robotics coming into play, you minimise that even further, but there will always be humans who at least need to oversee the operation of the system, if not do the stuff themselves manually.

Karol:

Yeah, of course, you always also need maintenance, you always will need somebody to get the crate unstuck, etc.

Karol:

We're not replaceable by automation, unless we start building bots that are actually doing that job, but I don't think that's feasible in the near future, to be able to build such mechanisms to do that kind of labour.

Valeri:

Yeah, another example, this is the, I think, two or three storey high storage for pallets.

Valeri:

You've probably seen plenty if you go to a big store like Ikea, or whatever they have, like those pallets, crammed with goods, and that's basically a huge lift that can pick up those pallets and store them automatically, so that you don't need to do that with those small cars.

Valeri:

Fully automated in that sense.

Karol:

Yeah, but it's a little bit different than Ikea in that sense, because in Ikea, while you have those very high storage units, the clients actually access only the lowest parts, the other parts are not accessible, and you need a forklift for that to get into there and just move the merchandise down to the level that is accessible, right?

Karol:

While here, the whole thing is working constantly, 24-7, right?

Valeri:

Yeah, exactly, and you're also optimising in the background, so of course you put the most urgent stuff at the bottom, where it's easier and faster to retrieve.

Valeri:

It's actually also, this one maybe not, the other storages also have depth, so you can have things, one at the back and one in front, so if you want to access something from the back, you have to pick up the front, move it somewhere, and only then take the back.

Valeri:

This particular one, I don't remember, but that's one of the services we do have that optimises for that storage, choosing what to store where, reshuffling, so that in the fastest possible moment, you get whatever you need.
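
The "pick up the front, move it somewhere, and only then take the back" constraint can be made concrete with a tiny Python sketch. It assumes a simplified model in which a lane is a list ordered from front to back and only the front item is reachable; the function name and crate ids are hypothetical.

```python
def retrieval_plan(lane: list[str], wanted: str) -> list[str]:
    """Steps to retrieve `wanted` from a lane listed front to back.

    Hypothetical model: only the front item is reachable, so every item in
    front of the wanted one has to be moved away first.
    """
    if wanted not in lane:
        raise ValueError(f"{wanted} is not stored in this lane")
    index = lane.index(wanted)
    steps = [f"move {blocker} to a free position" for blocker in lane[:index]]
    steps.append(f"retrieve {wanted}")
    return steps


print(retrieval_plan(["crate-A", "crate-B"], "crate-B"))
# ['move crate-A to a free position', 'retrieve crate-B']
```

This is why placement matters: storing the soonest-needed crate at the front removes the extra move entirely.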

Karol:

So basically, you need to really coordinate and put things and be aware that there are things in those shelves, so always track the state of the actual storage.

Valeri:

Yeah, it's also, things can fail.

Valeri:

You ask this fancy lift, please store my pallet, crate, whatever, at this position, and it's like, yeah, yeah, sure, but then suddenly, oh, your crate appeared at a different place, and you're like, what?

Valeri:

Why?

Valeri:

But you have to deal with that, or it's like, no, I can't do anything, so you're okay, please retrieve it back and start all over, or whatever the mechanisms are.

Karol:

And how often do these kinds of errors happen, that the crate is misplaced or you need to, like, start again?

Valeri:

In this particular case, not often, but you can take this piece, copy-paste it like 20 times, and you will get the back of the building.

Valeri:

It's the storage with all those crates, like 50k per a single piece, so it would be like 150k crates or positions per warehouse, somewhat, give or take, so it actually happens quite often, so thousands of times per day, so the whole storage area is built around these unhappy flows, so that you can deal with them.

Karol:

Okay, so just from that perspective, I understand why you do orchestration instead of choreography, because if you're dealing with so many unhappy flows that you need to compensate for, it's always better to have that state in an orchestrator that manages all states of different executors, rather than have it choreographed, because choreography is a lot more complex in error handling.

Valeri:

It's not actually like that, so the logic of all the errors related to storage is like 95% isolated in the storage service, or whatever it's called.

Valeri:

It's pretty down to physical, but of course, you still have a bit of a trickle back to the orchestrator if something you are trying to store has a high impact on the business flow.

Valeri:

For instance, you want to store a crate for an order of some customer that is really urgent, that needs to be sent out of the warehouse in the next 30 minutes, or actually not store, retrieve, and if something goes wrong, orchestrator needs to know, so that it can take an executive decision of what do we do for this particular customer, right?

Valeri:

In that sense, there is a bit of a trickling, but orchestrator is largely unaware of the hardware world.

Valeri:

Actually, not largely, it's completely unaware of the hardware world.
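
The split described above, where most storage errors are resolved locally and only business-relevant ones trickle up, might look roughly like the following Python sketch. Everything in it is an assumption for illustration: the `StoreResult` shape, the `retry-pending` marker, and the escalation message are invented, and the real storage service is certainly more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreResult:
    crate_id: str
    requested: str
    actual: Optional[str]  # None means the lift could not store the crate


def handle_store_result(
    result: StoreResult, inventory: dict[str, str], urgent: set[str]
) -> Optional[str]:
    """Resolve locally where possible; return a message only if the orchestrator must decide."""
    if result.actual is not None:
        # Stored, possibly at an unexpected position: track where the crate really is.
        inventory[result.crate_id] = result.actual
        return None
    if result.crate_id in urgent:
        # Business impact: report in business terms, without hardware details.
        return f"crate {result.crate_id} is delayed"
    # Not urgent: mark for a local retry without involving the orchestrator.
    inventory[result.crate_id] = "retry-pending"
    return None


inventory: dict[str, str] = {}
urgent = {"crate-9"}
print(handle_store_result(StoreResult("crate-1", "A1", "A2"), inventory, urgent))
print(handle_store_result(StoreResult("crate-9", "B1", None), inventory, urgent))
print(inventory)
```

Note that the escalation carries no aisle or lift information, which matches the point that the orchestrator stays unaware of the hardware world.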

Karol:

Okay, so majority of the error handling is actually in the executor for that specific area, or piece of an area.

Valeri:

That actually depends on what kind of errors are we talking about, because there are a multitude of errors.

Valeri:

There are those that are hardware-driven, and some of them can be solved locally, then you're in a good state.

Valeri:

Some of them have to trickle all the way up, so that you get this high-level command, what the hell do I do with that?

Karol:

Okay.

Valeri:

And some of them are not hardware-related, but process-related.

Valeri:

Imagine, you're standing in front of a station, and the screen shows you, please pick five bananas to the crate on the right, but you're like, but there are only three.

Valeri:

Oh.

Valeri:

Yeah, and that is something you have to deal with, because now you apparently have less stock than you assume there is, or you're looking inside the crate, oh, the milk spilled, and it's not safe to send back on the conveyor, so you go into the rework flow, how do I fix that?

Valeri:

So, for that, you have also the workstations.

Valeri:

Everything that can be fixed there goes there, apart from the spilled milk, because that's not safe.

Valeri:

And these kinds of unhappy flows go through the whole system, because executors, the execution systems, are involved, and orchestrators are involved, because that's part of the business flow.

Karol:

Okay.

Karol:

So, from the perspective of events and the workflow in such an environment, how do you identify these kinds of errors, like spilled milk or insufficient bananas?

Valeri:

Two types.

Valeri:

Some of them are done by scanners, or actually specific pieces of equipment.

Valeri:

You can do the height check to know how high the crate is, because if it's too high, it's not safe to send, because it will start bumping into things.

Valeri:

That's done automatically, a physical check of the crate.

Valeri:

Same goes for weight.

Valeri:

You have a certain limitation on how heavy the stuff can be, so you can simply weigh it and decide whether it's safe or not.

Valeri:

But things like, oh, I have less stock than I thought, that's something that is normally reported by the humans at the stations who do the picking.

Valeri:

However, the new thinking is that we also trial vision cameras that look into the crate and count the stock, checking that you actually have all those five bananas, for whatever is countable.
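
The automated height and weight checks amount to comparing measurements against limits before a crate is allowed back on the conveyor. A minimal Python sketch follows; the limit values and names are made up for illustration and are not Picnic's actual thresholds.

```python
MAX_HEIGHT_MM = 400.0  # assumed limit, not from the source
MAX_WEIGHT_KG = 25.0   # assumed limit, not from the source


def crate_problems(height_mm: float, weight_kg: float) -> list[str]:
    """Return the reasons a crate is unsafe to send; an empty list means it may travel."""
    problems = []
    if height_mm > MAX_HEIGHT_MM:
        problems.append("too high")
    if weight_kg > MAX_WEIGHT_KG:
        problems.append("too heavy")
    return problems


print(crate_problems(350.0, 12.0))  # within both assumed limits
print(crate_problems(450.0, 30.0))  # fails both checks
```

Checks like stock counts do not fit this pattern, which is why they are reported by people at the stations or, in trials, by cameras.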

Karol:

Okay.

Karol:

Can you move to the next picture, just to show our viewers the scale of the endeavour here?

Karol:

Yes.

Valeri:

That's actually a very small part of conveyors.

Valeri:

That's how they are stacked, and that's only one floor of the building.

Valeri:

Oh, I don't remember the specific one, but I think there are more.

Valeri:

At the bottom, there are stations, actually, where actual humans are, and then the top is just more layers of the belts where the crates travel to different parts of the system.

Valeri:

So, that's where I've spent an insane amount of hours testing the system.

Karol:

So, basically, the whole warehouse in terms of conveyors is divided into layers that actually have different functions in that sense?

Valeri:

Yeah.

Valeri:

Well, it depends, again, on the site.

Valeri:

Our very first one was a bit of a, you know, first babies are always the most interesting ones, and there we had a ready building already, like all the walls and the roof, so we tried to squeeze the equipment in.

Valeri:

It was hard.

Valeri:

There, you had to put some imagination, not us, but the hardware providers to squeeze all those conveyors so that everything that has been planned fits.

Valeri:

For this particular site, we took a different approach.

Valeri:

Basically, the building is built for the project, rather than the project being squeezed into an existing building.

Karol:

So, you first design the actual conveyors and whatnot, and then build the building around it, at least in the design, and then build the building and place the conveyors in as per specification.

Valeri:

Yeah.

Valeri:

So, you have a space, so now you don't need to solve this puzzle.

Valeri:

How do I plan this piece so that it still fits?

Valeri:

Because, actually, there is a column in there, and I didn't think about that column.

Karol:

This reminds me of me playing Satisfactory.

Karol:

It's basically a game about conveyor belts and production elements and moving all the pieces around.

Karol:

It's kind of the same way.

Karol:

You first build the building and then start placing conveyors inside.

Karol:

It's like, huh, this doesn't fit.

Karol:

It's pretty much it.

Karol:

If you ever want to emulate this on your own, just play Satisfactory.

Valeri:

I think that's already been done for one of the sites.

Valeri:

Engineers couldn't resist this possibility, so I believe it already exists.

Karol:

Oh, wow.

Karol:

That must be quite a puzzle to solve.

Karol:

It's basically a physical puzzle, how to place all of this together.

Karol:

If you would try to say, what is the amount of people that are needed to work on this to solve this in terms of lining this up in physical space as in conveyors, then the electrical, then all the placement of paths so that people can move around safely?

Karol:

How many people work on the design of such a thing?

Valeri:

I actually don't know.

Valeri:

I have no idea.

Valeri:

I can't imagine how many people, physical business crew, their driver go around and collect thousands of them because the site is huge.

Valeri:

But in terms of the design, you don't need that many, actually.

Karol:

Quite possibly.

Karol:

It's like a big puzzle box in that sense, just to weave that all in, plant that in, and those layers of different aspects.

Karol:

I remember from studies in terms of electrical engineering, we got blueprints of a building and we were supposed to place an electrical design into it.

Karol:

And then you layer on top of that all the waterworks.

Karol:

And it's layers of layers of layers that need to cooperate with one another because you cannot place the electrical in the same spot as the water going, right?

Karol:

It's a big puzzle box of collaboration between different experts.

Karol:

Here it is mostly mechanical and electrical, but still, it's...

Valeri:

That reminds me of one of my favourite stories.

Valeri:

I didn't have to deal a lot with the people who assemble the sites because normally when the engineers go in there, it's already mostly assembled and we already start testing.

Valeri:

But then I just moved to the Netherlands after COVID exactly for our first warehouse for basically sightseeing.

Valeri:

That's where I'm going to be working on.

Valeri:

And so I'm walking around with these big starry eyes.

Valeri:

I'm like, wow, it's so cool.

Valeri:

And on the floor, there are two guys sitting.

Valeri:

You can imagine the type of man who did construction, tough build, quite crazy age, with the screwdrivers.

Valeri:

And it happened so that they'd been speaking Russian.

Valeri:

And of course, they didn't know that I do speak Russian.

Valeri:

And then they look at us like, what are all those kids doing in here?

Valeri:

And I'm thinking, oh my God, these kids are going to build a system around that.

Karol:

That's what kids do nowadays, build systems, right?

Karol:

Wow.

Karol:

That's something.

Karol:

It's a completely different scale.

Karol:

Yeah.

Karol:

Looking at the skeleton of our conversation, do you want to walk us through the actual implementation and the logic behind it in more detail?

Karol:

We already touched upon the orchestrators and executors. I think it's worth giving it a proper visual overview and doing a quick walkthrough, because that's probably a bit of a unique setup in terms of EDA and the cooperation of different classes of software.

Valeri:

Yeah.

Valeri:

Well, I definitely can accompany those few diagrams I prepared, but that's going to be mostly a rehearsal of what's been already mentioned.

Valeri:

All right.

Valeri:

So that's what I've been saying.

Valeri:

So these two bottom layers are something already provided for us, and we build this top part of the pyramid.

Valeri:

And then this top part is split naturally into those executors, like picking and dispatching, something you've hopefully already heard.

Valeri:

These are basically processes that are very tied to the machinery, and these fancy colours actually already show that we have at least three different types of these systems, depending on hardware, implementation, and God knows what.

Valeri:

And here the events going back and forth is basically transport system tells us that this can happen.

Valeri:

And we most often say, hey, please transport this particular crate somewhere.

Valeri:

This type of communication we expect here.

Valeri:

But of course, there are many more errors exactly for all the unhappy flows that might happen.

Karol:

Yeah.

Karol:

So these are basically machine events, events coming from specific machines like scanners.

Valeri:

Exactly.

Valeri:

Scanners, one of them.

Valeri:

Then you also have the weigh scales and height checks, and the whole separate world of this big storage we've been looking at, because it has a very intricate contract: there are aisles, levels, positions, and it's not just transport there.

Valeri:

You have it in multiple stages.

Valeri:

So a lot of things can happen in between.

Valeri:

It all happens at this layer between the transport system and our executor systems.

Valeri:

And then on top of that, we do have orchestrators that don't know anything about the physical world in terms of scans, crates, etcetera.

Valeri:

It's of no importance to them.

Valeri:

That's the way we abstract things away, but they deal with the higher level terms of basically processes, how to route things, how to do the full order from the picking to dispatching.

Valeri:

They send the commands that executors take and act on, and the executors then report the status, hopefully completed in most cases, but of course anything can happen.
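
A minimal sketch of the command and status exchange described above, under stated assumptions: the class names, the `action` values, and the in-memory call in place of a message broker are all hypothetical and are not taken from the actual system. It only illustrates that the orchestrator speaks in process-level terms while the executor is the one place that knows about crates.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass(frozen=True)
class Command:
    """What an orchestrator asks for, in process-level terms only."""
    command_id: str
    order_id: str
    action: str  # hypothetical action name, e.g. "pick" or "dispatch"


@dataclass(frozen=True)
class StatusReport:
    """What an executor reports back once it has acted on a command."""
    command_id: str
    status: Status
    detail: str = ""


class PickingExecutor:
    """Hypothetical executor: the only component here that knows about crates."""

    def __init__(self, crates_by_order: dict[str, list[str]]):
        self._crates_by_order = crates_by_order

    def handle(self, command: Command) -> StatusReport:
        crates = self._crates_by_order.get(command.order_id)
        if not crates:
            # Unhappy flow: report failure instead of assuming success.
            return StatusReport(command.command_id, Status.FAILED, "no crates for order")
        # Hardware-specific work (scanners, conveyors) would happen here.
        return StatusReport(command.command_id, Status.COMPLETED)


executor = PickingExecutor({"order-1": ["crate-A", "crate-B"]})
ok = executor.handle(Command("cmd-1", "order-1", "pick"))
missing = executor.handle(Command("cmd-2", "order-9", "pick"))
```

In a real event-driven setup the `handle` call would be a message consumer and the `StatusReport` would be published as an event rather than returned.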

Karol:

So answering the one question that was in chat earlier in terms of Lego blocks and abstraction and plug and play.

Karol:

So looking at this, it means that executors know about the physical world, and they're adapters towards the machines that are actually in the physical world in the warehouse.

Karol:

And then orchestrators are more generic, completely abstracted away from the physical world and they plug and play with the executors.

Karol:

Well, plug and play probably with small adaptations, smaller than the executors then.

Karol:

Yeah, indeed.

Karol:

Okay.

Karol:

And so the orchestrators, they basically orchestrate, oversee the actions of multiple executors.

Karol:

So you have what?

Karol:

For a zone, you would have multiple executors and an orchestrator?

Karol:

What's the dispersal there?

Valeri:

You always have multiple orchestrators because that's basically your business flow.

Valeri:

So bounded context isolated somewhat.

Valeri:

So what we have here is the routing one has a sole purpose of solving the problem, how to deliver those across those big physical areas.

Valeri:

So not from A to B in the sense of on the conveyor belt, but from A to B in terms of the executors, how do I go from picking to dispatching all the way?

Valeri:

And it will solve all the problems like how many hops do I need to make?

Valeri:

What is the most efficient path?

Valeri:

And efficient might mean that I do fewer hops, or that part of the warehouse is actually congested or so loaded that I would rather go around, because that would be the more efficient way to do it.
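
The trade-off between fewer hops and avoiding a congested area can be sketched as a shortest-path search where each hop costs more the busier the next system is. This is only an illustration: the graph, the system names, the `load` values between 0 and 1, and the `load_penalty` weight are assumptions, and the source does not say which algorithm the routing service actually uses. Dijkstra's algorithm is used here as a stand-in.

```python
import heapq


def cheapest_route(graph, load, start, goal, load_penalty=5.0):
    """Return the cheapest path from start to goal, or None if unreachable.

    graph: dict mapping a system to the systems reachable from its exits.
    load:  dict mapping a system to its current load (0.0 = idle, 1.0 = jammed).
    Each hop costs 1, plus a penalty proportional to the load of the next system.
    """
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper way to this node was found
        for nxt in graph.get(node, []):
            new_cost = cost + 1.0 + load_penalty * load.get(nxt, 0.0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None


# Hypothetical layout: two ways from picking to dispatching.
graph = {
    "picking": ["buffer_a", "buffer_b"],
    "buffer_a": ["dispatching"],
    "buffer_b": ["buffer_c"],
    "buffer_c": ["dispatching"],
}

quiet_route = cheapest_route(graph, {}, "picking", "dispatching")
busy_route = cheapest_route(graph, {"buffer_a": 0.9}, "picking", "dispatching")
```

With no load the two-hop route through `buffer_a` wins; once `buffer_a` is heavily loaded, the longer route through `buffer_b` and `buffer_c` becomes cheaper.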

Valeri:

So that's basically controlling traffic, while releasing here is solely responsible for the journey of a customer order through the warehouse, because the goal of the whole warehouse is basically to assemble the order that has been placed, right?

Valeri:

How can I get this order fully picked if needed, cooled and dispatched?

Valeri:

And in between, if something goes wrong, how do I get to this dispatching state through the rework processes we have?

Valeri:

So normally you would expect that most of the orchestrators are present for all the sites, while executors, depending on the hardware or whether it's actually a manual or automated warehouse, might be present just as a subset, in some form, or not present at all.

Karol:

So if you have a specific level of automation that would involve then that we have certain signals about packing inputted by a human or in the near future inputted by the machine, that the machine completed packing and the conveyor can move on with the box.

Karol:

And so these would be like types of events that also happen, right?

Karol:

The person packing marks that this has been packed through whatever means of a machinery, like a scanner or something like that.

Karol:

And then the equipment.

Valeri:

Yeah, that's all the events.

Valeri:

So basically the interfaces, we try to keep the interfaces the same.

Valeri:

So be it a human or a robot, they will eventually call an endpoint in the picking system saying what has been done, and then the picking system will propagate it further.

Valeri:

But in terms of a higher level, like for this order, that's the result I've got in accordance to what was the command and then do whatever now is needed with this because I'm kind of done with that.
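
As a rough sketch of that idea, assuming hypothetical names throughout (`PickReport`, `PickingSystem`, the event types, and the in-memory `published` list standing in for a broker): the same endpoint accepts reports from a human or a robot, and what gets propagated is a higher-level result compared against what was commanded.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PickReport:
    order_id: str
    item: str
    quantity: int
    reported_by: str  # "human" or "robot"; the interface is the same for both


@dataclass
class PickingSystem:
    # What the command asked for: (order_id, item) -> expected quantity.
    expected: dict[tuple[str, str], int]
    # Stand-in for events published to the broker.
    published: list[dict] = field(default_factory=list)

    def report_pick(self, report: PickReport) -> None:
        """Single endpoint, regardless of who or what did the picking."""
        wanted = self.expected.get((report.order_id, report.item), 0)
        outcome = "PickCompleted" if report.quantity == wanted else "PickDeviated"
        # Only the order-level result is propagated, not who reported it.
        self.published.append(
            {"type": outcome, "order_id": report.order_id, "item": report.item}
        )


system = PickingSystem({("order-1", "milk"): 2, ("order-1", "bread"): 1})
system.report_pick(PickReport("order-1", "milk", 2, reported_by="human"))
system.report_pick(PickReport("order-1", "bread", 0, reported_by="robot"))
```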

Karol:

So you're basically also managing the state of the box in that sense.

Karol:

Okay.

Karol:

I'm trying to grasp the amount of complexity that goes into just like a single box and it's quite a lot there already.

Valeri:

Yeah.

Valeri:

And now imagine that you've started with a single monolith and it's been already there.

Valeri:

So we don't know this variability of different hardware providers and sites yet, but already on that stage, we figured that, okay, it's not going to be simple.

Valeri:

It's not going to be beautiful straight away.

Valeri:

So let's not shoot ourselves and go in microservices straight.

Valeri:

Let's first learn about the domain as much as we can, but we already try to decouple as much as possible.

Valeri:

So that even on the monolith level, dispatching system never knew anything about picking system because in the real world, they don't need to know about each other.

Karol:

Yeah.

Karol:

They don't have to be aware.

Karol:

They just have transition elements that go from one to another.

Karol:

Right.

Karol:

So how does that transition happen?

Karol:

You do have a few diagrams in that, right?

Valeri:

The rest of the diagrams actually show the routing problem, but that's something we already kind of covered multiple times.

Valeri:

So I'm not sure.

Karol:

Yeah.

Karol:

Let's just see them for the sake of visuals because they interact with people, right?

Valeri:

So this is the storyline I was preparing while drawing those diagrams.

Valeri:

So all the familiar kind of names, hopefully by now you have picking, dispatching, and you want to go all the way from picking to dispatching, but in between you have other systems.

Valeri:

I call them buffers.

Valeri:

It might be conveyor belts, straight, can be looped, can be other systems, God knows what.

Valeri:

So the dark dots are the exits, white dots are the entries.

Valeri:

So in this particular diagram, it kind of looks simple, right?

Valeri:

You just do those three hops and you are happy.

Valeri:

However, then you figure out that actually one system can have multiple exits and you have multiple routes.

Valeri:

So now you need to decide which route to take.

Valeri:

And what if this particular route is actually super busy because there is a traffic jam there.

Valeri:

And that's how you ended up with this routing service, one of our interesting babies.

Valeri:

And here, what's interesting in terms of communication is, so it gets the command that the dispatching system wants the crate.

Valeri:

But before it can send the next command to picking saying, hey, actually give away the crate, it has to figure out which route to take.

Valeri:

And for that, it has to figure out whether the systems in between are available and what state they are in.

Valeri:

But not in the sense like, oh, give me all your state.

Valeri:

It's rather, hey, I have a request for you.

Valeri:

It's like this number two to all of them.

Valeri:

And then optimistically, you expect a response saying, yes, I've got you, everything is fine.

Valeri:

But then in case one of the responses doesn't come, it might be lost, system is not available, then you deal with this ambiguity.

Valeri:

Okay, apparently the system can be considered unavailable.

Valeri:

So I deal with whatever I have, this is the only one available.

Valeri:

So I will build my route around that.

Valeri:

And this kind of logic goes into this bucket of unhappy flows.
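
The "ask everyone, then work with whatever answered" pattern can be sketched like this. It is a simulation under loud assumptions: coroutines with artificial delays stand in for request and response messages over the broker, and the system names and the timeout value are made up. A system that does not answer in time is simply treated as unavailable.

```python
import asyncio


async def ask_availability(name, delay, available=True):
    """Stand-in for sending a request message and awaiting the reply."""
    await asyncio.sleep(delay)
    return name, available


async def collect_available(requests, timeout):
    """Return the systems that confirmed availability within the timeout."""
    tasks = [asyncio.create_task(request) for request in requests]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    # No reply in time: the response may be lost or the system may be down.
    # Either way, it is considered unavailable for this routing decision.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return {name for name, ok in (task.result() for task in done) if ok}


async def main():
    return await collect_available(
        [
            ask_availability("buffer_a", 0.01),                   # answers yes
            ask_availability("buffer_b", 1.0),                    # too slow
            ask_availability("buffer_c", 0.01, available=False),  # answers no
        ],
        timeout=0.2,
    )


available = asyncio.run(main())
```

The route is then built only from the systems in `available`, which matches the idea of not assuming more than the information that actually arrived.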

Valeri:

So this is completely event-driven communication in the sense that there are no REST calls whatsoever.

Valeri:

And the challenge here is for the systems to build the handling in such a way that you don't assume too much; you work with exactly the amount of information you have.

Karol:

Yeah, that's actual real, real time instead of like, near real time or mimicking the real time reactions.

Karol:

This actually all happens in seconds or less.

Valeri:

Yeah.

Valeri:

And we do have a requirement: remember those forks? If you have a scanner before a fork and you want to take a decision, left or right, you actually have like 200 milliseconds to do so.

Valeri:

Wow.

Valeri:

In most cases, it should be already built to solve this problem too.

Karol:

So very low latency communication.

Karol:

Yeah.

Karol:

This wouldn't make sense to proxy that through anything else than just a messaging service in that sense to just keep that latency very low.

Karol:

In that case, that requires that async component.

Valeri:

Yeah.

Valeri:

And also, in most cases, you shift the decision a bit earlier to give yourself space for thinking, so that you don't always need to take the decision exactly at the scan.

Valeri:

You can take it in advance in terms of routing so that by the time the crate reaches the scanner, you're ready.

Valeri:

Oh, actually I had the command that whenever the crate appears there, I can take the right turn.

Valeri:

That's also another way to satisfy this latency requirement.
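
Deciding ahead of time and only looking the answer up at the scan can be sketched as a small lookup table. Everything here is an assumption for illustration: the class name, the fork and crate identifiers, and especially the fallback direction used when no decision was planned.

```python
class ForkDecisions:
    """Holds routing decisions made in advance, keyed by fork and crate."""

    def __init__(self, default="straight"):
        self._planned = {}
        self._default = default  # assumed fallback when nothing was planned

    def plan(self, fork_id, crate_id, direction):
        """Called early, when the route is computed, well before the scan."""
        self._planned[(fork_id, crate_id)] = direction

    def on_scan(self, fork_id, crate_id):
        """Called at the scanner: a dictionary lookup, no routing work."""
        return self._planned.pop((fork_id, crate_id), self._default)


decisions = ForkDecisions()
decisions.plan("fork-7", "crate-A", "right")

first_scan = decisions.on_scan("fork-7", "crate-A")
repeat_scan = decisions.on_scan("fork-7", "crate-A")
unknown_crate = decisions.on_scan("fork-7", "crate-Z")
```

Because the expensive routing work happened earlier, the time-critical path at the fork reduces to a single lookup.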

Valeri:

Yeah.

Karol:

I'm thoroughly impressed in this sense.

Karol:

There's a lot of logic to take care of here to be able to make it work.

Karol:

And just looking at this from the perspective of a crate passing through: if you set it up as a sequential pass-through, it would not be efficient at all, because any small thing that happens on the conveyor belt derails the whole process for a long time. So this actually requires multiple routes, so we can have workarounds between the zones.

Karol:

So that's, oh, the complexity there.

Karol:

It would take a while for a person to learn the complexities of a particular warehouse, just to understand the problem domain there.

Valeri:

Yeah.

Valeri:

And that's what we're actually seeing for the newcomers, how difficult it is to onboard.

Valeri:

I'm obviously biassed.

Valeri:

I've been there for five years and seen it from the beginning.

Valeri:

And even though I left warehouse systems in terms of I'm not working directly every day on the system, it's still insane to think that my knowledge is very valid still.

Valeri:

Yeah, because, and like for the people who join, my heart truly aches for them because it's hard.

Karol:

Yeah, we've been talking on this stream about cognitive load some time ago, two streams, actually.

Karol:

It's a lot of extraneous cognitive load there just to learn all of this and just to cope with that.

Karol:

The complexity is large.

Karol:

If you have 20 kilometres of a conveyor belt in one warehouse and all the combinations of routes that can go there and all the actions that can happen within such a warehouse, that's...

Karol:

Technology-wise now it's just switching a little bit because we have an interesting comment again on YouTube.

Karol:

Why not Kafka?

Karol:

I think we're dealing here with a little bit of the Kafka cult followers, as unfortunate as it is, but I'm going to shut up about my opinion about that.

Karol:

Let's have your voice in that.

Valeri:

I'll tell you that, honestly, there is no reason.

Valeri:

Both Rabbit and Kafka would work.

Valeri:

We don't do anything so super special about the events that just a decent message broker wouldn't be able to deal with.

Valeri:

In the end, it's just messages going back and forth, and as long as they are durable, we are good.

Valeri:

Why Rabbit?

Valeri:

Historical reasons.

Valeri:

When it was started, there was no Kafka.

Valeri:

There were no experts on Kafka either.

Valeri:

It was a larger part.

Valeri:

Rabbit worked for us.

Valeri:

Historically, we stayed there.

Karol:

I would probably figure a few more reasons why not Kafka.

Karol:

Just the sheer complexity of running Kafka as a cluster.

Karol:

I think in terms of operability and maintainability, running RabbitMQ would be a lot simpler than running a Kafka cluster.

Valeri:

Yeah, but again, we don't self-host.

Valeri:

In that sense, it doesn't really matter.

Valeri:

I would say even if you're not self-hosting, and you're simply relying on the cloud hosting somewhere or actually already existing platform, in that sense, Kafka is easier.

Valeri:

You just use Confluent, everything works.

Valeri:

Whereas for Rabbit, you don't have that many hosting platforms that already give you all the necessary tools so that the maintenance burden is actually minimal.

Valeri:

That's what we struggle with now.

Valeri:

It's fine, because if you check the big systems, big companies, players out there that use Rabbit for their messaging, they do self-host or do the hybrid of self-hosting and cloud hosting.

Valeri:

Self-hosting is not something we do have expertise for or resources for, but it's not.

Valeri:

For now, we are on CloudAMQP.

Valeri:

It's been working so far, but we do question that it's going to still work with the planned expansions and the scale that's growing and evolving.

Karol:

I would just comment that basically learning to work with Kafka, because hosting and maintaining Kafka is one story, just looking at the complexity.

Karol:

If you have it outsourced as a SaaS solution or a PaaS solution, then you don't deal with the infrastructure part and having it running.

Karol:

But you still need to build the logic that is on Kafka with topics and how to work with the partitions.

Karol:

I think running on AMQP 0.9 or 0.10, I don't know which version of the protocol you use.

Karol:

0.9. 0.9. Running AMQP 0.9 in terms of the logic of setting up routing in RabbitMQ and these kinds of things is a lot simpler than running Kafka, especially from the perspective of filtration and FIFO.
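
To make the routing and filtering point concrete, here is a small pure-Python sketch of how topic-exchange binding patterns match routing keys in AMQP 0-9-1 as implemented by RabbitMQ: words are separated by dots, `*` matches exactly one word, and `#` matches zero or more words. This is an illustration of the matching rule, not RabbitMQ's own code, and the routing keys are hypothetical; the conversation does not say which exchange types or key names are used in the warehouse.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Check a routing key against a topic binding pattern."""

    def match(p, k):
        if not p:
            return not k  # pattern exhausted: key must be exhausted too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words of the key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        # '*' matches exactly one word; anything else must match literally.
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


one_word = topic_matches("picking.*.completed", "picking.zone1.completed")
too_long = topic_matches("picking.*", "picking.zone1.completed")
zero_words = topic_matches("picking.#", "picking")
catch_all = topic_matches("#", "dispatching.crate.failed")
```

With bindings like these the broker does the filtering, so a consumer only declares which patterns it cares about.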

Valeri:

That's the motto of Rabbit, right?

Valeri:

Smart server, stupid client, so that your applications don't need to think much.

Valeri:

But in practice, if you take any technology and you want to use it to its best, you need to understand how it works.

Karol:

Of course.

Valeri:

That's exactly why we are running a RabbitMQ training course in the company, because it's so essential to operations.

Valeri:

Actually, even with the simple way of using it, kind of easy user experience, you still can mess up quite nicely.

Karol:

Like with any tool.

Valeri:

Exactly.

Valeri:

In that sense, yes, you might need more investment at the beginning if you go for Kafka.

Valeri:

But I don't see it honestly as the biggest challenge or necessary.

Valeri:

So for us, it's more historical.

Karol:

Okay.

Karol:

The usual quote I get from my colleagues, a fool with a tool is still a fool.

Karol:

So yeah, if you don't know what you're doing and you're using a tool, you're still not going to do well in that sense.

Karol:

And that's the basis for it.

Karol:

But I think just from the perspective of technology, the learning curve to get going with Rabbit is definitely lower than the learning curve of having Kafka in the ecosystem.

Karol:

So it's probably simpler to work with.

Karol:

And given the complexities you're dealing with in the problem domain, I would say, yeah, use something that is simple and faster to learn so that you can be operational quicker and put your focus on the complexity of the problem domain instead of the technical part of the implementation.

Karol:

Yeah, that's very true.

Karol:

So answering the question from YouTube, how Rabbit would help solve the communication problem, just by simplicity, we refocus here to the problem domain itself instead of learning the tech.

Karol:

That already helps, I think.

Karol:

All right.

Karol:

Now, the juicy part, we have the complexity part, the tech part, the problem domain, but I love the juicy part of how does the organisation play into it?

Karol:

Because I'm guessing getting to the point you are here over the five years, catering to those babies of microservices, the monolith first, it was a technical journey, of course, architectural journey, because you started splitting up the monolith and building microservices, and you're still upgrading.

Karol:

How does the organisation play into this?

Karol:

What were the organisational challenges?

Valeri:

Well, obviously, our organisation changed with the way architecture progressed.

Valeri:

It's kind of two dependent things.

Valeri:

And of course, we started as a single big team, relatively big, because we needed to deliver this monolith in a period of time.

Valeri:

Then you already know that it's too big of a team.

Valeri:

Now we go microservice landscape.

Valeri:

So teams start appearing.

Valeri:

By now, I think it's ten teams with their isolated set of components.

Valeri:

And the challenge by now is, of course, how you make all those teams collaborate and communicate so that they keep this architecture running.

Valeri:

For a lot of people, it's a mind shift, right?

Valeri:

People are coming from a background of REST calls and somewhat synchronous systems.

Valeri:

And here, it's like 99% asynchronous, but you still build with this synchronous state of mind.

Valeri:

Yeah, so the challenge here is how you teach people that if you want to do something with an event, expose it or consume it, you have to communicate with your neighbours.

Valeri:

So you don't work in silos.

Valeri:

But in that sense, I think it's not new to any microservices setup.

Valeri:

You always need a high level of communication in there.

Valeri:

But the state of mind is that now, when you consume an event and write the handling around it, you basically have to keep this bigger picture in your mind, in a sense.

Valeri:

So what's the state I can be in?

Valeri:

And what can go wrong, etc.

Valeri:

So that's where the challenge comes in.

Valeri:

And it's also that we used to have basically two people as architects in this particular domain.

Valeri:

But by now, you realise that the system is too complex, it's too big, even for two minds.

Valeri:

So you want something like an architecture group being there full time to actually help teams communicate, collaborate, and make sure that this architecture doesn't go the spaghetti way.

Karol:

And that's the usual problem.

Karol:

Just from experience of our large integration platforms, and nowadays, integration platforms themselves are just microservice systems, in combination with EDA, right?

Karol:

They're both sync and async protocols.

Karol:

Not that large of an async component in that.

Karol:

So it never reaches like 90%.

Karol:

In regular run-of-the-mill integration platforms, the async part is usually maybe 20-30%, at that kind of level.

Karol:

But if you get an ecosystem where you have 200 distinct microservices over an integration platform itself, that is usually governed by multiple architects, integration architects covering that.

Karol:

So this is very analogous in that sense, where you basically do the same with such a big thing, because one person just cannot comprehend all of that, they need to cooperate and exchange that information and work on certain areas.

Karol:

But from what you're telling me in terms of the evolution of the organisation, you actually didn't hit Conway's law that hard.

Karol:

You actually went straight into the inverse Conway manoeuvre from the perspective of how you can handle Conway's law.

Karol:

So you basically started changing the communication patterns between teams alongside how the architecture grew, which is awesome, because you avoided the most painful problem of architecture, which is not adjusting the organisation to the architecture that's growing and evolving.

Karol:

And it seems like you just went like, oh, that's how it should be done.

Karol:

It seems like you said it so effortlessly.

Karol:

It was like a natural thing for everybody.

Valeri:

Well, it felt natural, because actually the organisation structure almost mirrored the number of services.

Valeri:

So there was one service, one team, manual automated.

Valeri:

Then you start spawning those microservices.

Valeri:

Oh, wow, you need the team for that.

Valeri:

And it's like, rent-multiplying, so very naturally distributing.

Karol:

I must say that's quite rare.

Karol:

You do realise that, or not really?

Valeri:

No, not really.

Karol:

Gotcha.

Karol:

So just to put it in perspective, because I've been with multiple clients all over Europe, this kind of behaviour to naturally grow and change your organisation with the evolution of an architecture, that's not something that's common.

Karol:

Usually, the organisation is driven by the business, the way teams are organised.

Karol:

So it usually doesn't match the architecture itself.

Karol:

So one team can be responsible for multiple systems, and then you have a problem or two teams are responsible for one system, and you have a different problem in that sense, because they collide with each other.

Karol:

And you either create bottlenecks or friction.

Karol:

Yeah.

Karol:

While here, it seems like you completely bypassed the problem of Conway's Law of ignoring it or accepting it, and you just went like, we're going to just grow as it is.

Karol:

As we grow with architecture, we're going to grow our organisations.

Valeri:

Well, it's not that black and white, right?

Valeri:

Okay.

Valeri:

Remember, we have this monolith part that's not fully gone yet.

Valeri:

So you naturally would have, I think, two teams working there, and they kind of share the space.

Valeri:

But I believe both teams have the people from the very beginning who have been building the system.

Valeri:

So they nicely collaborate at that point, but you still have this natural friction when they need to work on the same code base.

Valeri:

Okay.

Valeri:

And at the same time, if you naturally grow your teams as your architecture evolves, there is also a trade-off in there, because you add new people rapidly, and you need to educate them, so you will have this gap in the delivering of features while this whole new organisational setup stabilises.

Valeri:

And I basically saw two or three reorganisations in these five years in this domain only, and you see those dips when you do the reorganisation.

Valeri:

So at least you want to minimise those, right?

Karol:

Yeah.

Karol:

It's hard to nitpick on these things at this point.

Karol:

It's like, how?

Karol:

But it feels like you did it naturally, even with those reorganisations.

Karol:

It feels like you say it's like, it just happened.

Karol:

I mean, either you have somebody that's really mature about looking at things, about reorganising how you work, and to spot inefficiencies in ways of working, or the teams actually have a knack for that.

Karol:

What's the deal?

Karol:

Do you have somebody looking at those?

Valeri:

Well, the organisation itself is flexible, and business is happy to realise when they need to change, and we need to change in order to meet the needs of the business.

Valeri:

So in that sense, people are very sensible.

Valeri:

We definitely do have lots of smart people there, too.

Valeri:

That helps.

Valeri:

But also, if you look at the architecture itself, it wasn't that we, as tech, popped up and said, oh, now from the monolith, we go to microservices and build all this event-driven thing.

Valeri:

No, you have to build a narrative.

Valeri:

You have to explain why it's important.

Valeri:

Repeat this narrative for half a year, at least, and still keep repeating it and showing that it works.

Valeri:

So it's there.

Valeri:

It's there.

Valeri:

But it's not as painful, maybe, as you would normally see it, as what we are referring to in other organisations.

Valeri:

That's why I don't have bitter feelings about it.

Valeri:

So for me, it went well.

Valeri:

Okay.

Karol:

So it still requires work and reiterating and communicating that architecture to stakeholders and other teams that we need a shift because we're hitting these kind of problems, and those problems will manifest harder later on, and we want to avoid that.

Karol:

Yes.

Karol:

But it's still in the area of avoiding rather than we're hitting a wall.

Karol:

Or did you hit a wall somewhere?

Valeri:

Always just in time.

Karol:

Okay.

Karol:

I mean, as long as it's just in time, that's perfect, right?

Karol:

But another curious question popping in.

Karol:

So if we have those teams and the composition of those teams, how many of those people have been there from the beginning versus the fresh blood?

Karol:

What kind of a rotation in staff do you see over those teams?

Karol:

Or how many more people do you have to actually onboard into the whole thing to keep it viable and working?

Valeri:

Yeah.

Valeri:

Well, let's see.

Valeri:

In pure numbers, we started as a team of like 15 or so, but that included POs, engineers, leads, etcetera.

Valeri:

Two-thirds of those have already left or are no longer working on the system.

Valeri:

So there are at least five people who have been there from the very beginning.

Valeri:

And now we have 10 teams, each at least eight people.

Valeri:

So 75 new.

Valeri:

But of course, the inflow was gradual at various points when the organisation had to change.

Valeri:

Okay, now we are not a single team, now we are three.

Valeri:

So you add more people, etcetera.

Valeri:

So you have different kind of generations.

Valeri:

But I would say at this point, it's like maybe 40% or one-third of people who've been there long enough to comprehend the complexity and know what's going on, and the rest are at various stages of, oh, what the fuck, where am I?

Valeri:

Okay, I'm competent enough within the boundaries of my team.

Valeri:

But now if I do step left, they're like, oh, what the fuck, then go.

Karol:

Okay, so there's a learning curve between the teams in terms of they focus on the narrow aspect of the whole thing.

Karol:

And there are quite a few of those.

Valeri:

But that was the whole goal, right?

Valeri:

You can't expect all 80 people to know the system in every inch of details and you don't want to.

Valeri:

It's a waste of time.

Karol:

Yeah.

Karol:

Yeah, I mean, I do agree with that premise because that's the regular thing.

Karol:

We want people to specialise and have some people that have a broader view so they can guide others in those specialisations and niches that they're handling so that we still can have that efficiency.

Karol:

But in general, would you say that there is a broad understanding among the different engineers of how the whole thing works and what the dependencies are that they're working with between teams?

Valeri:

Yes and no.

Valeri:

There is definitely this understanding in terms of the closest neighbours, because I hope that we never see all systems connected and knowing about each other.

Valeri:

That's not the goal of this architecture and the split.

Valeri:

So you definitely know to a certain degree what happens in the neighbouring teams.

Valeri:

And there are people who know more about the higher-level view too.

Valeri:

But normally people who joined, let's say, half a year, a year ago, they wouldn't grasp the whole dependency picture as of yet.

Karol:

Okay.

Karol:

There is that learning curve there probably for those later joiners in that sense.

Karol:

In terms of challenges, not from a technical perspective but more from the perspective of psychology: you started as 15 people designing the monolith and designing the first warehouse.

Karol:

How much of a psychological challenge was it from the perspective of pressure, perspective of stress?

Karol:

And again, that's a lovely question from our constant commenter today, how stressful was it to develop the first warehouse?

Karol:

But I want to expand it into also looking at the stress of growth, because adding people to the mix is always stressful, because that's basically rearranging the groups, rearranging the relationships between people, relationships between people and systems, etcetera.

Karol:

So ongoing change is always a huge stressor for the part that moves the most, the humans.

Karol:

So can you describe those challenges in terms of stress, psychological safety, relationship building?

Karol:

How would you look at it from a perspective of those five years?

Valeri:

Honestly, in terms of the relationships, I think I haven't seen such a tightly knit group.

Valeri:

Now it's less so because there are like already an insane amount of people, but those few stages, first three or four years, we've been very close, because it's actually a coping mechanism.

Valeri:

Even if you're in different teams, you're still tight, really holding each other tight to support each other and get through the stresses related to delivering, because you can pretty much describe the stress levels as slowly building and reaching a climax.

Valeri:

It's when you, oh, actually, tomorrow, we have to run the whole new site, and it has to work.

Valeri:

And you're in a frenzy, foaming at the mouth, exactly.

Valeri:

And you know that a ton of things are going to go wrong, and you wonder how you're going to deal with that.

Valeri:

Then you stabilise the system, it works.

Valeri:

You breathe out for a while, you're kind of in this dip of pleasure and sanity, and then it starts building up again, because a new warehouse is coming.

Valeri:

And you know that you have to deliver features for the old ones, maintain them, migrate meanwhile, and support the new warehouse.

Valeri:

So what helped us, this very original team and those few generations after, is really supporting each other.

Valeri:

Without that, I don't think it would have been possible.

Valeri:

Can you hear me?

Valeri:

I see the connection lost.

Valeri:

Sorry.

Karol:

We still hear you.

Karol:

We just lost the video, and it happened a few times over the stream that you got kind of frozen a little bit, but don't worry about that.

Karol:

Oh, for a moment, we did lose audio as well.

Karol:

But you're here.

Karol:

I still see you in the studio, so.

Valeri:

Okay, I think it's getting better.

Valeri:

At least now I can hear you.

Karol:

Okay.

Valeri:

Somewhat sequentially.

Karol:

Technical problems, that happens.

Karol:

Unfortunately, internet can be spotty at times.

Karol:

Yeah.

Karol:

Oh, I'm sorry, folks.

Karol:

I think a brief pause.

Karol:

We lost Valeria here for a moment.

Karol:

Internet problems.

Karol:

So while we are having this small pause, we were going to the end of the stream anyway.

Karol:

I'll just pop in with some advertisements.

Karol:

So next month, 5th of November, we're having another interesting stream, which is with Brian Labelle from Rubell Business Solutions Limited in Canada.

Karol:

We're going to be talking about migrating iPaaS technologies to something else.

Karol:

So for example, pure Java implementation.

Karol:

We'll see how that goes.

Karol:

I'm going to play the devil's advocate here to see what are the angles and what are the angles that are usually missed by companies looking at iPaaS from the pure cost perspective of licencing.

Karol:

So we're going to be doing that.

Karol:

If you're new to the stream, if you haven't been here before, you can go and read more about the work we're doing in terms of knowledge sharing about enterprise application integration.

Karol:

Just read more on bridgingthegap.eu.com.

Karol:

You can also subscribe to our Substack, although it's a tiny bit neglected nowadays because of other priorities.

Karol:

And please do subscribe to the channel to get notifications about other live streams.

Karol:

And I do see that Valeria is back.

Karol:

So let's go back to the live stream.

Karol:

Internet is internet.

Karol:

Sometimes it works.

Karol:

Sometimes it doesn't.

Valeri:

So much for distributed system building, right?

Karol:

Fallacy of distributed computing.

Karol:

Latency is always zero.

Karol:

And the network is always reliable.

Valeri:

Yeah, exactly.

Karol:

That's a nice one.

Karol:

Always.

Valeri:

Yeah.

Valeri:

So where were we at?

Valeri:

Stress.

Karol:

We were talking about the psychological aspects, relationships within the teams, looking at the five years, the starting 15 people, and then looking at how it developed later into the 80 people squad that you have right now for different services.

Valeri:

So yeah, if I do a quick sketch, those first two insanely close.

Valeri:

That's how we survived this original big project.

Valeri:

Then there's even a few new teams coming in and us splitting into smaller batches, and since most of the people are still from this original batch,

Valeri:

We're still very close.

Valeri:

And the new people coming, it felt like we have new members of the family.

Valeri:

It's always been very tight.

Valeri:

But of course, it was destined to go a bit less tight and less, let's say, family-like at some point.

Valeri:

So what you would observe now is that there is still an insane support between the teams, especially in those teams where you have those old people from the very beginning.

Valeri:

They would help no matter what.

Valeri:

And that brings people together a lot.

Valeri:

And even with reorganisation, when you move people around, it still feels safe, I would say.

Valeri:

And I see people are quite close.

Valeri:

However, you also see a phenomenon that the more new people there are in a team, the more isolated it gets, in the sense that if something happens, an incident, right, now you need to solve the problem.

Valeri:

And you see this ping pong, oh, is it my team problem?

Valeri:

Maybe I can redirect it to my neighbour.

Valeri:

So there is a bit of this, let's say, dip, where you see a ping pong during the incidents, like, oh, let me ping the other team.

Valeri:

But I feel like it's natural because, again, the system is so big that it's quite frightening for a lot of people, especially those new to it, that they need to embrace that complexity.

Valeri:

And I, for instance, remember myself when I just joined the company.

Valeri:

I also was part of the incident crew for the systems I didn't know.

Valeri:

I don't know what's going on.

Valeri:

Whose problem can I make it, please?

Valeri:

Just to offload it from my shoulders.

Valeri:

So I completely get where this is coming from.

Valeri:

How it's dealt with is the teaching sessions on how to run the incidents, helping people to grasp this bigger picture of knowing where it is.

Valeri:

Is it something you can do yourself, or something you should really escalate, or ask your neighbours for help with?

Valeri:

But I also see quite often that if people are in the office, and now we are, of course, working hybrid, you often see people working remotely.

Valeri:

But if people are in the office, they just stand up, cross the, like, 20-metre gap between the tables and go to another team saying, hey, can you help me, please?

Valeri:

I don't know what's going on.

Valeri:

And I never saw someone saying no to that.

Karol:

Yeah, that's a good mindset.

Karol:

I remember back in the office days before corona, roaming through the office between different business teams, well, business domain system teams, and just going to have a chat to see what the problem actually is.

Karol:

Because if the business issued the ticket, that usually was very devoid of information.

Karol:

That was the usual problem.

Karol:

So just going to a different team to see, hey, your business issued me a ticket.

Karol:

It's like, I don't know what's going on.

Karol:

Can you give me some context?

Karol:

And let's solve that problem together to see if that's actually a problem that I can help you solve.

Karol:

Or is it the problem that is actually somewhere else?

Karol:

But let's cooperate on that.

Karol:

And it usually was a very good way just to walk by and just have a conversation.

Karol:

Because ping-ponging with incidents and just reassigning them, it was very, very bad psychologically.

Karol:

It was like, oh, this again.

Karol:

Those damn...

Karol:

So that's good.

Karol:

And you said you have support in terms of training to work with incidents?

Valeri:

Yes.

Valeri:

Because, of course, the system is big and complex.

Valeri:

And you don't want to have...

Valeri:

And remember, 24-7 almost.

Valeri:

So you actually need support throughout the day.

Valeri:

So when it was a single team, it was easy.

Valeri:

One person from the whole team is on support per day.

Valeri:

Now when you have like 10, having 10 engineers doing constant support is expensive.

Valeri:

It's also heavy duty-wise.

Valeri:

So you introduce this tier structure where there is the first tier that can identify at least what's the direction of the problem and then escalates all the way to the relevant people.

Valeri:

But in order to be able to do that and for the people to know how to deal with certain issues, you do need training and help.

Valeri:

So that at least people feel psychologically safe to take this job.

Valeri:

Because it's part of their responsibilities.

Valeri:

But if you don't feel safe, because it's very different from just sitting and coding and delivering the code that builds and runs, here you still need to deal with real people sometimes.

Valeri:

Actually, very often you're on call with actual people from operations that are standing in front of the screen in the warehouse.

Valeri:

You can hear this ringing of hardware.

Valeri:

And they're also desperate because they have no clue what's going on.

Valeri:

And you have no clue what's going on.

Valeri:

You can imagine it's very stressful.

Karol:

And you try to figure out the reality of things in combination of two worlds, digital and physical.

Karol:

That must be quite stressful in that sense.

Karol:

But if you say tiers, it kind of reminds me of the approach.

Karol:

Because right now we have those development teams that also handle the operations of the systems, right?

Karol:

So this is the singular responsibility of a team.

Karol:

You have a set of services that you're handling, you do the dev and you do the operations.

Karol:

The maintenance, the fixing, the troubleshooting, all of that.

Karol:

A different old school approach was like you have a dev team and you have a handover to the operations team.

Karol:

And the operations team is handling operations, right?

Karol:

You send it back to the dev team when there's something that requires a substantial-

Valeri:

Oh, yeah. In that sense, this separation is there.

Karol:

Oh, it is?

Karol:

Okay.

Valeri:

Yes, of course.

Valeri:

So operations is basically the people who are our customers.

Valeri:

We are writing the system for them so that they can control this facility.

Valeri:

Still, though, developers are quite close to operations and to this whole world too.

Valeri:

But this separation is there.

Karol:

But you have the operations, which are the physical operations inside the specific facility.

Karol:

Yes, yes.

Karol:

Right?

Karol:

So they're operating machinery, they're operating the crates, they're operating-

Valeri:

Oh, no. There is also operations in the sense of analysts, people who look at the monitor from their home, if need be, to control this site.

Valeri:

And they are the natural translator between tech and the operations team.

Valeri:

Okay.

Karol:

So they have more context of their specific facility, but they have the context of the actual services that they're monitoring.

Valeri:

To an extent, yes.

Karol:

To an extent.

Valeri:

One of the challenges with the evolution of the architecture is how to make the business talk in our terms.

Valeri:

Because before it was a single system, they got used to calling it whatever it's been called.

Valeri:

And now we inevitably, as devs, speak in different terms, of course.

Valeri:

And then they come with a feature and you explain, oh, that can't be done because we have multiple systems and they need to communicate.

Valeri:

So all those challenges are there.

Karol:

Okay.

Karol:

Yeah.

Karol:

Orchestration of work between teams is probably quite a heavy challenge right now.

Karol:

I remember from the time when I was doing operations, we had the dev team doing the delivery of the code and that moved to the testing environments.

Karol:

And in production, we had L1, L2, and L3, where L1 was just monitoring.

Karol:

Those guys were on 12-hour shifts sitting in front of monitors, looking for problems, monitoring the ticket system, just looking at whether there are any problems in production.

Karol:

L2s were those that were usually solving those problems if they occurred and seeing if they can solve the problems just by reconfiguring, restarting, offloading something, etc.

Karol:

If this problem could be contained within the integration platform, they usually fixed that within the integration platform.

Karol:

If that couldn't be fixed solely in the integration platform, that went to L3, which was usually me, to solve that by cooperating with other systems, often managing fixes on both sides, on the integration platform and the system as well.

Karol:

So orchestrating that and then issuing that quick fix with an emergency release or planning that fix within a minor or major release were natural in that sense.

Karol:

So it was the kind of levels.

Karol:

Does that resemble anything in Picnic, this kind of classification?

Karol:

Or is it a different classification?

Valeri:

It's a bit different, yes.

Valeri:

So there are obviously people on the site themselves who have those monitors with different alerts for hardware malfunction, but they likely fix it manually, right?

Valeri:

So no involvement from tech, etc.

Valeri:

Then you have operations people who control the site in the sense of they have a plan, those ATK deliveries of how many per day that needs to be packed and dispatched.

Valeri:

So they look already at the business flows and alerts from that perspective.

Valeri:

And when they can solve operationally by rescheduling something or turning down one operation, so let's say no more gel packing, we all focus on picking only, they do that.

Valeri:

But when they see that there is something wrong with the behaviour of the system, they often escalate already towards the tech part.

Valeri:

And there you see tiers.

Valeri:

Tier one is the first single point of contact, for saying, hey, something's going wrong, can you then either solve it or find who's going to solve it?

Valeri:

And then it's already...

Valeri:

And then the next tier is the people who are more and more closely related to a specific part.

Valeri:

But it's not necessarily the behaviour of the system.

Valeri:

It can be like, oh, exactly, something happened hardware-wise, and there is no easy workaround for that.

Valeri:

Can you help us figure out what we can do if this type of request also happens?

Karol:

Okay, so you kind of have that kind of layered system as well.

Karol:

But this is specifically in the dev teams, how to manage the workload of incidents.

Karol:

And the operations in Picnic, they are actually more on the business continuity and the line continuity.

Karol:

So more of a progressing with packing, picking and packing, rather than looking at the operations of the specific services that are there.

Valeri:

It's more of a business process rather than a tech process.

Valeri:

Specific service that goes to one of the dev people in that team.

Valeri:

And that's a rotation, basically.

Karol:

So someone calls, someone monitoring, this kind of thing.

Karol:

Mm-hmm.

Karol:

Complex stuff.

Karol:

Complex.

Karol:

Now, just to change the topic a little, still being on the psychological side of things, rather than the technical one.

Karol:

But a completely different topic in that sense.

Karol:

You yourself, being a woman in tech, I must ask, did you have challenges about you being a woman in tech?

Karol:

Because the stories differ.

Valeri:

Yeah.

Valeri:

Are you sure, Flea?

Karol:

No.

Karol:

I knew that answer already, so.

Valeri:

Yeah.

Valeri:

Yeah, I'm aware that the challenges exist.

Valeri:

Well, luckily, we have quite a good portion of girls in tech in Picnic.

Valeri:

And I do hear their stories.

Valeri:

I was on the lucky end of the spectrum.

Valeri:

I've never encountered that.

Valeri:

So from the very day X, even though in university, the girls were obviously a minority, then same with the first companies, I never felt that my gender plays any role.

Valeri:

I think only once it's been mentioned to me, all the way back in the university, from one of the teachers, because I was skipping her programming lessons.

Valeri:

I didn't like them, because she wasn't able to explain object-orientated programming well enough.

Valeri:

So she's like, why are you skipping my classes?

Valeri:

You started on a nice project.

Valeri:

But actually, girls don't become developers.

Valeri:

So maybe you should reconsider and become a designer instead.

Karol:

No, that doesn't work like that.

Karol:

In spite of her, I'm going to do development.

Valeri:

Yeah.

Valeri:

But I get that she probably had her own bitter story, that she was trying to be a developer in tech from the 60s, whatever, and ended up in the university.

Valeri:

So I can imagine she has a story of her own.

Valeri:

But apart from that very early story, I never felt that if I'm a woman, that changes anything.

Valeri:

To be honest, I felt more insecure about my age than gender.

Karol:

Oh, I got that one as well back in the day.

Karol:

That's why I have this.

Valeri:

Well, even if I try all I can, I think it's not possible.

Karol:

That's a completely different set of problems to solve.

Karol:

But the origin of my beard was that I was looking so young and people didn't take me seriously back in the day when I started as an architect, that I started having a beard to age myself visually.

Valeri:

Did it help?

Karol:

Yeah, it actually did help.

Karol:

And just the beard just stayed.

Karol:

And I started like, oh, I actually like myself in a beard.

Karol:

And I was like, let's leave it.

Karol:

And so my kids, my wife, they have no idea how I look without the beard because they've never seen me without the beard because it's so old.

Valeri:

But for how long have you been having the beard?

Karol:

Let's see.

Karol:

The last time I shaved was 2014 or something.

Valeri:

Okay, okay.

Valeri:

So, oh my God, 10 years.

Karol:

11, 11.

Karol:

I can check that in actual timeline because that was somewhere around my best friend having a very specific party.

Karol:

And that correlated somewhere with me changing a job.

Karol:

So let me just quickly look in LinkedIn.

Karol:

Yes, that would be somewhere around 2014.

Karol:

So that's 11 years ago that I shaved last and that related to that party because I already started growing a beard.

Karol:

But the party was themed red, interpreted however you want.

Karol:

So basically, I dressed in a lab coat and painted my whole head, well, hair and beard red.

Karol:

And despite the paint being washable, it didn't want to wash away.

Karol:

So I had to shave my whole head and beard away, just not to be red anymore.

Karol:

And that was the last time I shaved.

Karol:

So since 2014, just to make myself look older. I had the same concern about my age, or being perceived as too young.

Karol:

But afterwards, no more.

Karol:

To those daring me in the comments to show a photo of me without a beard, no, those do not exist anymore.

Karol:

I mean-

Valeri:

Erased from all the servers.

Valeri:

All right, yeah.

Karol:

I was very diligent about erasing some parts of my digital identity, just to be- I mean, I dare anybody to find this.

Karol:

If they do, kudos, I'm going to buy you a beer if you come to the Netherlands.

Karol:

Just to be fair, I'm going to buy a beer if somebody finds a photo of me without the beard.

Karol:

Good luck with that.

Karol:

And then, okay, you were lucky as a woman in IT.

Karol:

Given that, in my personal opinion, we still do not have enough women in IT, I would love to see more women architects.

Karol:

I would love to see more developers.

Karol:

I mean, it's changing slowly, gradually.

Karol:

I see more and more.

Karol:

If I compare it to 10 years ago, where most of the dev teams were men only.

Karol:

And if you would have a woman in a team, that would be the project manager or a scrum master, something like that.

Karol:

Now we have actually devs and architects.

Karol:

That's already an amazing change.

Karol:

What can you say from your perspective, even though you didn't have those challenges that much, to encourage women to be in IT, to pursue programming jobs or architecture or testers, whatever that may be?

Karol:

I know a difficult one.

Valeri:

I would say, yeah, it's like with anything else.

Valeri:

For any other skill, I would give probably the same answer.

Valeri:

If you love something and you want to do this, fuck it all, just do it.

Valeri:

It doesn't matter what's your gender.

Valeri:

It doesn't matter what people think about you.

Valeri:

It only matters if you enjoy it and if you invest enough time in this, that's all.

Valeri:

Because whenever I felt, let's say, as if something is wrong with me while being in IT, I always attributed it to me not studying enough or not being good enough in the sense of knowledgeable enough, but never that it has something to do with my gender.

Karol:

Just to put it in perspective, how often did you fail at doing tasks?

Karol:

How often did you make errors over your career?

Karol:

Or how often do you do that now?

Valeri:

That's a new definition of error, right?

Valeri:

I told you, I almost took down the whole Rabbit cluster, but I wouldn't say it's an error.

Valeri:

It's something you learn from, and you just accept it and you fix it.

Valeri:

That's it.

Valeri:

I wouldn't say there were any big failures in the sense that the company suffered from that, or I was lying for a week on my couch, crying my eyes out, but definitely, if I look back, I will find a number of decisions I'm not happy about in terms of how the systems are built or the technology we've chosen.

Valeri:

But of course, they are, let's say, the minority, out of the blue, like 10%, 15% at least.

Valeri:

Maybe more.

Karol:

But you would treat them as learning experiences, right?

Valeri:

Exactly.

Valeri:

I mean, but that's one of the ways to learn.

Valeri:

You fail, you learn, you learn.

Valeri:

I can learn.

Valeri:

I can do that because I'm not working in the industry where the failure is critical.

Valeri:

I mean, obviously, if you're working with medical equipment and human lives are at stake, that's a different story.

Valeri:

In the worst case, yes, it costs my company money, but then I will be there on the line of fire to fix it as soon as possible because it's my mistake.

Karol:

So, if we could translate that into advice, that would be: own your mistake and learn from it?

Valeri:

Yes, definitely.

Valeri:

And I think if you run into challenges, of course, we work, at least I'm working in a multicultural environment, and people coming from different countries and different backgrounds have different mindsets.

Valeri:

And I did see that it's not for everyone very digestible when a female comes in and gives you, let's say, putting in quotes.

Valeri:

But I've never seen people being so stubborn or so unconvertible that it becomes a pain in the ass. A good talk, and just consistently showing that you are here not for your beautiful eyes, but because you do stuff, you learn, etc.

Valeri:

Never failed to change people's opinion.

Karol:

So, just do what you love and be persistent about that and consistent.

Valeri:

Yeah, exactly.

Valeri:

And, you know, I do love stupid things like wearing cherries around me, and I don't care if I find myself in the room with 40-year-old guys or 60-year-old architects that look at me like, what is this girl doing here?

Valeri:

I'm still going to jump and be happy while being a person who designs systems, yes?

Karol:

In the end, it's about delivering and being passionate about the job, I suppose.

Valeri:

Yeah, definitely.

Karol:

Definitely.

Karol:

With this, we've been at it for two hours and 15 minutes.

Karol:

I think it's a good time and a good point to end the stream today, since I did all the advertisements already.

Karol:

Dear viewers, thank you for your attention.

Karol:

Valeria, thank you for joining today.

Karol:

And everybody, have a lovely night or a lovely afternoon, wherever you're watching, and see you on the next stream.

Valeri:

Yeah, thank you for