From Batch to Real Time: Event-based Architectures

  • 05 June 2020
  • Enterprise Modernisation

Consumer demand for instantly available, responsive, and consistent experiences was rapidly intensifying even before the COVID-19 crisis. Now it’s taken on new significance. Particularly in financial services, retail, and health care, batch-release systems are preventing global enterprises from delivering the type of experiences their customers expect.

In this session, Kin + Carta VP of Cloud Modernization Mark Ardito explains how real-time, event-driven architectures built in the cloud create new value across the enterprise. Covering the new speed-to-value equation, new product capabilities, more reliable analytics, and more efficient testing, he breaks down what moving to real-time systems entails, how to get started, and the obstacles to overcome along the way.


Mark Ardito, Vice President of Cloud Modernization, Kin + Carta


Something pretty cool happened about five years ago. We got a new grocery store in Mount Prospect. I know that sounds crazy, but I guess you care about those things after you hit 40. So anyway we got a new grocery store and it was pretty cool. My wife and I really fell in love with it. They just had, like, great produce, a fantastic deli, artisanal cheeses, a wide variety of foods, a fantastic bakery. You couldn't ask for anything more out of a grocery store. So we found ourselves five years ago just starting to spend more time in a grocery store. And it started to be time consuming, right? We're in the grocery store for 60 to 90 minutes per week. I have a full-time job. My wife is an entrepreneur with her own business. And we have two junior high school children. This is too much time. We can't be doing that. And then the grocery store did something really awesome. Two years ago, they launched online shopping with curbside pickup. They were one of the first grocery stores to do it, and it was a game changer for us. So we downloaded their app instantly. We ordered our groceries all through the week. We had a standing appointment on Sunday evenings, and we just kind of pulled up and said that we're here and they loaded all the groceries in the trunk of our car and we took off and went home. Fantastic. It was a game changer for us. We're no longer burning 60 to 90 minutes in a grocery store per week. We're doing everything from an app and life is great.

This is what the grocery store looks like. So it's our typical grocery store. Our typical order for a family of four, I don't know, it's seven to eight bags of groceries per week. So things were going really well for us. And then 2020 happened, and things kind of went sideways for online ordering. So this stuff started happening. And I could slowly see this grocery store's online ordering system unraveling in front of my eyes. People were over buying things inside grocery stores like this, which were causing issues in the supply chain, which then had issues on the app. So let me walk you through this. So during the week, we're just ordering things throughout the week. Toilet paper here, some carrots here, some other items here right? So you order the items, and then you go and pick them up at your set time. And so that first time during the pandemic, when we went to go pick up our groceries, the person came out, and they had a single bag with them. And they said, “Here are your groceries.”

Now, the whole time, I was able to order all of my groceries on the app. It said thanks for your purchase, we've received your order. And we go there and they hand us a single bag. And so we said, where are all the other groceries? And they said, well, we don't have any of those things. And so we're like, huh, so apparently the grocery store can't tell the mobile app, “I don't have flour anymore. I don't have toilet paper anymore. I don't have hand sanitizer anymore.” But it allowed us to order those things. And so it made my wife and me realize how dependent we have become on real-time data, almost to the point where it just has to be there. Right? We've become so reliant upon this idea that if the app shows me there's toilet paper in stock, I'm expecting toilet paper to be there when I go and pick it up. Right? Why would you let me order this stuff if you didn't have it?



I suddenly saw this idea that this retailer, this grocer that we really love, they're losing money on this stuff. I'm a paying customer. I'm willing to spend all of the money and go there, yet they can't give me any groceries. I'm willing to give the money to them. And so one story that really hit home for us was Easter happening during the pandemic. And what my wife and I were gonna do, we were gonna order groceries, cook a full meal for all of my family members, deliver those meals to them so then that way they could eat together and we were gonna have a Zoom Easter dinner together with my family. And so we ordered all the groceries and we go and pick it up. And we couldn't even get enough groceries to cook one of the dishes we had planned for Easter dinner. So this thing totally was a disaster. And just to fast forward you, this grocery store wound up just taking all the items out of their mobile app, and they no longer show them. 

For example, toilet paper. If you search for toilet paper in their mobile app, you can't even find it because they just took it all down. So again this is a real problem that they couldn't relate inventory to the mobile app, and they weren't communicating data in real time. Cool, so let me break this down for you. We're gonna get through some basics first, and I'm gonna walk through some architectures. But don't worry, I'm not gonna go too deep. I'm not gonna code for you, and I'm not gonna do any live demos. So we're gonna keep this pretty high level for you.



So let's talk about some basics. This is what an application looks like. This is a three-tier application. This kind of makes up a really large percentage of the industry today. Inside organizations’ data centers are these apps. You have an interface, you have some sort of service, and then you have a database. And the concept of this is the user invokes an action, clicks a button, fills out a form, does something and that sends the thing, the action to the database, and the database returns it back with the data. So we've been building applications like this in the industry for 40 to 50 years. This is the common pattern that most applications have. It's basic and it works. It's the fundamentals of, you have an interface and that's how you interact with the database. The premise is simple, and it's all based upon users invoking some action.
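As a rough illustration of the three tiers just described, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `inventory` table, the `get_quantity` and `on_search_clicked` names, and the use of an in-memory SQLite database in place of a real data tier. The point it tries to show is that nothing happens until a user action triggers a query.

```python
import sqlite3

# Data tier: an in-memory SQLite database standing in for the real one
# (hypothetical schema and sample rows).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)")
db.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [("flour", 12), ("toilet paper", 0)],
)


def get_quantity(item: str) -> int:
    """Service tier: runs a query only when the interface asks for one."""
    row = db.execute(
        "SELECT quantity FROM inventory WHERE item = ?", (item,)
    ).fetchone()
    return row[0] if row else 0


def on_search_clicked(item: str) -> str:
    """Interface tier: a user action triggers the request."""
    quantity = get_quantity(item)
    status = "in stock" if quantity > 0 else "out of stock"
    return f"{item}: {status}"


print(on_search_clicked("flour"))
print(on_search_clicked("toilet paper"))
```

The data never announces that it changed. It only answers when asked, which is the assumption the rest of the talk pushes against.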

However, it makes one massive assumption and that's the assumption that data is passive. So in this case, the data is just sitting in the database. It's not doing anything. It's just sitting there and it's passive. It's waiting for a user that says, I want to know something about this through the interface, and then it goes and gets the data and brings it back. It's a major assumption, and it's gonna cause some issues later down the road. So I'm gonna walk through that. But this thing has done really well. And so most organizations have these things. And so life is good, and you have an app. And then you have two apps. And then you have four apps right? And so next thing you know, you have a data center full of these things, and you're a grocery store and you're going out to the public telling everybody that you're a technology company right? So all of a sudden, you have a whole data center full of these things. However, notice that none of these systems, these applications, or these databases, have interoperability between them. Nothing is sharing any data with anybody and nobody can figure out what's in that database, or what's in the other. You all have disparate systems and that's it. So you can see there's no interoperability between the apps. When one application updates data in a database, the other databases don't know about that.

So let's think about the grocery store in this example. Maybe the top row here could be the mobile app, and maybe the bottom row could be our distribution or inventory system. Right? So the mobile app has no idea what's going on inside the inventory system. Maybe one of the middle rows is the member rewards program. It doesn't know anything about who's purchasing what from the point of sale system or the inventory system. So we have a big problem here. There's no interoperability and nobody can share data.


However, the tech industry progresses and we get to this thing. Right? So let me walk you through this. This is extract, transform, and load. That's what ETL is and that's represented here. So on the far left hand side, you extract tables or records out of a database, you do some sort of transformation in the gears, and then you load that into your analytics database. This is essentially the start and the premise of what data warehousing and data lakes turned into. Right? However, there are some fundamental problems here that you guys need to know about. It's a batch process. This stuff takes a very long time to run and you can't do this during the day. Right? So if you're gonna run one of these jobs during the day, you would take down your production environment because it takes so many compute resources to get there.
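To make the extract, transform, and load steps concrete, here is a minimal Python sketch of a nightly batch job. The table names, the sample rows, and the `run_nightly_etl` function are all made up for illustration, and two in-memory SQLite databases stand in for the operational system and the analytics database.

```python
import sqlite3
from collections import defaultdict

# Source: the operational point-of-sale database (hypothetical schema).
pos_db = sqlite3.connect(":memory:")
pos_db.execute("CREATE TABLE sales (sale_date TEXT, item TEXT, quantity INTEGER)")
pos_db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2020-04-11", "flour", 3),
        ("2020-04-11", "flour", 2),
        ("2020-04-11", "yeast", 1),
        ("2020-04-12", "flour", 4),  # today's sale, not yet processed
    ],
)

# Target: the analytics database that reports read from.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE daily_sales (sale_date TEXT, item TEXT, total INTEGER)"
)


def run_nightly_etl(business_date: str) -> int:
    """Run one batch for a single finished business day."""
    # Extract: pull the day's rows out of the operational database.
    rows = pos_db.execute(
        "SELECT item, quantity FROM sales WHERE sale_date = ?", (business_date,)
    ).fetchall()
    # Transform: aggregate units sold per item.
    totals = defaultdict(int)
    for item, quantity in rows:
        totals[item] += quantity
    # Load: write the aggregates into the analytics database.
    warehouse.executemany(
        "INSERT INTO daily_sales VALUES (?, ?, ?)",
        [(business_date, item, total) for item, total in sorted(totals.items())],
    )
    warehouse.commit()
    return len(totals)


run_nightly_etl("2020-04-11")
report = warehouse.execute(
    "SELECT item, total FROM daily_sales ORDER BY item"
).fetchall()
print(report)
```

The sale dated 2020-04-12 is sitting in the source database, but the report cannot see it until the next batch runs.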

So now we're starting to share data, but it falls under this premise that this is data that happened yesterday. Right? So it's not telling you what's going on in your business today. It's saying this is what happened in my business yesterday, and in the past. I equate ETL to buying a newspaper. Right? When you buy a newspaper you bought yesterday's news. You can easily go online and read an article today, and that is news that's happening right now. Or you could wait until tomorrow, pay 50 cents to a dollar, go buy a physical newspaper and read that story a day later, right? Why would you ever buy yesterday's news? You could read it right now and it's there for the taking. So ETL is essentially buying a newspaper to me. It just tells you what your business did. It's not telling you what your business is doing. But we make progress, right?

So the tech industry again. We start learning things and we progress and we introduce this thing called messaging. So messaging is kind of the first attempt at real-time processing and real-time sharing of data. However, it has some fundamental issues as well. It's hard to scale, and there's no persistence to this. So let me walk you through this. In this example here, our applications, those databases, are gonna write to the queue that you see in the middle. And on the other side of that queue are other systems that subscribe to it, so this is the pub/sub model that people have heard about for a long time. So people write into that queue. However, once you read that message out of the queue, that's it, the message is gone. You've consumed it and there's no persistence. So you can never go back and say, “Hey I had an issue with that message. Can I go back and get that?” There is no persistence and so it is one and done. This is the typical pub/sub model that a lot of organizations have adopted. But again, once you consume that message, it's no longer there. So if you wanted point of sale data to get to the data warehouse, and then to all the other systems like that, you would have to stand up multiple queues and keep writing all of these records into each of them. There's another issue with this as well. This starts to run into scaling problems, right? So I have to stand all these up. But then what if the data warehouse isn't reading messages off the queue? The queue starts to stack up, this thing crashes and tumbles, and then all the messages are gone.
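The one-and-done behaviour described above can be sketched with Python's standard `queue` module. This is only a stand-in for a message broker, assumed here for illustration, and the message fields are invented. It shows a message disappearing once the first consumer has read it.

```python
import queue

# A single in-memory queue standing in for the messaging system.
pos_queue = queue.Queue()

# The point-of-sale application publishes two messages.
pos_queue.put({"store": 1, "item": "flour", "quantity": -1})
pos_queue.put({"store": 1, "item": "yeast", "quantity": -1})

# The data warehouse consumes everything that is waiting.
warehouse_received = []
while not pos_queue.empty():
    warehouse_received.append(pos_queue.get())

# A second system arriving later finds nothing left to read.
try:
    late_message = pos_queue.get_nowait()
except queue.Empty:
    late_message = None

print(len(warehouse_received), late_message)
```

Because the queue here lives only in memory, a crash of the process would also lose any messages that had not been read yet.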


A lot of times technology like this will store messages in memory, so if there's an issue, the memory gets wiped out as well. If we're gonna stick with the newspaper metaphor, I think messaging is kind of like the evening edition of the newspaper. I don't know if people remember that, but newspapers like the Chicago Tribune used to print an evening edition, right? So you would get the morning edition when you're on your way to work, and then the evening edition was printed with all the news that happened throughout the day. So we're starting to get better at getting real-time news, but it's still kind of in the past. So I equate ETL to the newspaper, and messaging is like the evening edition of it, but look, we can do better than this. And there are ways to do that.

So great, the grocery store. Do they have to rewrite and get all new applications? And what I'm saying is no, you don't have to rewrite and get rid of all of that stuff, because let's be honest, there's never gonna be a time in a company's life, where you get an opportunity to rewrite applications, all of them, or fix all the problems that you have. There has to be some way where you modernize something that you have to get you to the place where you need to be going. And that's what I wanna walk you through here.

So we come to this place called event streaming. And this is really the premise that Apache Kafka is built on. And so I'm gonna walk you through this. The move to real-time streams and event-driven architecture is what gets you to this. So in this example, it's the ability to react to streams of data, of the events that are happening in your business. So think of this as reacting to, storing, and processing an infinite amount of data, instead of storing that in individual tables just to be analyzed later. Another way to think about this is, you're gonna see streams and you can understand what's going on in my store right now, not what happened in my store yesterday. So let's take these databases as an example. In this case here, every time something changes in the database, you write that event to this log that's below. So you don't write the full record, you write what changed in that record. That is the key to this.

See, data is persistent and it's stateful inside those databases. So the data represents state, and the premise of event streaming is that the events have far more value than the state of the data record in the database. So let's take a shopping cart as an example. If I'm shopping on the mobile app at that grocery store, I can query the database and it will say what's in my cart. That's the state: here's the cart. However, there's a ton of value in understanding the changes that went into that cart. I put an item in, I took an item out, I put three in, I reduced it to two, right? So the changes that you see, the events, are starting to become far more valuable than querying the database to see what's in the cart, right? So this is far different from the messaging example that I gave. Messaging was a step in the right direction, but this is really going in the right direction, where we see the changes that are happening, and again the events are far more valuable than the record itself. So in this, these changes get written to the commit log, Kafka will order it, it will sequence it for you, and you can consume it from any spot.
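The shopping cart example can be sketched in a few lines of Python. The `CartEvent` type and `cart_state` function are hypothetical names chosen for this illustration. The sketch shows that the current cart can always be rebuilt from the events, while the events also carry information the final state has lost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CartEvent:
    item: str
    delta: int  # positive = added to the cart, negative = removed


def cart_state(events):
    """Fold the event history into the current cart contents."""
    cart = {}
    for event in events:
        quantity = cart.get(event.item, 0) + event.delta
        if quantity > 0:
            cart[event.item] = quantity
        else:
            cart.pop(event.item, None)
    return cart


events = [
    CartEvent("flour", 1),
    CartEvent("yeast", 3),
    CartEvent("yeast", -1),           # reduced three to two
    CartEvent("hand sanitizer", 1),
    CartEvent("hand sanitizer", -1),  # put in, then taken out again
]

# The state: what a database query would return.
state = cart_state(events)

# Something only the events can answer: which items were ever removed?
removed = sorted({e.item for e in events if e.delta < 0})

print(state)
print(removed)
```

The final cart says nothing about the hand sanitizer, yet the event history shows the shopper wanted it and changed their mind.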


The nice thing with Apache Kafka is it has persistence, and it orders it for you. Meaning that if I'm a downstream consuming system of this event stream, I can read it today. I can replay it tomorrow, and if I had an issue with it, I could replay it the day after that. So you start to get this persistence and ordering and fast low latency that a lot of people are looking for in event streaming. So this really starts to push where people need to be going. So let's take a look at what a possible architecture for eventing may look like for this grocery store.
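The persistence, ordering, and replay described here can be imitated with a small append-only log. This is a conceptual sketch only: the `EventLog` class is hypothetical and is not the Kafka client API, and it leaves out partitions, retention settings, and everything else that makes Kafka scale.

```python
class EventLog:
    """Append-only, ordered log. Reading never removes anything."""

    def __init__(self):
        self._events = []

    def append(self, event) -> int:
        """Store the event and return its offset (its position in the log)."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, from_offset: int = 0):
        """Return (offset, event) pairs starting at the given offset."""
        return list(enumerate(self._events))[from_offset:]


log = EventLog()
for item in ["flour", "yeast", "toilet paper"]:
    log.append({"item": item, "quantity": -1})

first_read = log.read()          # read it today
replay = log.read()              # replay it tomorrow, same events, same order
tail = log.read(from_offset=2)   # or pick up from any spot

print(first_read == replay)
print(tail)
```

Each consumer only needs to remember the offset it has reached, so several downstream systems can read the same log at their own pace without taking messages away from each other.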

Okay so let's break this down here. On the bottom you see all of the different stores that the grocery chain has. And then you see the distribution databases they have, the online tools they have, all of those apps, those three-tier applications that I started out with. So in this case, let's take the three stores on the left hand side. They're gonna write everything that's happening on the point of sale system into the point of sale stream. That point of sale stream can have consumers, and those are the things on top. So let's say, for our data lake and our reporting system, you really wanna know what's going on right now in my stores. You connect to the stream and you can see that, instead of waiting until tomorrow for the ETL job to run so you can find out what was going on yesterday, right? That doesn't really help you when you're going through some of the supply chain issues that the world went through with COVID-19.

So as you can see, we also kept all of our legacy, mainframe, old applications, things like that, intact on the bottom. We built this eventing platform, the event stream, in the middle, powered by Apache Kafka, and then we can connect our applications, our online tools, reporting, and microservices to that. So this changes the paradigm that we are used to. We're used to data just living in this system. If I want to get that data, I have to take it out, extract it, and put it in this other system. The new paradigm is that every time we make a change, I should be publishing that change, because the events of those changes are far more valuable than the record. So if we stick with that newspaper metaphor again, ETL is the newspaper, messaging would be the evening edition. This is the online edition, right? Everything that changes, I can go to the Wall Street Journal online and read this news right now. It's happening and it's live. I can see it. Or I can wait till tomorrow, go to the newsstand, buy the Wall Street Journal and read yesterday's news. So that's the shift in paradigm.


The other thing that's powerful here is that, as we're publishing to streams, the Kafka ecosystem has a tool out there called KSQL, and it allows you to start to run queries on some of these streams and also join streams together. So this makes it really valuable for your online app that needs maybe point of sale stream data, it needs inventory data, it also needs member loyalty data, so it needs all of these streams put together. You can start to combine those with KSQL, run queries, run joins, put all this stuff together, and it really starts to become a game changer for the rest of the industry here. So let's talk through my example of ordering on the app and the inventory not being there. In this example, if that mobile app on the top left up there could have been integrated into a stream, it would know not to show toilet paper, hand sanitizer, flour, yeast, all the things that people were hoarding when COVID-19 hit. And it would know that because it has the store data, it has the inventory data, so it knows to not show that stuff. You could slowly see how I figured out that the grocery store was doing something bad, or maybe the mobile app wasn't even connected to these data sources. So a lot of organizations are struggling with this. This is just a great way for an organization to move, to help modernize, to put you in the right place for what's expected in 2020, and not have to rewrite some of that mainframe, because honestly that's a massive job to rewrite a lot of that old legacy stuff. There's still value in it, and you need to get the most value that you can out of it.
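To illustrate the idea of combining streams so the app knows what not to show, here is a plain Python sketch. It is not KSQL and makes no claim about KSQL syntax. The event fields and the `stock_view` and `items_to_show` functions are assumptions invented for the example. It merges a hypothetical inventory stream with a point of sale stream into a running stock level per store and item, then filters the catalogue against it.

```python
from collections import defaultdict

# Two hypothetical event streams, each event recording a change in stock.
inventory_stream = [
    {"store": 1, "item": "flour", "delta": 10},         # delivery received
    {"store": 1, "item": "toilet paper", "delta": 4},
]
pos_stream = [
    {"store": 1, "item": "toilet paper", "delta": -4},  # sold at the register
    {"store": 1, "item": "flour", "delta": -3},
]


def stock_view(*streams):
    """Combine the streams into a running stock level per (store, item)."""
    stock = defaultdict(int)
    for stream in streams:
        for event in stream:
            stock[(event["store"], event["item"])] += event["delta"]
    return stock


def items_to_show(catalogue, stock, store):
    """Only list items the store can actually hand over."""
    return [item for item in catalogue if stock.get((store, item), 0) > 0]


stock = stock_view(inventory_stream, pos_stream)
visible = items_to_show(["flour", "toilet paper", "yeast"], stock, store=1)

print(visible)
```

In a real deployment the view would be updated continuously as events arrive rather than computed once over two lists, but the shape of the logic is the same: the app reads a view derived from the streams instead of guessing.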

So in case you were daydreaming throughout the talk here, let's just give you a quick summary. Customers expect real-time data. It has to be there. It's just what we've grown to expect and there's no substitute for it. ETL is kind of like buying the newspaper, messaging is getting better, like buying the evening edition, and event streaming is the online edition. We all love it.
