Inaugural Lecture Series: Professor Stephan Reiff-Marganiec video transcript

The title screen fades in. A dark blue background appears with the University of Derby three hills logo appearing in white in the centre of the screen, shortly before white text fades in underneath it which reads:

Inaugural Lecture Series:

The title screen fades away, and a new title screen appears. This time, the University of Derby three hills logo sits in the top left corner of the screen. Parallel to the logo, on the right-hand side, a white line separates it from the rest of the title screen which shows a red border surrounding text that reads:

Enabling a Smart World Through Service Computing and Data Processing Architectures by Professor Stephan Reiff-Marganiec – June 2021

The title screen fades out and we see webcam footage of Professor Stephan sitting at his desk, about to introduce himself. He is wearing glasses and a dark blue suit with a red tie and has short hair combed to one side.

[Stephan] My talk today will be about looking at the world as we see it today from the eyes of a computer scientist, but also, from the view of people and customers of all the computing technology in the world. And then I will relate to the many years of computing research that I have done, and how they are providing building blocks to the solutions that we need in today's world, to take computing to the next few years.

We cut to the first slide and the webcam footage appears much smaller on the right-hand side of the screen, where it will remain for the rest of the presentation.

The slide shows a background image of one of the university towers at the Kedleston Road campus on a clear, sunny day, shot from below looking up. The University of Derby three hills logo appears on the top left of the screen in white and the lecture title appears again over the top of the image.

The slide then changes to a plain white background with the University of Debry three hills logo now appearing in the top right corner of the screen in black, where it will remain for the duration of the presentation. The new slide reads:

Context: Research Pathway

Inspired by Application of Research Results and Applications Driving Research Questions

UG project in formal verification with Praxis CS
PhD in runtime resolution of feature interactions in Telco (with Mitel)
Web services research
Cloud, IoT, Fog and Edge

Industry, Academia and Combining the Best of Both.

Apprenticeship and work
Study, academic roles and research
KTP and innovation partnerships, LIH director
Now HoS for computing and engineering

[Stephan] So, just very briefly reflecting on my own research pathway that has taken me to where we are now, and some of this was hinted at already in the introductions. I have always been inspired by the application of research and by applications that are driving the research questions, So, I've always been relatively close to looking at what the world out there needs and how the research that I'm working on works in this. So, it started with my undergraduate project where I was working with praxis critical systems on a little validation problem, and through my PhD and postdocs, I was on projects with heavy company involvement where the problems that we were addressing were really driven by needs of the industry but needs that needed new and novel research and innovations to take place.

Once I finished my PhD I moved into web services research, an area that was just coming up at that time, but that very naturally followed on from the ideas that I looked at in my PhD, where I was looking at systems that are changing dynamically at one time, and we will see more of that throughout today's lecture. In the more recent future, we have seen a lot of changes in the computing infrastructures; we are talking about Cloud, Internet of Things, Fog and Edge computing. And all of these are enabling architectures that deliver and enable us to build the systems that we're all using today, whether we are actually aware of using them or not. And we will see some of that as well in the presentation. I have also, always been on that boundary between applied research, working with companies, looking at where research fits into this and obviously the very academic areas, some of those are much more pragmatic, some of those on the very formal end of computer science. And that, I guess, has taken me to where I am today. And in the lecture, we will see many of these aspects.

The slide changes to depict a stylised collection of bullet points, outlining the structure of the presentation. Each bullet point is placed in an arch shape which bows out from the left side of the screen, with each bit of text highlighted by a dark blue background. The slide reads:

Overview:

Motivation
Key issues for an intelligent world
Puzzle pieces: cloud and service computing, IoT, AI and ML
Data processing architectures
Integrating AI and ML
Conclusions

[Stephan] So, in today's session, I want to structure it in a way that... I want to give some motivation for why we're doing the computer science we're doing. Today I will look at the key issues that I think we have that are holding us back, but also issues that are enabling us to move into this new phase of where the world has gotten to. I will look at the puzzle pieces that we need in terms of technologies and ideas and research that we need to put together, and then focus obviously on the aspects that I have made contributions to over many years. And wrapping up towards the end of the session before we have questions.

The slide changes to show an image depicting a cross-section of a city, showing its infrastructure from the coast, to farmland, to a nuclear power station and finally to the city skyline itself. The image is titled:

Libelium Smart World

To the left of the image is a series of bullet points which read:

Motivation:

Smart Cities:

Smart homes/buildings
Smart car parking
Smart bins

Smart cars:

Autonomous vehicles

Smart Agriculture:

Supply chains
Resource monitoring
Targeted fertilisation

Smart Medicine:

Precision medicine

Smarter Finance

[Stephan] So, if we look at the world today, over the last year, a year and a half, we have seen a huge rise in awareness as well of data science happening. Unfortunately, the reason was, of course, the Pandemic, but that has brought data science very much towards the forefront for many, many people. But data science hasn't started there, and the data science we have seen there isn't all that we need to consider and look at.

So, over many years, some research in computing has looked at solutions to smart cities, smart cars, and many other areas prefixed with the label 'smart'. So, if we just pick out a few examples and then we'll look at some of those in more detail later on. If you look at smart cities, for example, today we can drive around cities that are more and more populated with what we call smart homes or smart buildings. They're buildings that manage their energy supply So, if the sun is shining on one side of the building, then that heat can be distributed around the building to the shady side of the building automatically with the building as infrastructure making decisions on distributing that heat to ensure that energy is saved. I'm sure we've all read about autonomous vehicles in the press, they have been in the news for many, many years.

Sadly, quite often they occur in the news when bad things happen such as some car driving into a standing vehicle... But again, it is the technologies, it's the computing and the data processing behind the scenes that are enabling these developments and ultimately, they will lead to very, very positive developments. So, for example, there have been trials with them, with lorries driving very close behind each other in large fleets with just a metre between the vehicles, on motorways, at motorway speed, enabled by these autonomous vehicle technologies.

This in turn then leads to the lorries that are not leading the train saving up to 30 to 40 per cent of their petrol, which is great for the environment. So, that’s the sort of advantage that we are looking at. All of these technologies are meant to make life on this planet more sustainable and better for people. Other areas are smart agriculture; over many decades we've heard stories about fertiliser being used in too-great amounts on fields and then entering our drink watercourses, and nowadays smart agriculture measures the ground conditions, it looks at the crop growth on fields and uses that information in near real-time to dictate where we are fertilising fields. And there are other areas such as smart medicine or smart finance, and if we do search around then they fit in everywhere.

But generally, what we find is that the computing technologies we have are nowadays empowering systems in which we are sensing information to then analyse that data to work with the data we're gathering to influence how we are then acting. And it's that piece of work that we're very interested in here.

A new slide appears. On the right-hand side of the slide are two images; the first depicts a bar chart outlining the growth of data stored in petabytes in an upward trend since the year 2000. The second image shows a news headline with an image of a satirical illustration underneath depicting a few collected cities as oil rigs, each with the logo of a recognisable tech company such as Google and Facebook. The headline above the image reads:

The World’s Most Valuable Resource is No Longer Oil, But Data

To the left of the images are a number of bullet points which read:

Data, Data and More Data:

Data is the most traded commodity in terms of value
90% of the data we have was collected in the last few years

Data Processing:

Collect -> process -> store -> act
Collect -> store
Collect -> process -> act
Collect -> process -> act -> store

[Stephan] And what we find in that context is data, we find data everywhere. And if you look at the press it's very interesting because it's usually not the computing press that is picking up on the big stories.

So, a few years ago The Economist reported, for example, that data is now THE commodity in the world, it's the most valuable resource we have. It's worth more than oil in terms of the economy and we can see that this data, often referred to as Big Data, has grown very, very rapidly in the last few years. So, another quite surprising fact, at least it's a fact that I always find very surprising, is that over the last few years whenever one looks at numbers, it's around, the statement is that ninety per cent of the data we have, the total data and information we have at our disposal were collected in the last few years. And that has been rolling in this way over the last five to six years. So, whenever we look again, we have collected that much more data, that we still have collected ninety per cent in the last few years. Very nice exponential behaviour here... but also very, very scary behaviour.

So, if you think back for how many millennia humanity has collected data then it is very, very worrying that we are collecting data at this rate of growth because, in the end, we have to do things with data. And what we have to do is we have to process it. We have to make sense of that data. And typical data processing cycles are collecting the data, processing the data, So, making some sense of it, possibly storing some of that data and acting on the data, because just knowing that we've done a lot doesn't help.

So, if we look at a very obvious example: many of us have a kind of step counter and Fitbit or use our phones or Apple watches or whatever device we have to count the steps we do in a day. That's great. The phone will store that information for us, and we know we have done 5000 or 15,000 steps, but if that's all we have and we don't do anything from it then what does that data do for us? Because in the end what we want is we want to act on this data, we want to know that if we do more steps, it will be healthier for us. But what does more mean for us? And it's these decisions that I think the systems we are developing need to address. Other chains in data processing and that's one that's been relatively common with some of the big kind of Cloud companies such as Google and Amazon and so, on, and one that reflects the value in data is the sort of collect-and-store approach. So, many of these big players are just trying to collect as much data as they can and then store it because they know or they think they know that at some point in the future there's a lot of value in that data for them, but they haven't yet managed to access it.

For me, another very interesting approach to data processing is the collect-process-act cycle. So, we collect some data, we get some insights, we process it, we understand what it means, and we act. And this is, I think, it's a cycle we see in many of the kinds of current scenarios that are emerging. Smart car driving down the motorway sensing that there's an obstacle ahead; processing, identifying that that obstacle needs to be addressed by slowing the car down, changing the lane and bypassing it. And we do that, we have collected the data, we've processed it, we acted, and we move on. Nothing happened, everything is fine, exactly in the same way that the human driver would operate the car. So, there's no need to store that data.

But maybe we sometimes do want to store, and again we can look at different cycles there. But we have in this data processing varying components that we do need to look at, from collecting the data to acting on the insights we can gain from it.

A new slide appears. This slide shows a number of screenshots of headlines from online articles on the right-hand side, piled on top of each other in order to look like papers messily scattered on a tabletop. These headlines are a mix of news stories with both optimistic and pessimistic subtexts about the future of technology. The headlines read as follows:

British Airways data breach: what to do if you have been affected
Tesla autopilot caused the car to accelerate before the fatal crash, investigators find
AI cancer detectors: researchers suggest artificial intelligence is now better and faster at detecting cancer than clinicians
From flying warehouses to robot toilets – five technologies that could shape the future

Below these headlines is a line graph showing the sharp, upward trend in the number of social media accounts, network-connected devices and data solutions since the year 2005. To the right of this graph, a number of bullet points are listed which read:

(Some) Key Issues for an Intelligent World:

Transporting data

Bandwidth usage

Making Sense of Data

Processing data
Processing it fast
Processing it correctly

Storing Data

To store or not to store?
Structure

Quality of Data

Managing the quality
Auditing
Privacy

[Stephan] So, out of that, we can look at the news. And the news is coming in daily; sometimes it comes in as bad news, sometimes it comes in as good news. So, if we look at a lot of AI applications in medicine, then quite often they are very reliable in identifying, for example, cancers from images that we're seeing.

If we're looking at automated facilities in warehousing and transport then again, we see a lot of very positive developments, and now in some cities, for example in Milton Keynes, you have little delivery robots driving around the streets doing deliveries to houses. But quite often what we read in the news is about data breaches, it's about automated cars getting into accidents and all of those problems.

We also find that the data that we're gathering comes in many, many different forms. So, we sort of, for data analytics problems, often talk about multi-modal solutions, where we are looking at data that's in different shapes. And the data that has been growing most rapidly in the last decade is data that we gather from sensors. So, generally, what we call the Internet of Things. So, this is data which personally I wouldn't call Big Data; to me, it's sort of lots of little data because it's temperature readings, it's some position readings, it's very small items of data, but items that we are collecting incredibly rapidly - maybe in one-second intervals, maybe in millisecond intervals, and that we're transporting to the Cloud to store.

The other major growth in data has been in social media. Social media, if you think back even ten years, then many Social Media channels were just emerging. They were something that people were obviously starting to become aware of. But if you look through the last US Presidency, for example, the activity on Twitter in political circles has hugely stepped up, and that sort of reflects the general behaviour of the population in these messages that are being picked up. And again, it is in some sense lots of little data that's adding up into this Big Data scenario because Twitter messages aren't that long, but there are millions and millions of them every day. When we think about that then I think we have to really quite seriously take a step back and think about what are the issues that we are causing in this world and what we need to do to really make the most of those potential data feeds and applications that we have.

And, one question there is very much around transporting data; because in the end if you're collecting data from a sensor and we're shipping it to some Cloud server, we are using the bandwidth, and our 3G, 4G wired internet services, and we're swamping them with data. Especially if we're sending data at very rapid intervals. Often in rural areas where connection speeds might be slower, and we're overlaying that on to the normal use of the internet, with people downloading videos, doing email and doing their daily work, then with the growth that we see we will reach a natural limit of how much data we can transport. So, we need radical solutions to that. It's making sense of that data. Yes, we are amazing at collecting this data, and we're pretty good at storing it, but can we process it at the pace that we need to? And to me, there are two ways of processing data, one is to collect it and then analyse it and retrospectively start to make sense of it - and there's value in that in terms of policymaking.

So, if we know, for example, that the road is always overcrowded, then we do know that we need to maybe build another lane on that road. But that's a very, very slow solution. I think, for many applications that are exciting and interesting we need to do things differently. We need to process data very fast; we want quick insights into the data. And many medical applications and many transport applications are of that nature; when you're driving into a town you don't want to know that there was, that there's going to be a traffic jam, what you want to know is that as you're approaching it that you're sent a different route because that will get you to your destination quicker. And crucially, making sense of data has to be correct, So, we have to really make good decisions on the data that we're making. Making the wrong decisions then, the consequences can be quite catastrophic.

We also, need to think about storing data. So, if we look at some other statistics then, today some data centres - So, those are the big data warehouses that the Cloud providers use - are using a disproportionate amount of the energy we are producing. Depending on which statistics one looks at, it's on par, above, or just behind the airline and global shipping industry. That's a vast amount of energy used. That all leads to more CO2 in the atmosphere, and we all know that that's not very good for our climate. But as we're storing more data, we need more such capacity, so, we do need to start to think about what do we do about this data? Do we really need to store it? And that's a very, very difficult question to answer because if we don't store it and we realize two years later that we need it, or even two days later, then we've lost an opportunity.

If we do store it and we never need it, then we have wasted energy. So, this is a question we really need to explore. And it might be very application-specific. But we also need to think about the structure of the data we store. And traditional enterprise data is often stored in - for example bank records, transaction records in banks, your internet shopping kind of invoices and so, on - are structured into SQL databases in a very standard way. They're very efficient for standard kinds of searches, but we often want to find more interesting insights into our data, and we will see some of the structural aspects later in the talk.

And the final two aspects are around the quality of data. So, how do we know that the data we have is good? And how do we audit and provide that evidence trail on the decisions we have been making?

And finally, privacy. I think certainly in the Western World we are very concerned about data being leaked, about privacy being invaded, and I think we do need to think about that.

The slide switches to show a list of headings and subheadings, with the main headline reading “Key Concepts (1)”. The subheadings are as follows:

Cloud Computing

Compute infrastructure that flexibly scales to demand; usually ‘resourced’ through data centres; comprises both hardware and ‘software stacks’; the main enabler for ‘big data’

Edge and Fog Computing

Moving activity away from the cloud and distributing it to other devices

Service Computing

The concept of encapsulating an activity in a software service that can be used on-demand and can be composed with other services to achieve a bigger task, ‘just-in-time’ vision

Business Process/Scientific Workflow

A sequence of tasks to achieve an objective; each task can be instantiated by service and run on a computer resource

[Stephan] So, if we take all of those kinds of challenges that we have in the world and we move that into the kind of concepts and ideas we have in the computing community that is close to my research, but in general that exist in the community, then we do have a number of key concepts that will make their appearance throughout the rest of this presentation.

One of those is Cloud computing. And that's the idea that we have computing infrastructure that's owned by someone but that we can tap into for our processing, for storage. And that enables us to kind of flexibly scale up. So, if we need to store more data for a month we can do it, we don't have to buy anything new, we just rent more space for it. And then as our demand shrinks, we can return that space and stop paying for it. Typically, these are resource-through-data centres and many of the data centres are hosted by some of the big computing providers that we know in the world. And these data centres provide both the sort of hardware and the software stack so, that you provide the hardware on which we can store data - So, the disks and the computer power and the networking to move the data in and out - but they also, provide software that we can use for either directly for processing the data or on top of which we can build that processing power. And Cloud computing really has enabled this concept of Big Data.

More recently we've seen a move away from the central Cloud building on the realisation that a lot of the capacity and facilities we have in our kind of small devices are actually very powerful, so, we don't need to move everything to the Cloud. We can do some of that processing in the Edge or in the Fog. So, for example, on a mobile phone or on an end-user device or somewhere in the network on the way to the Cloud. And that's one of the areas that I have been very, very interested in for many years, is sort of shifting processing to the right place in the network, and we will see more of that in a moment we also, have service computing. And service computing has enabled a new way of building software systems. Essentially, we're looking at services that can be composed and put together in a just-in-time fashion, so, they can be put together at the runtime of the system to create the sort of functionality we want. And often that is driven through business processes or scientific workflows, where sort of an overarching structure exists where each task in that process is then fulfilled by a service that's acting on our data. And the great thing about bringing sort of the Cloud, Edge and Fog computing ideas in the service computing ideas together is that services are so, self-contained that they can relatively easily be moved from one system to another and can process data and can work on that data in exactly the right place where they're most efficient.

Another slide appears with the headline ‘Key concepts (2)’. To the right of the slide is a screenshot of an online article from The Independent which reads “Swedish workers implanted with microchips to replace cash, cards and ID passes”.

To the left of the screenshot are two subheadings, each with bullet points. These are as follows:

Internet of Things (IoT)/Cyber-Physical Systems

IoT, essentially sensors and actuators connected to the internet
Billions of them
Connected to all sorts of devices
SCADA/ICS taking it to a global level

Artificial Intelligence, Machine Learning, Semantically Enhanced Data

BigData, “the V’s”: veracity, velocity, volume and variety
Semantically enhanced data
Learning of patterns
Extracting information (data value chain)
Making automatic decisions on data/information
Risks of learning bad behaviour
Interesting applications, a need to combine data

[Stephan] The other things that have happened are obviously the Internet of Things or cyber-physical systems where we are trying to integrate entities in the real world into those computer networks, and that's usually done through electronic components called sensors and actuators. Sensors allow us to gather information from the environment and convert it into a digital format which can then be used to inform and to be processed, and then any decisions we have can be pushed out to actuators. And examples there, for example in an intelligent home, in a smart home could be that if the burglar alarm is sensing a movement in the house, then all the doors get locked automatically. So, the automatic door locks would be the actuators. The thing with the Internet of Things is we are scaling these ideas not to a single house or to a single entity, we are scaling them to a kind of global scale. And we are in a world where we already have billions of sensors, and the number of sensors is growing incredibly rapidly every year. And they're existing in all sorts of devices. So, I think probably a modern mobile phone will have 20, 30, or 40 sensors in it. Sensing anything from the movement to your GPS position, to its temperature, to the charging status of the battery, and that's just an average kind of standard customer device.

And finally, we have the sort of technologies that make sense of the data; that get new insights into the data, and that allows us to really work with that data, and they are artificial intelligence and machine learning, those ideas are making appearances quite regularly as well in common terminology these days. And I personally have an interest in what I might call Semantically Enhanced Data - So, data where we're not just looking at the value of that data, but where we're looking at the links of that data to other data items. And we will see an example of that later where we applied this in a very practical financial context.

In terms of Big Data, I think the key ideas to take away from here are that Big Data is about a lot of data, and that data is typically very varied. So, it can range from images to simple temperature readings, it can be driven at volume, so, many of these data items are collected very rapidly, very often, so, there are huge amounts of them. The more complex data is maybe collected less often, but it's just so, big based on its value, for example, the human genome or other data like that. And the interesting applications that we see, all of those kinds of art world applications that I was speaking about earlier, and will continue to talk about, are all about the need to combine the various data items to make the most of the opportunity.

We see a new slide with the title “Design-time, Run-time and Hybrid Approaches (1)”. The slide shows a number of bullet points with a cartoon image of a telephone displayed above. Next to the image are two, dark blue speech bubbles with text that reads:

Call Waiting
Call forwarding

To the right of these speech bubbles is a text box which reads:

Call waiting: play an incoming call sound. Call forwarding: redirect call to the programmed number. What should the system do?

The bullet points are listed below this and are as follows:

Feature interactions in telephone systems, policy conflict
Note: the basic system is correct, and each feature is correct, the problem arises when the features are used together
Design-time: try to avoid any conflicts by identifying all possible run-time behaviours and adjusting the system (re-program)
Run-time: manage emerging behaviours when they are encountered
Hybrid: does a bit of both. Learn from design-time to support run-time decision making

And at this point, I do like to kind of take a step a few years back and reflect on something that I did in my PhD. So, in my PhD, I looked at a problem called feature interactions and telephone systems. And then in my postdoc, I looked at this as the concept of policy conflict. And the idea is relatively simple: you have a telephone system that's working perfectly fine, you're adding new functionality to it called features, and typical features in those days were call-waiting, where you would get a notification if another call was coming in while you're busy, or the feature of call-forwarding, which would redirect the call when you're busy to another number.

Now, the basic system exists, each feature in it is programmed and written and tested in isolation, and they work, so, there's nothing wrong with what's being programmed. But then at run time when the telephone system is up and running and the user has made some decisions which feature, they have on their telephone line, we can suddenly find problems occurring. And the problem with call-waiting and call-forwarding is, essentially, a call comes in, you’re busy, should you hear the call rating tone, or should the call be forwarded to another person, and the system does need to make a decision on this.

Now typically, approaches to resolve these issues were what we would call design-time approaches, so, you would try to test the system; you would try to test all the combinations of features on the system, and you would then make a decision that these two things cannot be possibly used together. So, you do need to arbitrate, and you need to build a solution into the system.

In my work, we started to look at what I would call runtime approaches, some people would talk about online approaches, where we would say, "Well, fine, let's have these things on the system, let's monitor for that emerging behaviour and then start to manage it." So, if the call comes in then let's make a runtime decision on that phone where that problem occurs. And that is useful because if you wanted to try to test or check all possible combinations of all possible features, you're spending a vast amount of time on checking things that in reality probably never encounter each other in any real system because it's different users using different features. And as long as they don't make a phone call to each other, their features have no opportunity to conflict. So, it makes sense to deal with this emerging behaviour at runtime. And I think that's nowadays reflected in many of the systems such as self-driving vehicles, where again you can't possibly program or pre-plan for all eventualities that can happen on the road. I think every driver will know this - there will always be situations where you do need to make a decision in the moment.

There will be new scenarios you are entering, you couldn't plan for them, but you manage, and you make a decision on that behaviour as and when you encounter it. Now probably the right approach is to combine these a little bit and use as much design time approaches to keep things as safe as we can but then manage and support the runtime decisions for these emerging behaviours using that insight that we have from the design time. Now that's obviously quite some time back when I looked at these feature interaction problems…

The slide changes. This time, the slide displays a new set of bullet points with the title “Design-time, Run-time and Hybrid Approaches (2)”. To the right of the slide, there is an image depicting a busy suspended motorway in the centre of a city. The motorway sits above another equally busy road which lies beneath it, and the traffic appears to be at a standstill. The bullet points are placed to the left of the image and read:

Services and Workflows

Dynamic system where components are brought together at runtime
Computational challenges
Complete design, automatic planning, dynamic instantiation?

Data processing and Analytics

Policymaking VS reacting in the now (E.g. Infrastructure, building and traffic routing)
Individual VS demographic (E.g. precision medicine)
Processing on edge VS in the cloud (E.g. smart car insurance)

[Stephan] But, looking at them in current scenarios, they do re-occur, and it's in services and workflows I have seen them where we are looking at, again, a dynamic system where components are being brought together as that system is executing. And now in the kind of data processing and analytics world we see this even more.

So, I already hinted at traffic routing earlier and it is about the decision between policymaking. So, trying to understand the problem, trying to understand what is or isn't working in our cities and then re-planning and restructuring the street network in them. That's a very long-term sort of plan and an approach that obviously our city councils and so, on are engaged with versus the sort of runtime solution of putting the data that we see in the now. So, if we nowadays look at some of the routing apps, we have on our phones they will tell us about traffic jams ahead of us and they will tell us of a different route. Now at the moment, that's quite selfish because they only look at the picture as it is and they tell us individually how we should be changing our routes, but if you start to take this into a more kind of bigger-picture approach where different cars can start to exchange information and plan together on how they are overcoming these problems, then we're looking at a very different scenario. And the sort of architectures that we are developing support those, some reacting-in-the-now kind of decisions.

A related scenario, but different in some senses, is in the area of precision medicine, where we might talk about medical solutions. So, if for example, you have a patient who's at risk of heart attack or diabetic shock then their mobile phone could monitor their behaviour as they carry it throughout the day, and it can for that individual make decisions. So, if it notices that the individual needs to move more now to avoid, I don't know, a heart attack in half an hour, then it can give the user that advice. The user starts moving, the risk goes down, and everything is good. Or the device notices that the user doesn't move and then might pre-emptively call an ambulance so, that they are in place when needed. But the problem we have here is that we do need to combine individual data, so, we need to look at the individual and their kind of situation and combine that with a sort of demographic understanding that we have around that group of people or around a certain illness and how that behaves.

And in precision medicine, we need to bring these together. But again, I think we need to bring these insights together not in terms of the sort of policymaking, in terms of what tablets we might prescribe, but really into the now, where we are looking at giving immediate advice. And I guess if you look at the track-and-trace apps that we have at the moment due to Covid, then they are a bit of that demographic solution. So, they are looking at tracking and tracing behaviour and then retrospectively telling people that they were exposed to a certain risk. It would be much greater if we could tell people immediately as they are out and about that they are entering a zone where the risk for them is higher and that warns them off from going further.

And we can find then other examples there, the third kind of example I do want to look at is an example of processing the data on the Edge which is in the Cloud. So, if we look for example at smart car insurances at the moment, there are car insurances where you have a tracking device in the car that is reporting back to the insurance company on the weather condition, the driving conditions, road conditions, on your driving speed and on all sorts of behaviours that you have. Are you breaking and accelerating very rapidly? So, are you driving quite aggressively or not? And it's reporting all of this data to the insurance company. So, you could look back at that privacy issue that I mentioned earlier because that is quite privacy-invading, and the insurance company then adjusts the rate to be paid on the policy.

Now that's collecting the data and processing it in the Cloud, but I think what we also, might be able to do is to process that data in the car because the only thing the insurance company really needs to know is whether the risk of your driving has gone up or down in a given time. So, if the car reports risk is increasing or risk is decreasing then that's sufficient for the insurance company. And in some sense, they don't need to know why this is increasing. Maybe it is because you're driving aggressively, maybe it is because you just hit a spot of really bad weather on the road, but those details could stay with you in the car, and I think our privacy would be much better.

We switch slides to one titled “Data Processing Architectures”. Below the title sits a flow chart made up of dark blue and black icons. At the start of the flow chart, we see an image of a satellite in orbit, we then move to an image of a human head, to an image of a server, to another image of a human head then back to the server. Below this chart sits another, this chart shows an image of a cloud, we then move to an image of data entering a funnel before finally coming to an image of a computer next to a heavy-duty robotic arm.

Below this diagram is a number of subheadings, each with its own bullet points below. These are as follows:

Sensing and Acting

Response time,
Frequency detail
Reliability

Processing

Filtering, forwarding
Analytics

Storage

Formats and structure
What to store? And, for how long?

Analytics

AI and ML

[Stephan] So, if we then take these ideas and put them into what I term Data Processing Architectures then we're starting to kind of get a picture of what's going on.

So, in these Architectures, we have sort of a zone of sensing and acting, which is our interface with the real world, with the machines, equipment, sensors, gathering data and providing insight on the world around us, but also, devices that can change things.

So, for example, in the agricultural thing the sensor could be taking pictures of the ground and analysing them to understand certain fertiliser concentrations and then could drive a machine to release more or less fertiliser as the actuator. Now that data from the sensor is gathered it needs processing, filtering, and forwarding. And that's sort of the Edge and Fog zone where we are looking at applying some of the simpler techniques. And the shortest loop that I mentioned earlier would be sensing-processing-acting, and there are many applications that are of huge interest today that need this cycle. Whereas many applications that are written today from a computing sense are looking at a much longer cycle; they're looking at sensing, forwarding to the Cloud, storing, then applying some AI and then maybe eventually going back out to some act, or processing results in the real world. But we can take any sort of loop through this and what we find is that if we look at the areas here, we see many of the technologies that we've spoken about earlier emerge.

So, we see services in IoT, they can be deployed to these devices at the Edge, they can look at really interacting with this real world and the service gives us that potential to move program functionality right to the sensor. And we have done this in applications for fast data processing, where we have pushed some of that decision-making right to the Edge. Once we start to get more into the middle, we might look at doing some AI, So, the orange puzzle type, some AI and software services on our Edge devices, on our Fog devices, and actually get insights. Maybe not the perfect insights with all the data that we could possibly have, but insights built around the data we gathered recently or gathered just now, and act on it very, very fast. Otherwise, we go into the cloud and then obviously we can do other things there, but we do need to understand that the further away from the sensor we go, or the longer that processing chain is, the longer the time delay is between sensing something and sending a result. And also, the higher the cost in terms of bandwidth and storage and gathering and managing that data becomes - bringing us back to some of the questions we've mentioned earlier.

And the focus of my work, and that's where we will see much more now in the next few slides has been in this sort of central zone around services for taking the data from the end devices and working with them, doing things in the Cloud in terms of workflows and processes and running things in the Cloud, but always with this view that the systems I'm looking at are interacting with them, sensing or acting capability, or interacting with AI techniques at the Cloud level. So, in that sense, if I could just reflect then on some of the questions that I have been addressing and that is really relevant for all of the scenarios we're looking at…

We change to another slide titled “Questions and solutions”. On the slide, two subheadings can be seen, each with its own list of bullet points below. Each bullet point is posed as a question and is followed by a proposed answer. The bullet points read as follows:

Selecting the Right Data and the Right Place to Process

Addresses bandwidth usage, speed of processing, data storage and privacy questions
Question 1: Where do we process the data? Solution: autonomous objects, hybrid edge/cloud solutions and scientific workflow processing nodes
Question 2: How can we filter the data? Solution: Fast data processing in combined pull-push approaches

Selecting the Right Service to Process Data

Addresses processing data questions
Question 1: How to achieve flexible workflows and instantiate activities with services? Solution: StPowla and in-context service selection
Question 2: How to plan complex services dynamically? Solution: HTN and bio-inspired planning

[Stephan] One of those is around selecting the right data and the right place to process it. So, how do we know which data we need to process and where are we doing that processing? And that answer to that question addresses our bandwidth usage, the speed of processing, data storage and also privacy, as concepts? And there we've done work around autonomous objects, where we are giving a lot of power to the Edge device to almost operate independently or interact with peer devices by need. We're looking at hybrid solutions where we are deciding that some processing can take place at the Edge or on that end device, but some do need to be shipped to the Cloud. And we have been looking at sort of structured processes, for example, in scientific workflows that tend to process genomic data sets.

For example, looking for drug candidates and eliminating those that are not hopeful so, they don't even make it to kind of from lab trials. And I think big successes of those sorts of technologies we've seen in the rapid pace at which the Covid medications were produced. So, those vaccines could be produced so, quickly because a lot of data processing can be applied to candidate drugs and eliminate them before even any lab trial needs to be undertaken. But, we also, need to look at how we can filter the data, so, we need to think about what data can we start to throw away at the point of the collection because it's not giving us any new insights? And I hope I will have some convincing examples of that in a moment as well.

And then, it's obviously a question around what software, what service do we use to process the data? And there again we are looking at selection questions. So, we are looking at flexibility in workflows, and understanding how we select the right services for a given problem, but we're also, looking at maybe automated approaches for planning complex services and breaking those down more automatically again at one time. So, if I take a few minutes to look at service selection, then…

We see a new slide. This slide is titled “Service Selection” and depicts two subheadings, each with bullet points. To the right of this, there is an image of the front cover of a book titled “Handbook of Research on Service-Oriented Systems and Non-Functional Properties: Future Directions” by Stephan Reiff-Marganiec and Marcel Tilly. The Book is a pale green colour and has a front cover image depicting a cloud surrounded by various computer icons such as music, video and documents, meant to represent cloud computing. The bullet points to the left of this image are as follows:

Two problems

Select a single service
Select services for a workflow

Other Considerations

Functional and non-functional properties
Selecting the best execution place for a service

[Stephan] In service selection, we are looking at the idea of picking the right software service for working with a set of data. And that can be either selecting a single service, so, if we have a simple task to do, we are looking at selecting a simple single service - if we have a complex task then often, we have a workflow or automatically planned workflow that chains a number of tasks together to drive a solution. And in that area, we then want to select the service for the whole workflow so that every task is obviously executed by a relevant service. And in that space, we then need to look at two decisions: So, for every service, we have a question on functionality and what we call non-functional properties, So, clearly the service needs to do what we want it to.

If we think about a transport service that's taking us from our home to the nearest train station, then the functionality is quite clear: it's a transport service. It takes you from one place to another. But we then also, have what we call non-functional properties, and they are around the sort of quality of that service. And the quality might be, one of the transport services might be, around cost - how expensive it is. It might be around the comfort we have. For example, a taxi is probably much more comfortable because it picks you up at your door and drops you off where you need to go, whereas the bus where you probably have to walk to a bus stop and wait and hope for one to come along. And as I said already, it's then also, - that's more relevant for software services - it's about the best place for executing a service, and that's driven by how much data do we need to ship around? Can we move the service to a different place? Do we best kind of distribute the service to many devices and have lots of instances running at the Edge? Or is it better for that service to run in a central place where all applications use it?

We see a new slide appear. This slide depicts a large flow chart, courtesy of InContext Solutions. The chart shows the movement of data through what is called the “Relevance Engine”. To the right of the chart sit two diagrams showing complex equations detailing the processes of the “Relevance Engine”. The slide is titled “Service Selection – Scoring”.

So, in the InContext EU project, that's one of the projects that Jose referred to earlier on, we were looking at selecting services for team-working. But they were meant to be selected to be most appropriate for the users. And in some sense, the architecture looked at gathering the context information, so, our sensing piece.

And then we built the kind of criteria and rule-based selection mechanism which had two phases: a filtering phase and a ranking phase. And in the filtering phase we're essentially sorting out all of the services that are just not right for what we need, and then in the ranking phase we're ordering the services by a sort of preference-scoring algorithm - So, an algorithm that tells us which service is best for what we need.

And those methods are fantastic for selecting a single service. So, if we need to pick a service then let's throw all of the service candidates that are not relevant away, and let's rank the remaining ones. So, if you want to pick an example: you want to go to dinner in a restaurant then, yes. All the restaurants that don't offer dinner you can immediately filter out in your vicinity because they are not relevant to you. And then the ranking: you might look at the star rating or the hygiene rating of the restaurant, or a combination of factors, maybe ranging from price and hygiene and so, on, to then understand which one might be best suited for your needs. And then the algorithm here would suggest the best one for you to go to.

We switch slides briefly to a slide titled “Service Selection – Global Vs Local” which shows a diagram. Arranged vertically along the left-hand side of the diagram can be seen three speech bubbles with the letters T1, T2 and T3 written in them, the numbers are arranged from T1 on the top to T3 at the bottom and directly right of these bubbles can be seen more bubbles with a number and a price in pound sterling next to each. They are listed as follows:

T1 – 11 (£2), 12 (£5)

T2 – 21 (£5 with s11 and £1 with s12), 22 (£8 with s11 and £2 with s12)

T3 – 31 (£3 with s21 and £2 with s22), 32 (£1 with s 21 and £5 with s22)

We then switch to a slide titled “Service Composition” which has three subheadings, two of which have bullet points underneath. They read as follows:

Goal: to find a combination of services that achieve a task

Static Workflows

Designed by a human
Can express many things (e.g. exception handling), workflows plus policies for runtime adaption (e.g. StPowla)

Dynamic

Find one solution or many
Need to handle changing environment
Planning techniques (HTN) or genetic algorithms

[Stephan] If we put this into a slightly larger context of selecting services as part of a workflow: So, our workflow here is sort of just three abstract tasks that need to be done to get us from our starting place to the end. And for each of the tasks, we have a number of services that we can choose.

And for simplicity, we'll only look at the cost of the service here. So, if we would need to select a service for task one, obviously we would pick this one because it's the cheapest. Great. So, let's move on. Pick it. But then as we come to the next service, we encounter something that's maybe a little unusual, and that's where our kind of planning and forward-thinking just need to come in in some way. Because here we see that the cost for the next service is dependent on the choice we made earlier. And, we can see that actually this service together with the one we've chosen will bring us to, in this case, £7.00. Whereas had we chosen this service, then maybe we would have had other options here: and we do, because this service together with this one would only cost us £6.00. So, in some sense, we are seeing a kind of a deal here where one service says, "If you use me together with something else, you're getting a discount." And we've all encountered this at times when we were still allowed to travel quite freely.

Car rental or hotels might be cheaper in conjunction with an airline ticket from a certain airline. So, if you fly with British Airways, maybe with Avis Car Rental you only pay half the price. So, the decision for chaining services together and choosing them depends on connections between those services.

Now you could say, "Well, that's fine, so, let's just plan the whole thing ahead. We know we have three tasks, let's think about all of the service candidates, let's run all of them and let's move on." But what we then find is that if we are in a dynamic world where services become available and unavailable, that no longer works. It no longer holds. So, we need to work around this. I'll just quickly skip through the rest of this. And in that sense, if we look at those compositions of services, we are looking at combinations of services that achieve a task, and we're looking at the workflows, but again we see that workflows can be a very dynamic thing that is changing at runtime. And that at some time we would like the services to be changed, the workflow might change based on some rule that we are applying. So, rather than being re-engineered, we want dynamic changes at runtime that then are embedded through to the services we select to run them.

The next slide is titled “No Latency Data Processing” and shows a diagram of a mediator on the left of the screen, illustrating three types of service entering on one side and four types of consumers entering on the other. Below the diagram are three text boxes which read:

A continuous stream of events triggered by policy obligations
Reasoning over event stream and mapping to consumer requests
Consumer request for given user context

To the left of the diagram are three bullet points which read:

Push vs pull
Bandwidth reduction
Fast replies

[Stephan] If we return our attention back to data processing, we looked at no latency data processing, so, we do want to look at bandwidth reduction and fast replies, and if we look at the scenario there, we can look at the scenario of taxis. So, for example, if we have users waiting for taxis then they want taxis that are in their vicinity or that get to them quickly and that are available. So, a typical system might collect all of this data from all of the taxis in a city and then map users to it as the user dynamically arrives with requests.

But in practice, we only need systems to store little bits of information because once this taxi comes to this point and becomes available again, that historic information is no longer needed, so, we can forget about it, we can throw it away and we just look at the new status of the taxi. And then if we look at that then, in this low latency architecture we are centralizing the decision-making into the Cloud, effectively, So, all the data gets gathered into one place.

But what we would very much like to do is make use of the computational power at the Edge where we are then looking at the autonomy of devices. So, each what we call a smart object or smart device can make its own decision, so, it has some agency. It can interact with other objects in its vicinity to make good decisions on which services it wants to offer and when it wants to offer them, with a view of maintaining an overall kind of great best-service capacity across the whole device or smart object ecosystem. And to do that a lot of the data needs to be gathered and processed on the devices, but that data tends to be relatively simple and most of the processing can be done on the device. And if things do really get very, very complicated, well, the device can always connect to the Cloud for those decisions and seek insights from there.

The slide changes to a new slide showing three images. The first is a screenshot of an excel document that details data that is relevant to a company’s spending and income. This same data is then shown in plain text in another image before finally, the process of interpreting this data is laid out in a flowchart in the final image.

The slide is titled “Looking at Data in New Ways” and features two bullet points just above the images at the top of the page. These bullet points read:

Semantic models of data: connections rather than structure
Financial data: fraud detection

[Stephan] The final aspect that I want to look at is looking at data in new ways. So, I mentioned earlier that the data obviously exists as the kind of values that we have, but data can also, exist as links, and one of the questions that arose was where we worked with an SME in the area of linked or semantic data. Where we were looking for financial fraud, so, people in HMRC and so, on obviously gather all the company records at the end of the year and they tend to be looking a bit like this: So, it's a bit like either a spreadsheet or a Word document or some other data structure that contains all of the information. But what it doesn't contain is the links between the various elements. So, you could be the director of seven different companies, and you would simply occur on seven different sheets, and no one would necessarily make the connections unless you know you're looking for a specific person, and you go and start exploring. So, what we have enabled is to automatically gather many of these sheets and process them into a semantic data structure where you then can start exploring.

So, you can look at, who is the owner of a business? Where do they live? Who else lives with them? and so on, to then see whether there are any suspicious links between various companies, and then obviously fraud investigators can work in a much more directed and determined way.

A new slide appears titled “Summary” and contains four subheadings, three of which have bullet points beneath them. They read as follows:

Much work has been done in many areas of computer science

My work contributes:

Scalable software architecture for data processing
Semantic web applications for data exploration
Service selection and composition
A user-focused view

Great opportunity awaits in both:

Combining existing ideas to address real problems
Developing and researching new bits that improve on the state of the art

Key challenges:

Ensuring data quality
AI and machine learning at the source
Managing a continued growth of available data and fast responses

[Stephan] OK. I think that sort of brings me to the end of the journey for today. So, clearly, much of my work has been done in the areas of computer science. Many, many other computer scientists are working, they're working with experts from all sorts of other areas, and a lot of work has already been done. But I think if we are starting to look at those new applications we also, have many open questions that we need to answer.

My work in particular contributes to scalable software architectures for data processing, semantic web applications for data exploration, service selection and composition, and all of this with the user and application-focused view. And I think if we are starting to look at the next steps forward, then I think there's a great opportunity in looking at combining already-existing ideas to address some of the problems we are facing because we already have so many solutions, they're just often existing in isolation. But then obviously we need to look at developing and researching the bits that improve on the state of the art, that give us the solutions to the places that we don't know or that give us better solutions to some of the techniques we currently have. And all of that needs to be seen in kind of a much, much bigger system in some sense, in that context of the applications.

I think key challenges we need to address around ensuring data quality - and I don't think we have a lot of fantastic work so far that kind of gives us those guarantees around data quality - I think we do need to look much more at pushing AI and machine learning to the source of the data, and at the moment many of the techniques are, and especially in recent years, whether the focus has been on deep learning, are not necessarily quite so flexible to be pushed out to small devices, and I think we have to learn to do something about that continuing growth of data - that is if we do want fast responses. If we do want to keep our bandwidth and storage use down then we do need to start to learn to throw some of our data away and just say, "Well thank you, we don't need this."

So, let me conclude by thanking you. And that 'you' are many, many people. It's all my collaborators, colleagues and PhD students over many years, some names are on the slide. And I did a recent search on DBLP, which is the computer science paper repository, and it shows 115 co-authors, so I would have run out of space on the slide if I had listed all of them! So, thank you very much to all that work with me, collaborators and colleagues and PhD students. It's always exciting working with people from all around the world and it's something I wouldn't want to miss, and it's to a large extent due to them putting me into the position that I am today.

Also, thanks to my family who at times had to do without me for weeks and months on end, when I was at a research project visit and research visits abroad, and I know that that obviously is very hard on them. I had the good luck of being funded by various organisations over the years, and they enabled some of the work that was done, and thank you to the audience today, so all of you colleagues and guests for joining today and for listening. Thank you very much.

We briefly see a slide with the title “Thank you” and lists a number of organisations and individuals which reads as follows:

Thank you to…

All my collaborators, colleagues and PhD students over many years
To my family
To funding bodies including the EU, InnovateUK and Nuffield
To the audience today for listening

We then see the University of Derby three hills logo appear in the centre of the screen on a dark blue background before fading to black.

Inaugural Lecture Series: Professor Stephan Reiff-Marganiec video

Back to Enabling a smart world through service computing and data processing architectures