Personalized Recommendations: Causal Models for Health & Lifestyle

November 17, 2023

Prof. Daniel Franks

Introduction

In the current digital age, every click, swipe, or tap we make paints an increasingly comprehensive portrait of our individual preferences, attitudes, and behaviours. This wealth of data is the foundation of personalized recommendations: algorithms that analyse our data and predict what each of us will want or need next. For some recommendations, such as the next movie on your Netflix list or the next product on Amazon, it’s fine to use standard machine learning to predict the outcome. But when you move into recommended interventions – where you’re asking someone to make a change to their life – only causal machine learning is appropriate.

So, what's the role of causal inference in personalized recommendations, especially in health and lifestyle? And how can we leverage it to make better, more insightful decisions? Welcome to our exploration into this exciting intersection of data, technology, and health.


Personalised Health Recommendations: An Example

Kaggle is a platform where machine learning enthusiasts compete on what are typically framed as prediction problems. In one example, competitors try to predict chronic kidney disease as accurately as possible from 24 variables, including age, blood pressure, salt intake, and diabetes status, among others (https://github.com/AP-Atul/Chronic-Kidney-Disease). A typical machine learning model is built to predict the outcome (here, kidney disease) as well as possible from all of the inputs. But this approach should not be used to recommend personalised interventions. For example, you cannot intervene on such a model by reducing salt intake by a given amount and read off the effect, which is exactly what you would need to do to make a dietary recommendation.

Let’s look at this from a causal perspective, using just a subset of the variables for illustrative purposes. We start with our outcome variable, kidney disease. Then we add blood pressure, which is known to be causally related to kidney disease.

There is already a problem here for non-causal approaches, because these two variables are confounded (i.e. they share a common cause). You can’t estimate the effect of one variable on another without accounting for confounders. Here’s a simple example: age and diabetes are both causes of blood pressure and of kidney disease. This structure can be captured in a structural causal model.
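To make the confounding problem concrete, here is a minimal simulation sketch in Python. It assumes a hypothetical linear structural causal model with made-up coefficients, and, for simplicity, uses a continuous risk score in place of a binary kidney disease diagnosis. Regressing risk on blood pressure alone mixes in the influence of age and diabetes; adding them to the regression recovers the effect we built into the simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical linear SCM with made-up coefficients (illustration only):
# age and diabetes are common causes of blood pressure and of kidney-disease risk.
age = rng.normal(50, 10, n)
diabetes = rng.binomial(1, 0.2, n)
bp = 0.5 * age + 10 * diabetes + rng.normal(0, 5, n)
true_effect_bp = 0.02  # the causal effect of blood pressure on risk we built in
risk = true_effect_bp * bp + 0.03 * age + 0.5 * diabetes + rng.normal(0, 1, n)


def ols(y, *columns):
    """Least-squares coefficients (intercept first) of y regressed on the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]


naive = ols(risk, bp)[1]                    # ignores the confounders
adjusted = ols(risk, bp, age, diabetes)[1]  # adjusts for age and diabetes
print(f"naive: {naive:.3f}, adjusted: {adjusted:.3f}, true: {true_effect_bp}")
```

With these settings the naive estimate comes out at roughly 0.055, well above the true 0.02, while the adjusted estimate lands close to 0.02.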

However, the dataset doesn’t say when blood pressure was measured. Kidney disease can also cause higher blood pressure, so the timing matters: if blood pressure was measured after disease onset, you would be using data from the future to predict the past, which makes no sense for personalised intervention recommendations. We’ll assume here that blood pressure was measured before the onset of kidney disease, but draw the full diagram to show that temporal structure can be taken into account.
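One way to represent this temporal structure, sketched here in Python (3.9+) with variable names of our own choosing, is to split blood pressure into time-indexed nodes. The hypothetical graph below maps each variable to its causes and uses the standard library's `graphlib.TopologicalSorter` to check that the result is a valid DAG; the untimed version, with arrows in both directions between blood pressure and kidney disease, raises a `CycleError`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical time-indexed causal graph: each key maps a node to its causes (parents).
# Blood pressure is split into a measurement taken before disease onset (bp_before)
# and one taken after (bp_after), so kidney disease can raise later blood pressure
# without creating a cycle.
graph = {
    "bp_before": {"age", "diabetes"},
    "kidney_disease": {"age", "diabetes", "bp_before"},
    "bp_after": {"kidney_disease", "bp_before"},
}

# A valid causal ordering: every cause appears before its effects.
order = list(TopologicalSorter(graph).static_order())
print(order)

# Without time indexing, "blood pressure -> kidney disease" and
# "kidney disease -> blood pressure" form a cycle, which a DAG cannot contain.
untimed = {
    "bp": {"age", "diabetes", "kidney_disease"},
    "kidney_disease": {"age", "diabetes", "bp"},
}
try:
    list(TopologicalSorter(untimed).static_order())
    cycle_detected = False
except CycleError as err:
    cycle_detected = True
    print("cycle:", err.args[1])  # the nodes forming the detected cycle
```

The printed order can vary between runs (the parents are stored in sets), but causes always precede their effects.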



Now let’s look at which lifestyle changes we might make recommendations about. Salt intake is a simple one. We don’t know for certain whether salt directly affects kidney disease, so we add an arrow from salt to kidney disease to allow for that possibility. But salt intake is known to cause changes in blood pressure, so we also add an arrow from salt to blood pressure.
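With these arrows in place, the routes from salt to kidney disease can be read straight off the graph. As a minimal sketch, assuming the variable names below and a graph stored as a hypothetical child-to-parents mapping, a depth-first search enumerates every directed path from salt to kidney disease:

```python
# Hypothetical causal graph stored as child -> list of parents, including the
# salt arrows described above (salt -> blood pressure, salt -> kidney disease).
parents = {
    "blood_pressure": ["age", "diabetes", "salt"],
    "kidney_disease": ["age", "diabetes", "blood_pressure", "salt"],
}


def children_of(node):
    """Return the direct effects of `node` in the graph."""
    return [child for child, ps in parents.items() if node in ps]


def directed_paths(source, target, path=None):
    """Enumerate every directed causal path from source to target (depth-first).

    Assumes the graph is acyclic, so the recursion always terminates.
    """
    path = (path or []) + [source]
    if source == target:
        return [path]
    paths = []
    for child in children_of(source):
        paths.extend(directed_paths(child, target, path))
    return paths


for p in directed_paths("salt", "kidney_disease"):
    print(" -> ".join(p))
```

It prints `salt -> blood_pressure -> kidney_disease` (the indirect path) followed by `salt -> kidney_disease` (the direct path).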


This means there are potentially two pathways through which salt intake can causally affect the risk of kidney disease: a direct causal effect, and an indirect causal effect through raised blood pressure. This gives us the following diagram:



Traditional machine learning would miss this causal structure and simply regress the outcome on all the variables. As a result, any recommendation about salt made by changing the value of salt in a traditional model would capture only the direct effect of salt. That is a problem, because the main causal pathway by which salt increases the chance of kidney disease runs through blood pressure. A traditional model conditions on blood pressure (fixing its value as an input from the data), so changing the salt level does not change blood pressure, and the main causal route for salt to affect kidney disease is blocked.
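The following Python sketch illustrates the blocked pathway with a simulated, hypothetical linear SCM; all coefficients are made up, and a continuous risk score stands in for the diagnosis. It compares the "traditional" what-if, which nudges salt in a regression fitted on all inputs while blood pressure stays at its observed value, with a genuine intervention on salt that lets blood pressure respond.

```python
import numpy as np

n = 100_000

# Hypothetical linear SCM (made-up coefficients, continuous risk score):
#   salt -> blood pressure -> risk, plus a small direct salt -> risk arrow,
#   with age and diabetes as common causes of blood pressure and risk.
A_SALT_BP = 2.0     # effect of salt on blood pressure
B_BP_RISK = 0.05    # effect of blood pressure on risk
C_SALT_RISK = 0.01  # direct effect of salt on risk


def simulate(salt_shift=0.0):
    """Sample the SCM, optionally intervening to add salt_shift to everyone's salt."""
    r = np.random.default_rng(2)  # fixed seed: identical noise in every simulated world
    age = r.normal(50, 10, n)
    diabetes = r.binomial(1, 0.2, n)
    salt = r.normal(8, 2, n) + salt_shift
    bp = 0.5 * age + 10 * diabetes + A_SALT_BP * salt + r.normal(0, 5, n)
    risk = (B_BP_RISK * bp + C_SALT_RISK * salt + 0.03 * age + 0.5 * diabetes
            + r.normal(0, 1, n))
    return age, diabetes, salt, bp, risk


age, diabetes, salt, bp, risk = simulate()

# "Traditional" model: regress risk on every input, including the mediator bp.
X = np.column_stack([np.ones(n), salt, bp, age, diabetes])
coef = np.linalg.lstsq(X, risk, rcond=None)[0]

# Predictive what-if: raise salt by 1 in the fitted model while blood pressure
# stays at its observed value, so only the direct coefficient moves the prediction.
X_more_salt = X.copy()
X_more_salt[:, 1] += 1.0
predictive_change = (X_more_salt @ coef - X @ coef).mean()

# Causal what-if: intervene on salt in the SCM and let blood pressure respond.
causal_change = simulate(salt_shift=1.0)[4].mean() - risk.mean()

# Adjusting for the confounders but NOT the mediator recovers the total effect
# from observational data (valid here because nothing confounds salt).
X_no_mediator = np.column_stack([np.ones(n), salt, age, diabetes])
total_from_data = np.linalg.lstsq(X_no_mediator, risk, rcond=None)[0][1]

print(f"predictive model:   {predictive_change:.3f}")
print(f"intervention:       {causal_change:.3f}")
print(f"no-mediator adjust: {total_from_data:.3f}")
```

In this setup the predictive what-if recovers only the direct effect (about 0.01 per unit of salt), whereas the intervention gives the total effect of 0.01 + 2.0 * 0.05 = 0.11, and the regression that leaves the mediator out lands close to 0.11 as well.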

In fact, so much of the effect in these data appears to be mediated through blood pressure that any traditional non-causal model would give incorrect recommendations, because it would block the main causal pathway through which the recommended changes have their effect. Using a traditional predictive model here would be harmful. Causal models, by contrast, can include causal arrows between the input variables themselves, not only from the inputs to the outcome. This means that, here, we can use causal models to account for confounding and make sure indirect causal pathways stay open.

Going forward, this model can be built up further, especially with input from a domain expert. CausaDB can then simulate individual-level interventions by changing the values of variables, and find different sets of optimal changes that a person could make to their lifestyle.
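CausaDB's own interface isn't shown here, so the following is only a generic Python sketch of the idea, built on hypothetical structural equations with made-up coefficients (not a clinically validated model). For one person, it recovers their individual blood-pressure noise term from their observed data, following the abduction, action, prediction recipe for counterfactuals, and then searches for the smallest salt reduction (in 0.5-unit steps, never below a hypothetical floor of 5 units) that brings their predicted risk down to a target.

```python
import math


# Hypothetical structural equations with made-up coefficients (illustration only;
# this is not CausaDB's API).
def bp_eq(age, diabetes, salt, u_bp):
    """Blood pressure as a function of its causes plus an individual noise term."""
    return 0.5 * age + 10 * diabetes + 2.0 * salt + u_bp


def risk_eq(age, diabetes, salt, bp):
    """Probability of kidney disease via a logistic link."""
    logit = -12 + 0.08 * bp + 0.05 * salt + 0.02 * age + 0.6 * diabetes
    return 1 / (1 + math.exp(-logit))


def counterfactual_risk(person, new_salt):
    """Risk for this person had their salt intake been new_salt."""
    # Abduction: recover the person's blood-pressure noise from their observed data.
    u_bp = person["bp"] - bp_eq(person["age"], person["diabetes"], person["salt"], 0.0)
    # Action and prediction: set salt, then recompute blood pressure and risk.
    bp = bp_eq(person["age"], person["diabetes"], new_salt, u_bp)
    return risk_eq(person["age"], person["diabetes"], new_salt, bp)


def smallest_salt_cut(person, target_risk, step=0.5, min_salt=5.0):
    """Smallest salt reduction (in multiples of step) reaching target_risk, or None."""
    cut = 0.0
    while person["salt"] - cut >= min_salt:
        if counterfactual_risk(person, person["salt"] - cut) <= target_risk:
            return cut
        cut += step
    return None


person = {"age": 60, "diabetes": 1, "salt": 10.0, "bp": 130.0}  # hypothetical individual
print(round(counterfactual_risk(person, person["salt"]), 3))
print(smallest_salt_cut(person, target_risk=0.5))
```

For this hypothetical person the baseline predicted risk is about 0.668, and the smallest reduction that reaches a 0.5 target is 3.5 units; if no reduction above the floor suffices, the function returns `None`. With several modifiable variables, the same loop would become a search over combinations of changes.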