Master’s Thesis Summary: Constructing a Narrative using Markov Decision Processes applied to Data Visualisations


We aimed to create a Markov Decision Process model that can recommend visualisations in an order that portrays a guided narrative. The model uses Monte Carlo learning and is trained on feedback from two consultants who each have extensive knowledge of two clients. These ‘knowledgeable consultants’ provided feedback on the quality of individual features and, most importantly, on the transitions between features. This feedback is used to find the optimal sequence of visuals for each client. We tested the model on a dozen participants against both the knowledgeable consultant’s suggestion and a random sequence for each client. Participants provided open-ended, written commentary on each visual, and three independent markers compared these comments against the insights that the knowledgeable consultants aimed to portray. We found that the model enables participants to discover more insights, provided the model is trained well. Although participants typically find insights sooner with the random sequence, the model repeatedly provides an insight followed by its validation in a two-step pattern. This pattern distinguishes the model from the random sequence and, combined with the increased number of insights, demonstrates that it can provide a guided narrative. However, further work is needed before the model is suitable for direct client applications.

Keywords: Monte Carlo, Recommendation Systems, Narrative Visualisation, Consultancy, Markov Decision Processes.




Recommendation systems for information visualisation have only recently been explored. So far, research has focused on recommending individual visualisations that highlight interesting changes in the data. Similarly, there has been little research into optimal methods for portraying a narrative using data visualisations. However, both topics have been studied extensively in other fields, and these findings can be applied to data. For example, the film industry not only has advanced recommendation systems built by online streaming sites such as Netflix (Carlos & Hunt 2015), but also has extensive knowledge of the design process for building a narrative and telling a story, such as the ‘three-act structure’ (Wikipedia 2017). By applying research from other industries, together with existing work on information visualisation recommendations and on narrative in data visualisations, we aim to build a recommendation system model that achieves both: recommending visualisations in an order that portrays a narrative.

To build a model that recommends visualisations based both on their individual quality and on their order, we looked at research in music playlist recommendation. A Monte Carlo model named DJ-MC (Liebman, Saar-Tsechansky & Stone 2015) collects a single piece of feedback from the user for each song and then uses this to recommend the sequence of songs that best matches that person’s preferences. We attempt the same for data visualisations: first, we built a model that interacts directly with clients using aesthetic features; then, we built a second model, trained on consultants who understand the data, that produces a sequence portraying a specific narrative to clients.
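To make the DJ-MC-style idea concrete for visuals, the sketch below scores a candidate sequence as the sum of each visual's individual reward plus a reward for every consecutive transition, then uses simple Monte Carlo sampling (draw random orderings, keep the best) to approximate the highest-scoring sequence. This is a minimal illustration under stated assumptions, not the DJ-MC planner or the thesis model itself: the reward dictionaries, feature names and the functions `sequence_reward` and `monte_carlo_best_sequence` are hypothetical.

```python
import random


# ASSUMPTION: each visual has an individual reward, and each ordered pair of
# consecutive visuals has a transition reward (0 if never rated).
def sequence_reward(seq, item_reward, transition_reward):
    """Sum of individual rewards plus the reward of every consecutive transition."""
    total = sum(item_reward[v] for v in seq)
    total += sum(transition_reward.get((a, b), 0.0) for a, b in zip(seq, seq[1:]))
    return total


def monte_carlo_best_sequence(visuals, length, item_reward, transition_reward,
                              n_rollouts=2000, seed=0):
    """Approximate the best sequence by sampling random orderings (rollouts)."""
    rng = random.Random(seed)
    best_seq, best_score = None, float("-inf")
    for _ in range(n_rollouts):
        seq = rng.sample(visuals, length)  # random ordering without repeats
        score = sequence_reward(seq, item_reward, transition_reward)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score


# Illustrative (made-up) rewards for three single-feature visuals.
item_reward = {"premium": 1.0, "members": 0.5, "claims": 0.8}
transition_reward = {("premium", "claims"): 2.0,
                     ("claims", "members"): 1.5,
                     ("members", "premium"): -1.0}

best, score = monte_carlo_best_sequence(
    ["premium", "members", "claims"], 3, item_reward, transition_reward)
print(best, round(score, 2))  # ['premium', 'claims', 'members'] 5.8
```

With only a handful of visuals, random rollouts almost surely visit every ordering; the sampling approach matters once the number of possible sequences is too large to enumerate.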

Our aesthetic model, as laid out in our project proposal, attempts to create, in real time, a sequence that provides clients with the information they require in an order that pleases the reader. However, this model assumed that the aesthetic features alone would be enough to imply a narrative, and we recognised that this assumption might be invalid. Therefore, we built a second model with one aim: to recommend data visualisations that portray a narrative.

The design of our algorithms meant that, to adapt our model, we only needed to change the method for obtaining a reward signal and the features used to describe the visualisations, not the calculations thereafter. In the aesthetic model, we had used features such as complexity and chart type. In the final narrative model, we simplified the process by only considering the information from the data shown, e.g. total premium or number of members in the scheme. To simplify and improve the training process even further, we only showed consultants transitions between simple bar charts displayed in juxtaposition (side by side), and used this feedback to map features onto graphs that contained two features in superposition (a bar chart with a line chart overlay). This was done to improve the model output and was only possible because of our simplified descriptions of the graphs.
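One way the juxtaposition feedback could be mapped onto two-feature superposition charts is sketched below. The combination rules are assumptions made for illustration only, since the exact mapping is not stated here: a chart's reward is taken as the mean of its two features' rewards plus the rated transition between them, and the transition between two charts is the average of the single-feature transition rewards across all feature pairs. The functions `chart_reward` and `chart_transition_reward` are hypothetical.

```python
from itertools import product


# ASSUMPTION: simple averaging rules for illustration, not the thesis's mapping.
def chart_reward(chart, feature_reward, transition_reward):
    """Reward for a superposition chart that overlays two features."""
    f1, f2 = chart
    individual = (feature_reward[f1] + feature_reward[f2]) / 2
    # Reuse the side-by-side rating of f1 -> f2 as a proxy for how well they pair.
    pairing = transition_reward.get((f1, f2), 0.0)
    return individual + pairing


def chart_transition_reward(prev_chart, next_chart, transition_reward):
    """Average single-feature transition reward over all feature pairs."""
    pairs = list(product(prev_chart, next_chart))
    return sum(transition_reward.get(p, 0.0) for p in pairs) / len(pairs)


# Made-up single-feature rewards derived from juxtaposition feedback.
feature_reward = {"premium": 1.0, "members": 0.5, "claims": 0.0}
transition_reward = {("premium", "members"): 1.0, ("members", "claims"): 0.5}

print(chart_reward(("premium", "members"), feature_reward, transition_reward))  # 1.75
print(chart_transition_reward(("premium", "members"), ("members", "claims"),
                              transition_reward))  # 0.375
```

Because the visuals are described only by the data features they show, every reward for a superposition chart can be derived from the simpler single-feature ratings without asking consultants to review the combined charts directly.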

The final narrative model was trained on two knowledgeable consultants. Each knowledgeable consultant provided feedback for two clients, together with their choice of simple visuals and the narrative they were attempting to portray. The model was trained by asking the knowledgeable consultants to review a number of transitions, shown in juxtaposition, between two bar charts that each show one feature. This feedback was used to calculate a reward signal that captures, for each client, the quality of each feature and of each transition between features in portraying a narrative. The reward signal for the simple bar charts was then used to calculate the reward signal for the more complex superposition charts, which show two features, for each client. These reward signals were passed through our Monte Carlo model to produce the best sequence of visualisations to portray the required narrative for each client.
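The step from consultant reviews to a reward signal can be sketched as follows, assuming (hypothetically) that each review is recorded as a tuple of the first feature, the second feature and a numeric rating, for example on a scale from -2 (poor) to +2 (good). Under this assumption, a transition's reward is the mean rating of that ordered pair, and a feature's reward is the mean rating of every reviewed transition it appears in. The function name `rewards_from_feedback` and the rating scale are illustrative, not taken from the thesis.

```python
from collections import defaultdict
from statistics import mean


def rewards_from_feedback(reviews):
    """Turn (first_feature, second_feature, rating) reviews into reward tables.

    ASSUMPTION: ratings are numeric; feature quality is the mean rating of the
    transitions a feature appears in, and transition quality is the mean
    rating of that ordered pair.
    """
    feature_scores = defaultdict(list)
    transition_scores = defaultdict(list)
    for first, second, rating in reviews:
        feature_scores[first].append(rating)
        feature_scores[second].append(rating)
        transition_scores[(first, second)].append(rating)
    feature_reward = {f: mean(s) for f, s in feature_scores.items()}
    transition_reward = {t: mean(s) for t, s in transition_scores.items()}
    return feature_reward, transition_reward


# Made-up reviews from one consultant for one client.
reviews = [("premium", "members", 2), ("premium", "members", 1),
           ("members", "claims", -1)]
feature_reward, transition_reward = rewards_from_feedback(reviews)
print(feature_reward["premium"])  # 1.5
```

A consultant who rates most transitions negatively, as happened for client 3, would leave almost every entry in both tables negative, which gives the sequence search little to distinguish good orderings from poor ones.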

To assess whether these sequences produced the required narrative, we created open-feedback surveys for participants at the firm Punter Southall Health & Protection. For the 4 sample clients, the surveys gave each participant a varied combination of: the model’s output for two clients, the knowledgeable consultant’s suggested simple bar charts for one client, and a randomly produced sequence of superposition visuals for the last client. With 10 participants, we were able to use both a between-subjects and a within-subjects design in our analysis. In other words, for each client we compared the quality of the model across participants, and for each participant we compared which sequence they found most successful. The participants ranged from other consultants to new joiners with little experience in the industry. The responses were analysed by three independent markers and, following a discussion on Friday 22nd December, the markers reached agreement on individual cases where appropriate.

To measure success, we set three hypotheses: 1) the model will allow participants to find more insights than the knowledgeable consultant’s sequence and the random sequence; 2) the model will surface the majority of its insights sooner than the random sequence; and 3) the model will cause participants to make fewer incorrect guesses about the underlying trends in the data. The first tests whether the individual visuals provide more insight, the second measures whether the order is indeed important, and the third checks that the model is not portraying an incorrect narrative.
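As an illustration of how H1 and H2 could be scored from the markers' agreed results, the sketch below assumes (hypothetically) that, for each participant and sequence, we record the slide on which each target insight was first identified, or `None` if it was never found. The number of insights found addresses H1, and the slide by which a majority of the found insights had appeared gives one possible measure of "sooner" for H2. The function `insight_metrics`, the data layout and the insight labels are assumptions for illustration.

```python
def insight_metrics(found_on_slide):
    """Return (insights_found, slide_by_which_majority_found).

    found_on_slide maps an insight id to the 1-based slide on which it was
    first identified, or None if the participant never found it.
    """
    slides = sorted(s for s in found_on_slide.values() if s is not None)
    count = len(slides)
    if count == 0:
        return 0, None
    # slides[count // 2] is the first slide by which more than half of the
    # found insights had appeared.
    return count, slides[count // 2]


# Made-up responses from one participant for one sequence.
responses = {"high claims": 1, "ageing members": 1, "premium rise": 3,
             "low uptake": None, "cost driver": 5}
print(insight_metrics(responses))  # (4, 3)
```

Comparing these two numbers across the model, consultant and random sequences for the same client gives a direct, if coarse, reading of H1 and H2.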

The results are discussed in detail in the thesis; in summary, they show that the model not only finds more insights but also provides them in a pattern different from that of the random sequence. In most cases the random sequence provides all its insights in the first two slides, whereas the model alternates between slides that provide insights and slides that validate information.

In addition to reviewing literature that supported the methodology of this study, we systematically tracked the challenges faced at each stage, the decisions made and the justification for each decision. This includes our move from aesthetic to narrative models, a comparison of our final aims with those of our original proposal, and the testing procedure’s ability to validate our hypotheses.

The overall aim is to create a Markov Decision Process recommendation system that suggests data visualisations in an order that portrays a narrative, clearly distinguishable from a random selection of graphs. This is not only a development in recommendations for data visualisations and in the narrative literature of information visualisation, but also a major advancement in the industry in which it has been applied. We hope that this gives the firm a competitive edge over other brokers that use lengthy written reports. Ultimately, the results may be used to create dashboards that enable clients to find insights interactively that are both in line with the consultant’s aims and valid.


Summary of Results

The results show that our model not only found more insights for clients 1 and 4 than the knowledgeable consultant and random sequences, but also found them sooner. However, although all three found a similar number of insights for client 2, the random sequence found them much sooner than the model, suggesting that the model does not place all the insights in the first one or two slides. As expected from our earlier comments on the feedback, the model performed worst for client 3 and, unusually, the random sequence typically yielded one insight per slide for this client. We are unable to conclude whether the model leads to fewer incorrect conclusions, as there was not enough information across the sample of participants. Instead, we found that many of the comments focused on ‘validation’ rather than on guessing trends.






With our hypotheses, we aimed to show that the model finds more insights and is distinguishable in some way from the random sequence in how people find insights. With this in mind, we posed our first hypothesis on the number of insights and our second hypothesis to show that the model would enable people to find information sooner. We were able to demonstrate that the model finds more insights than both the knowledgeable consultant and random sequences for clients 1 and 4 and, as expected, performs worse for client 3, proving H1.

We found that, in fact, the model does not provide insights sooner than the random sequence; this is particularly true for client 2, where the number of insights found is equal across all sequences, disproving H2. However, in hindsight, we would frame our second hypothesis around participants validating information, as a way to show that the model has a pattern that is distinguishable from simply providing visuals at random.

The pattern of alternating between insights and validations observed in the model makes more sense when we consider that the core function of Monte Carlo is to find the sequence with the best long-term reward across all steps. The random sequences most closely resemble a ‘greedy’ selection of visuals, in which we would simply order visuals by individual quality and ignore transitions. We have therefore demonstrated that, although H2 is disproved, the model behaves differently from the random sequence and, furthermore, can find more insights in some cases.
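The difference between a greedy ordering and one chosen for the best long-term reward can be seen in a toy example. The sketch below uses exhaustive search over all orderings as a small-scale stand-in for the Monte Carlo search (feasible only for a handful of visuals), with made-up rewards in which the individually best visuals transition poorly into each other. The function names, visual labels and values are hypothetical.

```python
from itertools import permutations


def total_reward(seq, item_reward, transition_reward):
    """Individual rewards plus the reward of each consecutive transition."""
    return (sum(item_reward[v] for v in seq)
            + sum(transition_reward.get(p, 0.0) for p in zip(seq, seq[1:])))


def greedy_order(visuals, item_reward):
    """Order by individual quality only, ignoring transitions."""
    return sorted(visuals, key=lambda v: item_reward[v], reverse=True)


def best_long_term_order(visuals, item_reward, transition_reward):
    """Exhaustively pick the ordering with the highest whole-sequence reward."""
    return list(max(permutations(visuals),
                    key=lambda s: total_reward(s, item_reward, transition_reward)))


item_reward = {"A": 3.0, "B": 2.0, "C": 1.0}
transition_reward = {("A", "B"): -2.0, ("B", "C"): -2.0,
                     ("A", "C"): 1.0, ("C", "B"): 1.0}

greedy = greedy_order(["A", "B", "C"], item_reward)
best = best_long_term_order(["A", "B", "C"], item_reward, transition_reward)
print(greedy, total_reward(greedy, item_reward, transition_reward))  # ['A', 'B', 'C'] 2.0
print(best, total_reward(best, item_reward, transition_reward))      # ['A', 'C', 'B'] 8.0
```

Both orderings contain the same visuals, so their individual rewards are identical; the whole gap comes from the transitions, which is the part of the reward a greedy selection ignores.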

Client 3 demonstrates how important the quality of feedback is in training. In this case, the knowledgeable consultant provided heavily negative feedback and, in doing so, made it challenging for the model to find the best visuals and transitions. The model therefore underperformed while, unusually, the random sequence provided insights consistently across all slides, causing it to greatly outperform the model. Although this appears negative, it does confirm that the choice of visuals is indeed important. There was a risk that, because we provided 6 graphs each showing 2 features, participants would find most insights simply through the breadth of information shown across the sequence. The fact that the model performs notably worse for client 3 proves the importance of the choice of visuals, their order, or both.

We were unable to conclusively demonstrate that the model does not cause participants to draw incorrect conclusions. Some participants were more prone to making guesses and, in these cases, were more likely to make incorrect guesses with the other sequences. However, because few participants across the sample guessed at the underlying causes of trends, it is not clear whether this would always be the case. Therefore, we are unable to prove or disprove H3.

We have demonstrated that the model can find more insights if the model training is performed well, proving our first hypothesis. Although we disproved our second hypothesis, we were still able to show that the model finds insights in a pattern that is clearly distinguished from the random sequence.

Extensive further work is needed to ensure that the sequences produced are indeed beneficial, as we were unable to prove that participants would not draw incorrect conclusions.

Therefore, we can conclude that this research has succeeded in creating a model whose pattern of insights differs from that of the random sequence, but we have yet to clearly prove that this means the model portrays a narrative.

The pattern produced by the model, which validates information following each insight, can be considered in line with the guidelines on proving an argument in essay writing, where one of the key points is: ‘answer the question posed with evidence’ (Leeds University Library 2015). In our case, the model repeatedly shows an insight and then provides further evidence that it is true, essentially validating that the trends found on the first slide of this two-step pattern are correct.

We consider each insight to be a piece of the overall story for the client. Therefore, finding more insights, in such a way that each insight is repeatedly shown to be correct, demonstrates that the model has succeeded in providing a narrative.
