Augmented reality-based remote family visits in nursing homes



INTRODUCTION
Loneliness is one of the most predominant feelings among residents of nursing homes and is associated with negative health outcomes and reduced quality of life [1]. Visits by family members can typically reduce the residents' feelings of loneliness [1]. The importance of these visits became painfully clear during the COVID-19 pandemic, when visitation restrictions resulted in increased feelings of loneliness and social disconnection among residents and their family members [2]. But even before the pandemic, family members often could not visit their relatives living in nursing homes as often as they would like, due to a lack of time and long travel distances [3]. Next to in-person visits, another way for nursing home residents to maintain contact with family and friends is via mediated communication, like phone or video calling. While this may be a practical way to stay in touch [4], it is often experienced as less fulfilling than face-to-face interactions [3]. A main reason for this is that using current audio-visual communication tools often does not allow for experiencing true feelings of social presence or togetherness, since these tools do not reliably convey essential social and spatial cues [3,5]. Augmented reality (AR)-based and virtual-reality (VR)-based communication systems may alleviate this problem, by providing a more complete and immersive representation of the social and contextual cues of the communication setting [6]. Thus, AR-based communication may enhance the quality of the mediated contact between residents and their remote family members, which may ultimately positively affect the residents' mental health [7,8].
The independent research organization TNO (www.tno.nl) recently developed a new AR-based communication tool, based on previous work on social VR [9]. The goal of the current study was to investigate to what extent this tool can facilitate social contact between remote family members and their relatives in nursing homes. First, we conducted focus group sessions with residents, family members, and caretakers, to assess their communication needs and wishes. Based on the outcomes of these sessions, we derived three indicators for the user experience (UX) of mediated (AR or video calling) communication: enjoyment, spatial presence ('being there' [10,11]) and social presence ('being together with an intelligent other' [6]). Next, we conducted an experiment with a two-condition (AR communication vs video calling) within-subjects design to compare the UX of AR-communication to that of video calling. The sample consisted of 10 participant pairs of residents and family members. In the video calling condition, the resident saw the remote family member projected on a flat (2D) screen. In the AR-communication condition, a 3D representation of the remote family member was projected onto a chair placed in front of the resident (creating the impression that the family member was actually present in the room). In both conditions, the family member saw the resident in the video calling mode on a 2D screen. We hypothesized that the residents would evaluate the AR-communication condition more positively than video calling, while family members would not evaluate the two conditions differently.

RELATED WORK
The UX of immersive communication systems has been extensively investigated, focusing on the technology's capability to promote the senses of spatial and social presence. Other factors that determine the UX of these systems are the naturalness of interaction, conversational engagement, comfortableness of the environment and enjoyment [12]. However, such research has mainly focused on VR technology, while much less is known about AR-based systems [13]. Since VR and AR affordances are fundamentally different [14], it is unclear whether (social) interaction measures in VR apply to AR. Augmented reality systems supplement the real world with 3D-registered virtual objects [13] and do not replace the user's surroundings as VR systems would do [8]. This leads to different conceptualizations of immersion and spatial presence. For VR, these concepts refer to the medium's capability to deliver an illusion of reality of the virtual world and the user's perception of physically 'being there' [10,11]. For AR, however, they refer to the plausibility and consistency of the virtual content in the user's own surroundings, such that virtual objects seem part of the physical world [15,16]. While UX assessment techniques developed for VR can probably be adapted to AR, there is currently no standard evaluation methodology to assess the UX of AR-mediated social communication [8]. In this study we therefore used an evaluation methodology based on indicators derived from focus group sessions.

METHOD

Participants and procedure
Ten pairs of participants were included in our study, with each pair consisting of one resident (aged between 80 and 95 years) and one family member (aged between 47 and 68 years). To mitigate potential novelty effects, the AR system was introduced to the residents before the experiment. Each pair participated in both conditions (i.e., AR and video calling) in counterbalanced order within a single testing day. Family members were informed of the resident's communication condition. We evaluated the UX of the participants directly after each session. For each participant, we measured their perceived enjoyment, spatial presence and social presence, attitudes, behaviors and conversation duration. The UX of the residents was evaluated in a semistructured interview and by behavioral observations by a caretaker familiar with the resident, and the UX of the family members was assessed using questionnaires. For evaluation of the UX indicators of enjoyment, spatial presence and social presence, we asked the residents and family members to grade their experiences on a scale from 0 to 10. Additionally, the family members filled in a Dutch adaptation of the Networked Minds Questionnaire (NMQ) [17]. To investigate attitudes, we asked the residents if they were interested in using the technology again in the future. For the family members, questionnaire items were included about the technology's suitability for repeated use for the residents and for themselves. Caretakers scored the residents' behavior using an adapted Dutch version of the Music in Dementia Assessment Scales (MiDAS) [18], which was adapted to examine the well-being of a person during remote family visits. We also recorded the duration of each session, with a maximum of 20 minutes. This experimental protocol was approved by the local ethical review board and written informed consent was obtained from all participants.
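The text states that the two conditions were presented in counterbalanced order but does not describe how the order was assigned. As a minimal sketch, assuming a simple alternating assignment over randomly shuffled pairs (the function name `counterbalance`, the seed and the pair numbering are illustrative assumptions, not the procedure used in the study), a balanced assignment for ten pairs could look like this:

```python
import random


def counterbalance(pair_ids, conditions=("AR", "video calling"), seed=0):
    """Assign each pair a condition order so both orders occur equally often.

    Hypothetical helper: the study does not report its assignment procedure.
    """
    rng = random.Random(seed)
    shuffled = list(pair_ids)
    rng.shuffle(shuffled)
    first, second = conditions
    orders = {}
    for index, pair_id in enumerate(shuffled):
        # Alternate the starting condition over the shuffled pairs.
        orders[pair_id] = (first, second) if index % 2 == 0 else (second, first)
    return orders


# Ten hypothetical pair identifiers (1..10).
orders = counterbalance(range(1, 11))
for pair_id in sorted(orders):
    print(pair_id, "->", " then ".join(orders[pair_id]))
```

With an even number of pairs, half of them start with AR and half with video calling, so order effects are spread evenly over the two conditions.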

Technical setup
The resident was seated in front of a 12.9" iPad Pro tablet, while being recorded by a Logitech webcam that was attached on top of the iPad (Figure 1). The family member was displayed on the iPad, which supported AR placement and rendering of 3D volumetric objects. The iPad was mounted vertically on a stand, positioned in between the resident's chair and an empty chair (Figure 2, left). On the other side, the family member was seated in front of a 43" TV screen on which the resident was displayed (Figures 1 and 2, right), while being registered by a color-plus-depth (RGB-D) camera, the Azure Kinect, that was positioned in front of the TV. Two computers were used to connect the devices on both sides via the internet (Figure 1). Microsoft Teams was used to deliver the video images from the resident to the family member and to transmit the audio signals in both directions. Audio was recorded and presented using Jabra 750 speakerphones to ensure good audio quality. A dedicated local area network was used for video and audio data transmission, ensuring a high-performance and uninterrupted connection.
In the video calling (Microsoft Teams) condition, the resident was filmed by the iPad front camera while the family member was presented in 2D on the iPad (Figure 3, right). In the AR-communication condition, the family member was projected through the iPad in 3D onto a chair in front of the resident, such that it seemed that the family member was physically sitting opposite the resident (Figure 3, left). In both conditions, the resident was displayed in 2D video to the family member. This means that the difference between the two conditions was only visible for the resident, while the family member always saw a 2D image of the resident. The communication mode was only manipulated on the side of the resident, because the AR tool was primarily intended for their use. Everything else was held constant between conditions. The same screens were used, and the audio signals were similar since the same Jabra 750 speakerphones and Microsoft Teams connections were used for audio transmission. This offered good lip synchronization in both conditions, which was confirmed by expert opinions. The user's self-view in Microsoft Teams was disabled during video calling, since the AR tool also lacked a self-view.

Statistical analysis
IBM SPSS Statistics 26 (www.ibm.com) for macOS was used to perform all statistical analyses. A significance level of 0.05 was used for all hypothesis testing. The normality of the data was assessed with Shapiro-Wilk tests. Within-group differences were tested using paired samples t-tests for normally distributed data and Wilcoxon signed rank tests for skewed data. Between-group differences were tested using independent samples t-tests for normally distributed data and Mann-Whitney U tests for skewed data.
One-way repeated-measures analyses of variance (ANOVAs) were conducted because of the within-subjects design of this study. We tested for interaction effects by adding the variable of interest to the model as a between-subjects factor. For UX indicator scores, we used two-way repeated-measures ANOVA to test the interaction of group and condition.
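To illustrate the two within-group tests named above, the following sketch computes a paired-samples t statistic and a Wilcoxon signed rank test with the large-sample normal approximation, which yields a z value of the kind reported in the results. It is a standard-library Python illustration, not the SPSS procedure used in the study, and the grades in `ar_scores` and `video_scores` are hypothetical values, not study data.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def wilcoxon_signed_rank(a, b):
    """Return (W+, W-, z, two-sided p) using the normal approximation.

    Assumes at least one non-zero paired difference.
    """
    # Zero differences carry no sign information and are dropped.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        size = j - i + 1
        tie_term += size ** 3 - size  # tie correction for the variance
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48)
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal curve
    return w_plus, w_minus, z, p


# Hypothetical 0-10 grades for ten participants (not data from the study).
ar_scores = [8, 7, 9, 6, 8, 7, 9, 8, 7, 8]
video_scores = [7, 7, 8, 7, 6, 7, 8, 6, 7, 7]

t_stat, df = paired_t_statistic(ar_scores, video_scores)
w_plus, w_minus, z, p = wilcoxon_signed_rank(ar_scores, video_scores)
print(f"t({df}) = {t_stat:.2f}")
print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.2f}, p = {p:.3f}")
```

The sign of z depends on whether it is based on the positive or the negative rank sum, so a value computed from W+ can have the opposite sign of one reported by a statistics package. For a sample as small as ten pairs, an exact test may be preferable to the normal approximation.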

RESULTS

UX indicators
There were no significant within-group or between-group differences for any of the UX indicator scores, nor did we find interaction effects of group and condition (Table 1). The difference in spatial presence for family members was only marginally significant (z = -1.90, p = 0.06), with higher scores for the AR-communication condition compared to video calling. For the family members' responses on the NMQ, we did not find significant differences between conditions either, for the total score or the subscales (Table 2). The difference in perceived message understanding was only marginally significant (z = -1.91, p = 0.06), with higher scores for the AR-communication condition compared to video calling.
Note to Table 2. Data are mean ± SD. For the subscales, total scores are divided by the number of items belonging to that scale, because the number of items differs between subscales. All subscales included six items, except perceived message understanding, which included four. a Nonparametric tests were used due to skewed data.
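The per-item normalization of the NMQ subscale scores described above can be sketched as follows. The item responses, the 1 to 7 response range and the label of the six-item subscale are hypothetical placeholders chosen for illustration; only the item counts (four for perceived message understanding, six otherwise) come from the text.

```python
def subscale_mean(item_scores):
    """Divide the subscale total by its number of items."""
    return sum(item_scores) / len(item_scores)


# Hypothetical item responses on an assumed 1-7 scale (not data from the study).
responses = {
    "perceived message understanding": [6, 5, 6, 7],  # four items
    "example six-item subscale": [5, 6, 4, 5, 6, 4],  # six items
}

normalized = {name: subscale_mean(items) for name, items in responses.items()}
for name, score in normalized.items():
    print(f"{name}: {score:.2f}")
```

Dividing by the item count puts the four-item and six-item subscales on the same range, so their scores can be compared directly.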

Attitudes
For both the AR tool and video calling, 70% of the residents indicated that they were interested in using the technology again. 90% of the family members found repeated use of the AR tool suitable both for the residents and for themselves. For video calling, this fraction was 70%.

Behavior
For the behavioral observation scores, there were no significant differences between conditions for any item of the MiDAS (Table 3).

DISCUSSION
In this study we compared the UX of communication between nursing home residents and their family members via a newly developed AR-based tool to that of video calling. Family members reported slightly higher levels of spatial presence and perceived message understanding for AR-communication compared to video calling. This is surprising, given the fact that family members perceived the same (2D) image of their relative. This finding may reflect either the family members' positive expectations (since they were aware of the condition) or subtle changes in the behavior of the residents in the AR condition. Family members preferred the AR tool over video calling for future use, while residents had a positive attitude towards both technologies. The average session duration for AR-communication was significantly longer than for video calling.
Contrary to our expectations, we did not find differences between AR-communication and video calling for the residents' UX. Only family members reported slight differences in their UX between the two conditions. This may have been due to the residents being less able to verbalize their UX. Difficulty in understanding and responding to questionnaire items has previously been identified in presence research [19], specifically among nursing home residents in the context of video calling [4]. In this study, it became apparent that some of the residents found it difficult to articulate their experiences in grades. Some residents, for example, attributed the same score to all UX indicators for both conditions, even when their verbal expressions indicated otherwise, while others tended to give socially desirable scores. Ultimately, the residents reported enjoying the sessions and experiencing feelings of togetherness and connectedness. It appears that residents did not (consciously) experience differences in UX between conditions. The residents may have been too focused on their family members to notice these differences. In contrast, family members perceived slightly higher spatial presence and message understanding in the AR condition.

Strengths and limitations
The main limitation of this study (in terms of generalizability and power) was its small sample size. Inclusion of participants was restricted because of the COVID-19 pandemic. Future studies should further investigate spatial presence and perceived message understanding during AR-based remote family visits, as these findings were only marginally significant in our study.
A limitation of the AR system was the visibility of the edges of the screen, which may have prevented a convincing seamless experience. For future development, the option of removing visible screen edges should be explored. Also, this study only involved a one-time use of the AR solution. A suggestion for future research is to study long(er) term use. This will allow for gaining deeper insight into the actual suitability of the AR solution for remote family visits in nursing homes. For example, this can involve studying effects of repeated use of the AR solution on the residents' (mental) well-being and loneliness.
Lastly, as mentioned above, the evaluation methodology to assess the UX of the residents may need to be tailored to this population to be able to assess their experiences in more detail. Indicators pertaining to duration, frequency and consistency of use may be useful in this context.

CONCLUSION
In this study, we compared the UX of a new AR-based communication tool with video calling for social interaction between residents in nursing homes and their remote family members. For residents, we found no differences in their UX between the two communication modes. Family members reported marginally higher spatial presence in the AR-communication condition compared to video calling. Also, the average duration of AR-communication sessions was longer compared to video calling. In this study, we set out to determine whether AR-based communication would be overall advantageous over video calling. In this regard, the evidence we presented here, unfortunately, remains inconclusive. Nevertheless, the results do suggest that AR-based communication can be considered a viable alternative to video calling for maintaining contact between residents of nursing homes and their friends and family.

Figure 1: Schematic overview of the AR-based communication system. The resident (left) viewed the family member as a 3D projection in AR on the 12.9" iPad Pro and was recorded by a Logitech webcam. The family member (right) viewed the resident as a 2D presentation on a 43" TV screen and was recorded by a Kinect camera. Two computers were used to connect the devices on both sides via the internet.

Figure 2: Rooms and set up of the experiment. On the resident's side (left): a = chair onto which the family member would be projected; b = Logitech webcam; c = iPad; d = Jabra 750 speakerphone; e = position of the resident. On the family member's side (right): f = Azure Kinect camera; g = TV screen; h = Jabra 750 speakerphone; i = position of the family member.

Figure 3: Examples of the AR (left) and video calling (right) conditions on the resident's side, with close-ups of the iPad screen.

Figure 4: Mean session duration for the AR-communication (left, dark blue) and video calling (right, light blue) conditions. *p < 0.05.

Table 1: Mean scores of the UX indicators. Note. Data are mean ± SD. a Nonparametric tests were used due to skewed data.

Table 2: Networked Minds Questionnaire scores

Table 3: Behavioral observation scores on the MiDAS. Note. Data are mean ± SD. a Nonparametric tests were used due to skewed data.