Author: Tzung-Cheng Tsai (2007-06-18); recommendation: Yeh-Liang Hsu (2007-06-18).
Note: This article is Chapter 2 of Tzung-Cheng Tsai’s PhD thesis “Developing a telepresence robot for interpersonal communication with the elderly in a home environment.”

Chapter 2. Design elements in telepresence systems

2.1   Design elements in the telepresence literature

This chapter surveys application-oriented telepresence literature describing the development of telepresence systems. The design elements emphasized in these studies are extracted and summarized in Table 2-1. A discussion of these design elements as they fit into the framework of projection-immersion and observer-dialogist introduced in Chapter 1 is given below.

Table 2-1. Design elements and related technological keywords for telepresence

Design elements: Related technological keywords

Data transmission: RF and Internet transmission, time-delay improved algorithms

Teleoperation: simultaneous operation, robotic design

Supersensory: dexterous mechanism

Anthropomorphic elements: humanoid mechanism and expression

Stereoscopic elements: binocular and panoramic vision, image processing

Stereophonic elements: head-related transfer function, stereo audio

Eye contact: camera and screen with specific placement

Autonomous behaviors: environmental map establishment, self-maintenance capability

(1)   Data transmission

Data transmission, the transmission of control commands and sensory feedback, is a basic design element for the connection between the user and the remote telepresence robot or system. Wireless radio frequency and Internet are used in most telepresence applications, and dedicated lines are used in specific applications (such as operation in space and deep sea).

From the user’s view, the timing of data transmission is important. Time delays degrade telepresence performance in both the projection and immersion of the user. From the participant’s view, time delays also affect the participant’s impression as an observer and interactive capability as a dialogist. Therefore, past telepresence research in data transmission has focused on developing control schemes that deal with time delays to improve performance [Tzafestas & Prokopiou, 1997; Daniel & McAree, 1998].
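One common way to mask time delay on the operator's side (a generic illustration, not the specific schemes of the cited papers) is a predictive display: the operator's station keeps the commands still "in flight" and dead-reckons the robot's pose from the last confirmed state, so the display does not lag by the full round-trip delay. All class and method names below are hypothetical.

```python
# Illustrative sketch of a predictive display for a delayed telepresence link.
# The station tracks unconfirmed velocity commands and predicts the robot's
# position ahead of the delayed feedback.

class PredictiveDisplay:
    def __init__(self, delay_steps):
        self.delay_steps = delay_steps   # one-way delay, in control ticks
        self.confirmed_pos = 0.0         # last position reported by the robot
        self.in_flight = []              # velocity commands not yet confirmed

    def send_command(self, velocity):
        """Queue a command; its effect is reported delay_steps ticks later."""
        self.in_flight.append(velocity)

    def receive_feedback(self, reported_pos):
        """Robot state arrives; drop the command it already reflects."""
        self.confirmed_pos = reported_pos
        if self.in_flight:
            self.in_flight.pop(0)

    def predicted_pos(self):
        """Confirmed state plus the effect of the commands still in flight."""
        return self.confirmed_pos + sum(self.in_flight)

display = PredictiveDisplay(delay_steps=2)
display.send_command(1.0)   # move forward 1 unit per tick
display.send_command(1.0)
# No feedback has arrived yet, but the display already shows the expected pose:
print(display.predicted_pos())   # 2.0
display.receive_feedback(1.0)    # first command confirmed by the robot
print(display.predicted_pos())   # 1.0 confirmed + 1.0 in flight = 2.0
```

The same idea generalizes to 2-D poses; the point is only that prediction, not waiting, keeps the interface responsive under delay.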

(2)   Teleoperation

Many studies in telepresence emphasize enabling the user to modify the remote environment [Stoker et al., 1995; Engelberger, 2001; Spudis, 2001], that is, projecting the user onto the teleoperator. A teleoperator is a machine that extends the user’s sensing and/or manipulating capability to a location remote from that user. Teleoperation refers to direct and continuous human control of the teleoperator.

The “Full-Immersion Telepresence Testbed (FITT)” developed by NASA, which combines a wearable interface integrating human perception, cognition and eye-hand coordination skills with a robot’s physical abilities, as shown in Figure 2-1, is a recent example of research in teleoperation [Rehnmark et al., 2005]. The teleoperated master-slave system “Robonaut” allows an intuitive, one-to-one mapping between master and slave motions. The operator uses the FITT wearable interface to remotely control Robonaut, which follows the operator’s motion in full simultaneous operation to perform complex tasks on the International Space Station.

Figure 2-1. Full-Immersion Telepresence Testbed (FITT) and Robonaut [Rehnmark et al., 2005]

(3)   Supersensory

Supersensory refers to an advanced capability to modify the remote environment provided by a dexterous robot or a precise telepresence system. From the user’s view, the user’s manipulative efficiency for special tasks is enhanced when projecting onto a telepresence robot with supersensory. Green et al. [1995] developed a telepresence surgery system integrating vision, hearing and manipulation. It consists of two main modules, as shown in Figure 2-2: a surgeon’s console and a remote surgical unit located at the surgical table. The remote unit provides scaled motion, force reflection and minimized friction for the surgeon to carry out complex tasks with quick, precise motions. Satava [1999], Schurr et al. [2000] and Ballantyne [2002] have also applied supersensory elements in telepresence surgery, as has the da Vinci® Surgical System (shown in Figure 2-3) [2005].

Figure 2-2. A surgeon’s console and a remote surgical unit (RSU) located at the surgical table [Green et al. 1995]

Figure 2-3. da Vinci® Surgical System [http://www.intuitivesurgical.com/index.aspx]

Supersensory elements can also provide the user with a novel immersion feeling in a remote environment. For example, the user can control the zoom function of the camera on a telepresence robot to observe the small details of the remote environment, which the user does not normally see with the naked eye.

(4)   Anthropomorphic elements

In telepresence applications, non-anthropomorphic telepresence robots are usually designed to perform specific tasks which do not involve interacting with humans. Anthropomorphic elements are of great importance for robots involving human-robot interaction. Many studies have added anthropomorphic elements to their telepresence robots in order to improve the interaction between users and participants.

For interacting with participants, many telepresence robots incorporate an LCD screen displaying the user’s face. Dr. Robot and the telepresence system PEBBLES described in the first chapter of this thesis use an LCD screen to display the user’s face, as shown in Figures 2-4 and 2-5. It lets participants recognize whom the telepresence robot represents.

Figure 2-4(a). A patient is consulting the doctor through Dr. Robot [http://www.intouch-health.com/]

Figure 2-4(b). Dr. Robot in Show-Chwan Memorial Hospital [http://www.ettoday.com]

Figure 2-5. Telepresence system PEBBLES [http://www.ryerson.ca/pebbles/]

The commercial product “Giraffe” [2007], a remote-controlled mobile video conferencing platform, is also a telepresence robot application. As shown in Figure 2-6, Giraffe is composed of two subsystems: the client application, and the Giraffe robot itself. On the Giraffe robot, there is a video screen and camera mounted on an adjustable height robotic base. The user can move the Giraffe robot from afar using the client application. Using software that runs on a standard PC and webcam, the client application connects the user to the distant Giraffe robot through the Internet.

Figure 2-6. Giraffe is a remote-controlled mobile video conferencing platform [http://www.headthere.com/products.html]

Coradeschi et al. [2006] noted that the appearance and behaviors of a robot are essential in human-robot interaction. A robot’s appearance influences subjects’ impressions and is an important factor in evaluating the interaction. Humanlike appearance can be deceiving, convincing users that robots can understand and do much more than they actually can. Observable behaviors include gaze, posture, movement patterns and linguistic interactions. Appearance and behavior are tightly coupled.

It is arguable whether an LCD display is an anthropomorphic element. An LCD display may even turn users’ impression of the telepresence robot into that of a “movable teleconference system” such as Giraffe, instead of a humanoid-type robot. There are many other solutions for anthropomorphic elements [Burgard et al., 1999; Burgard et al., 2003; Fong et al., 2003; Schulz et al., 2000; Trahanias et al., 2005]. For example, Burgard et al. installed mechanical facial expressions and a touch screen interface on their tour-guide robots to engage on-site visitors.

Fukuda et al. [2004] introduced their robotic head system, the “Character Robot Face (CRF)”, which is developed as a human-robot communication interface with natural modalities. CRF uses facial expressions for natural user interaction. Facial expressiveness in humanoid-type robots has received a lot of attention because, from a psychological point of view, facial expressions are an effective way to build personal attachment when communicating with a human user.

In summary, anthropomorphic elements enhance the impression of the telepresence robot as a true representation of the remote user. The friendly interface and characteristics of the anthropomorphic telepresence robot also increase the interactive capability of the participant as a dialogist. Mechanical facial expressions can also be used to increase the humanoid characteristics of the telepresence robot to further encourage people to interact and communicate with the user.

(5)   Stereoscopic and stereophonic elements

In telepresence research, stereoscopic and stereophonic design elements are often emphasized to create a telepresence illusion of the remote environment or people aiming to increase the feeling of immersion for the user. For example, the user can identify the distance between an object and the telepresence robot by binocular vision [Brooker et al., 1999]; the head-related transfer function (HRTF) for stereophonic effect enables the user to identify the location and direction of a sound [Hawksford, 2002].
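The binocular distance cue mentioned above rests on simple triangulation: for two parallel cameras, depth equals focal length times baseline divided by disparity. The sketch below illustrates the geometry; the camera parameters are made-up numbers, not values from the cited systems.

```python
# Minimal sketch of binocular (stereo) distance estimation by triangulation.
# depth = focal_length * baseline / disparity, all in consistent units.

def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Triangulate distance (in metres) from stereo image disparity."""
    if disparity_px <= 0:
        raise ValueError("object must appear shifted between the two views")
    return focal_length_px * baseline_m / disparity_px

# A camera pair with a 700 px focal length and a 6 cm baseline sees an
# object shifted 21 px between the left and right images:
print(round(depth_from_disparity(700, 0.06, 21), 3))   # 2.0 (metres)
```

Nearby objects produce large disparities and distant ones small disparities, which is why binocular vision lets the user judge how far an object is from the telepresence robot.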

Telepresence videoconferencing is an important application using stereoscopic and stereophonic elements [Izquierdo, 1997; Ohm et al., 1998; Xu et al., 1999]. Telepresence videoconferencing enables the users and the participants to communicate more efficiently. In other words, the interactive capability of the participant as a dialogist is enhanced. Lei et al. [2004] proposed a representation and reconstruction module for an image-based telepresence system, using a viewpoint-adaptation scheme and an image-based rendering technique. This system provides life-size views and 3-D perception of participants and viewers in videoconferencing. The purpose of this research is to provide the feeling of a virtual-reality presence, in which realistic 3-D views of the user should be perceived by the participant in real time and with the correct perspective.

Rhee et al. [2007] presented a low-cost method for visual communication and telepresence in a CAVE™-like environment (the CAVE is a multi-person, room-sized, high-resolution 3D video and audio environment invented at EVL in 1991 [The Electronic Visualization Laboratory, 1991]), relying on 2D stereo-based video avatars. The system combines a selection of proven efficient algorithms and approximations in a unique way, resulting in a convincing stereoscopic real-time representation of a remote user acquired in a spatially immersive display. Figure 2-7 shows a demonstration of the system.

Figure 2-7. Visual communication and telepresence in a CAVE™-like environment [Rhee et al., 2007]

(6)   Eye contact

Eye contact is an important element for human-to-human communication. It is a well-known cue for gaining attention and attracting interest. In human-robot interaction, a robot with eye contact is more familiar and comfortable for humans to interact with. Yamato et al. [2003] focused on the effect that recommendations made by an agent or robot had on user decisions, and designed a “color name selection task” to determine the key factors in designing interactively communicating robots. They used two robots as the robot/agent for comparison. From the experiments, eye contact and attention-sharing were found to be important features of communication that display and recognize the attention of participants.

In social psychology, “joint attention” refers to the tendency of people in communication to focus frequently on the same object: a mental state in which two people not only pay attention to the same information but also notice the other’s attention to it. Imai et al. [2003] investigated situated utterance generation in human-robot interaction. In their study, a person establishes joint attention with a robot named Robovie to identify the object indicated by a situated utterance generated by the robot. A psychological experiment was conducted to verify the effect of eye contact on achieving joint attention. The experiment divided 20 subjects into two equal groups; one interacted with Robovie with eye contact and the other with Robovie without eye contact. From the experimental results, it was evident that a relationship developed through eye contact has a more fundamental effect on communication than logical reasoning or knowledge processing.

In telepresence applications, eye contact can increase the user’s feeling of immersion and the participant’s interactive capability as a dialogist. However, eye contact is very difficult to achieve during interpersonal communication through a telepresence robot when the user’s face is displayed on an LCD screen: the camera is usually placed on top of the LCD screen, which hinders direct eye contact between the user and the participant.

Hopf [2000] proposed an implementation of an auto-stereoscopic desktop display suitable for computer and communication applications, as shown in Figure 2-8. The goal of this research is to develop a system combining a collimation optic with an auto-stereoscopic display unit to provide natural face-to-face and eye contact communication without causing eyestrain.

Figure 2-8. Auto-stereoscopic display unit [Hopf, 2000]

(7)   Autonomous behaviors

In principle, a telepresence robot is operated by a remote user, and does not possess autonomous behaviors. However, the telepresence robot should be able to deal with possible hazardous situations autonomously when the remote user is not aware of the hazardous situation, cannot control the telepresence robot properly, or the data transmission is lost. From the user’s view, autonomous behavior increases the user’s capability of projection to operate the telepresence robot safely and reliably in a dynamic environment. From the participant’s view, autonomous behavior also increases the interactive capability of the participant as a dialogist. For example, a telepresence robot with the autonomous behavior of identifying the direction of the participant who is speaking can assist the remote user to respond more quickly and properly.

An interactive museum tour-guide robot, as shown in Figure 2-9, was developed by two research projects, TOURBOT and WebFAIR, funded by the European Union [Burgard et al., 1999; Schulz et al., 2000; Trahanias et al., 2005]. Thousands of users around the world have controlled this robot through the web to visit a museum. The projects developed a modular, distributed software architecture which integrates localization, mapping, collision avoidance, planning, and various modules for user interaction and web-based telepresence. With these autonomous features, the user can operate the robot to move quickly and safely in a museum crowded with visitors.

Figure 2-9. An interactive museum tour-guide robot, pleasing the crowd [Burgard et al., 1999; Schulz et al., 2000; Trahanias et al., 2005]

2.2   Basic data transmission structure and design elements of TRIC

The telepresence robot TRIC developed in this research aims to be a low-cost, lightweight robot, which can be easily implemented in the home environment. Therefore the primary decision was to use ADSL and Wireless Local Area Network (WLAN), which are commonly found in the home environment, as the channel of data transmission. Two-way audio and one-way video communication can be transmitted through a network Internet Protocol (IP) camera, which is also a common tool for home monitoring.

The controlling cores of most telepresence robots are PC-based. Dr. Robot, PEBBLES and Giraffe use video conferencing technology for data transmission, which requires specific software and interfaces running on users’ computers. The channel between a user’s computer and the telepresence robot is a peer-to-peer connection. The advantage is that the remote user’s face can be displayed on the LCD mounted on the telepresence robot’s head; however, it is difficult for multiple users to log in to the telepresence robot at the same time. The core of the interactive museum tour-guide robot is a PC-based web server, which allows thousands of users around the world to log in to the robot through the web to visit a museum.

Instead of using a PC, a “Mobile Data Server (MDS)” was developed as the core of TRIC. Figure 2-10 shows a picture of the laboratory prototype of the MDS, which consists of a PIC server mounted on a peripheral application board. The PIC server integrates a PIC microcontroller (PIC18F6722, Microchip), EEPROM (24LC1025, Microchip) and a networking IC (RTL8019AS, Realtek). It provides networking capability and can be used as a web server. The peripheral application board (as well as the program in the PIC microcontroller) can be easily customized to adapt to different sensors and applications. The dimensions of the MDS prototype are 40mm×85mm×15mm. Internet and serial interface (RS-232) are the primary communication interfaces of the MDS with client PCs and other devices. The MDS also receives external signals (e.g., sensor signals) through specific analogue or digital I/O ports, and provides inter-integrated circuit (I2C) communications to allow connections with external modules. A Multi-Media Card (MMC) in the MDS can be used to store data in FAT16 file format. Compared to a PC, the MDS is low-cost, has smaller dimensions, consumes less energy (thus can be powered by batteries), is not affected by viruses, and is safer and more reliable.

Figure 2-10. A picture of the laboratory prototype of the MDS
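Since the MDS acts as a small web server routing the user's commands to on-board modules, its control flow can be sketched as a tiny request dispatcher. The text does not specify the MDS command format, so the query-string protocol ("/cmd?module=drive&action=forward"), the module names and the handler registry below are all hypothetical illustrations.

```python
# Hypothetical sketch of an MDS-style command dispatcher: parse an HTTP-like
# request path into a command and route it to the matching module handler.

def parse_command(request_path):
    """Split '/cmd?module=drive&action=forward' into a command dict."""
    path, _, query = request_path.partition("?")
    if path != "/cmd":
        return None
    return dict(pair.split("=", 1) for pair in query.split("&") if "=" in pair)

# Registry of module handlers; each returns an acknowledgement string.
# The module names here are illustrative, not the actual TRIC modules.
HANDLERS = {
    "drive":  lambda action: f"drive:{action}",
    "camera": lambda action: f"camera:{action}",
}

def dispatch(request_path):
    cmd = parse_command(request_path)
    if cmd is None or cmd.get("module") not in HANDLERS:
        return "ERR unknown command"
    return "OK " + HANDLERS[cmd["module"]](cmd.get("action", ""))

print(dispatch("/cmd?module=drive&action=forward"))   # OK drive:forward
print(dispatch("/status"))                            # ERR unknown command
```

On the real PIC18F6722 the dispatch table would map commands to I2C transactions or I/O port writes rather than Python callables, but the routing structure is the same.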

Figure 2-11 shows the basic data transmission structure of TRIC. The user projects herself/himself to TRIC in the remote environment by sending control commands to TRIC through the Internet gateway. The user is able to immerse himself/herself in the remote environment through the sensory feedback transmitted through the Internet gateway. TRIC connects to the Internet through the WLAN in the home environment. The MDS takes charge of receiving commands from the user and dispatching them to specific modules, which coordinate with each other to perform specific tasks. Finally, the user can have physical interaction and verbal communication with the participant by controlling TRIC as his/her physical extension in the remote environment.

Figure 2-11. The data transmission structure of TRIC

Under this basic structure, Table 2-2 lists the design elements currently planned for the design of TRIC. The implementation of “teleoperation” in TRIC is quite fundamental. Teleoperation allows the user to move TRIC through the environment while controlling the pan and tilt of the IP camera from a remote client PC. It lets the user be in two places at once by teleoperating TRIC. Supersensory ability is reflected in the zooming capability of the IP camera and the sensing capability of the various sensors installed for environment detection.

Table 2-2. Design elements included in TRIC

Design elements: Corresponding technological strategies

Data transmission: use the MDS as the core of the system

Teleoperation: design of the mobility platform

Supersensory: provide zoom on the IP camera; implement various sensors for environment detection

Anthropomorphic elements: design of humanoid appearance and interactive behaviors

Stereoscopic elements: not included

Stereophonic elements: not included

Eye contact: control TRIC to gaze at the participant

Autonomous behaviors: share control authority with the participant and the environment

TRIC is not intended to be merely a communication medium, such as the “movable teleconference system” Giraffe. One important goal is to give the participant the impression that the remote user with whom he/she is communicating is actually present in the local environment. Anthropomorphic elements enhance the impression of TRIC as a true representation of the remote user, and the design of a humanoid appearance and interactive behaviors for TRIC facilitates interaction with participants.

For this reason, we also decided not to use an LCD to display the user’s face, which would create the impression that the user is in a remote location. In most telepresence applications utilizing an LCD display, the camera is mounted on top of the LCD screen, which hinders direct eye contact between the user and the participant. Instead, the camera on TRIC is packaged into a “head” with humanoid expression, which also facilitates the design of “eye contact” because the camera is indeed the “eye” of TRIC. Sophisticated stereoscopic and stereophonic elements were omitted to keep TRIC a low-cost, affordable homecare robot.

Autonomous behavior is the design element that received the most attention during the planning of TRIC. In principle, a telepresence robot is operated by a remote user who possesses complete control authority. However, a major emphasis of this research is to implement key autonomous behaviors in TRIC in order to increase the user’s operating capability and reduce the user’s workload during operation. By doing so, the aim was to also increase the interactive capability of elderly people as reciprocal communicators.

Adding autonomous behaviors implies that the control authority of the telepresence robot is shared with the participant or the environment it is interacting with. Several possible features for sharing control authority with the remote participants are discussed below:

●   “Look at that!”

Participants engaged in a face-to-face conversation often share the same view by pointing to an object under discussion. However, it is difficult for the user either to point to a certain object or to find the object the remote participant is pointing at through the telepresence robot. A 2 degree-of-freedom robot arm equipped with a laser pointer is used as a joint attention device to realize the “Look at that!” function. The remote participant can direct the view of the telepresence robot by pointing the laser pointer at the object in question.
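To act on the participant's pointing, TRIC must find the laser dot in its camera image. The thesis does not specify the detection algorithm, so the following is only a minimal sketch under the assumption of a red laser pointer: scan the frame for the pixel whose red channel most dominates the others and steer the camera toward it.

```python
# Hypothetical sketch of laser-dot detection for the "Look at that!" feature.
# The image is a grid of (R, G, B) tuples; we look for the most red-dominant
# pixel, which a bright red laser dot produces against ordinary scenes.

def find_laser_dot(image):
    """Return (row, col) of the most red-dominant pixel, or None if none stands out."""
    best, best_score = None, 0
    for r, row in enumerate(image):
        for c, (red, green, blue) in enumerate(row):
            score = red - (green + blue) // 2   # how much red stands out
            if score > best_score:
                best, best_score = (r, c), score
    return best

# A 3x3 test frame: mostly grey, with a bright red dot at row 1, column 2.
frame = [
    [(90, 90, 90), (90, 90, 90), (95, 92, 91)],
    [(88, 88, 88), (92, 90, 89), (250, 40, 30)],
    [(90, 90, 90), (91, 89, 90), (90, 90, 90)],
]
print(find_laser_dot(frame))   # (1, 2)
```

A real implementation would add thresholding and temporal filtering to reject red objects in the scene, but the core search is the same.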

●   “Where is the speaker?”

It is not easy for the user to locate the source of a sound in 3D space through the telepresence robot. When interacting with the remote participant, “Where is the speaker?” enables the telepresence robot to automatically locate and track speakers without control from the user. With this feature, the participant controls the telepresence robot by using her/his own voice.
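A common way to implement such speaker localization (sketched here as an illustration; the thesis does not specify TRIC's algorithm) is to cross-correlate the signals from two microphones, find the inter-microphone time delay, and convert it to a bearing via sin(angle) = delay × sound speed / microphone spacing. The sample rate and spacing below are made-up values.

```python
import math

# Illustrative time-difference-of-arrival (TDOA) speaker localization with
# two microphones: cross-correlate the signals to find the lag, then convert
# the lag to a bearing angle.

def best_lag(left, right, max_lag):
    """Lag (in samples) at which `right` best matches `left`."""
    def corr(lag):
        return sum(left[i] * right[i + lag]
                   for i in range(len(left))
                   if 0 <= i + lag < len(right))
    return max(range(-max_lag, max_lag + 1), key=corr)

def bearing_deg(lag_samples, sample_rate, mic_spacing_m, sound_speed=343.0):
    """Bearing of the source relative to straight ahead, in degrees."""
    delay = lag_samples / sample_rate
    ratio = max(-1.0, min(1.0, delay * sound_speed / mic_spacing_m))
    return math.degrees(math.asin(ratio))

left  = [0, 0, 1, 0, 0, 0]    # an impulse reaches the left microphone first
right = [0, 0, 0, 0, 1, 0]    # ...and the right microphone 2 samples later
lag = best_lag(left, right, max_lag=3)
print(lag)                     # 2
angle = bearing_deg(lag, sample_rate=8000, mic_spacing_m=0.2)
```

Once the bearing is known, the robot can pan its head toward the speaker without any command from the remote user.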

●   “Come here!” and “Follow me!”

In “Where is the speaker?”, the telepresence robot can locate the source of a sound; the “Come here!” feature therefore allows the user to command the telepresence robot to move to that source. “Follow me!” is another interactive behavior common in interpersonal communication. Passive infrared motion sensors combined with ultrasonic range-finding sensors are used to implement a low-cost and reliable “Follow me!” function, in which TRIC continuously follows the intended participant.
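The "Follow me!" behavior can be pictured as a simple sense-act loop: PIR sensors indicate which side the person is on, and the ultrasonic range finder keeps the following distance inside a comfort band. The sensor arrangement (three PIR zones) and the 0.5-1.5 m band below are illustrative assumptions, not values from the thesis.

```python
# Hypothetical sketch of the "Follow me!" control loop: three PIR motion
# sensors (left, centre, right) give the person's rough direction, and an
# ultrasonic range finder gives the distance to keep inside a comfort band.

FOLLOW_MIN_M, FOLLOW_MAX_M = 0.5, 1.5   # keep the person in this band (assumed)

def follow_step(pir_left, pir_center, pir_right, distance_m):
    """Return one of 'turn_left', 'turn_right', 'forward', 'back', 'stop'."""
    if pir_left and not pir_right:
        return "turn_left"               # person drifting to the robot's left
    if pir_right and not pir_left:
        return "turn_right"              # person drifting to the robot's right
    if not (pir_left or pir_center or pir_right):
        return "stop"                    # no motion detected: wait
    if distance_m > FOLLOW_MAX_M:
        return "forward"                 # falling behind: catch up
    if distance_m < FOLLOW_MIN_M:
        return "back"                    # uncomfortably close: give space
    return "stop"                        # person centred at a good distance

print(follow_step(False, True, False, 2.0))   # forward
print(follow_step(True, False, False, 1.0))   # turn_left
```

Running this decision once per sensor-polling cycle yields continuous following without any command from the remote user.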

Several possible modes in sharing control authority with the remote environment are discussed below:

●   Obstacle avoidance

It is difficult for the user to identify environmental information from the robot’s limited viewing angle. Therefore automatic obstacle avoidance is necessary. When an obstacle is detected within a specific distance of the robot, the obstacle avoidance algorithm is activated, and the robot deviates from the movement direction commanded by the user in order to avoid the obstacle.
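The shared-control rule just described can be sketched in a few lines: the robot follows the user's commanded heading until a range reading falls inside a safety radius, then deviates away from the obstacle. The 0.4 m threshold and the fixed 30-degree deviation are assumptions for illustration, not TRIC's actual parameters.

```python
# Illustrative sketch of shared-authority obstacle avoidance: the user's
# heading is obeyed while the path is clear; inside the safety radius the
# robot overrides it with a fixed deviation away from the obstacle.

SAFETY_M = 0.4        # activation distance (assumed)
DEVIATION_DEG = 30    # fixed avoidance deviation (assumed)

def blended_heading(user_heading_deg, obstacle_distance_m, obstacle_side):
    """obstacle_side is 'left' or 'right' relative to the robot's heading."""
    if obstacle_distance_m >= SAFETY_M:
        return user_heading_deg                  # user keeps full authority
    if obstacle_side == "left":
        return user_heading_deg + DEVIATION_DEG  # steer right, away from it
    return user_heading_deg - DEVIATION_DEG      # steer left, away from it

print(blended_heading(0, 1.0, "left"))    # 0: path is clear, obey the user
print(blended_heading(0, 0.2, "left"))    # 30: veer right around the obstacle
```

The key property is that control authority returns to the user automatically as soon as the range reading leaves the safety radius.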

●   Self-maintenance

The most fundamental self-maintenance function is the ability of TRIC to automatically recharge its battery when needed. This includes the ability to detect energy capacity, self-positioning to locate and move to the charging station, and automatic parking control to dock the robot in the charging station.
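The recharge sequence above lends itself to a small state machine: operate, seek the charging station when the battery is low, dock, charge, and resume. The state names and the 20 %/95 % thresholds below are illustrative assumptions.

```python
# Hypothetical sketch of the self-recharging behavior as a state machine.
# Each call advances one step given the current sensor readings.

LOW_BATTERY, FULL_BATTERY = 20, 95   # percent (assumed thresholds)

def next_state(state, battery_pct, at_station, docked):
    if state == "operating":
        return "seeking_station" if battery_pct < LOW_BATTERY else "operating"
    if state == "seeking_station":          # self-positioning toward the station
        return "docking" if at_station else "seeking_station"
    if state == "docking":                  # automatic parking control
        return "charging" if docked else "docking"
    if state == "charging":
        return "operating" if battery_pct >= FULL_BATTERY else "charging"
    raise ValueError(f"unknown state: {state}")

# Battery drops below 20 %: the robot interrupts operation to recharge.
print(next_state("operating", 15, at_station=False, docked=False))  # seeking_station
print(next_state("charging", 96, at_station=True, docked=True))     # operating
```

Separating energy detection, station seeking, and docking into explicit states mirrors the three capabilities listed in the text.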

The hardware and software design of TRIC to achieve these functions will be described in detail in the following chapters.