The VoIP Integration Experts

Yealink XML Browser

VoIP KnowHow / July 18, 2016

Modern professional IP phones like Yealink’s T46G have so many features that you sometimes wonder if you can still make a simple call. Well, I can assure you: you can still call your lovely grandma, but if needed, you can also forward any calls from your mother-in-law to the mailbox or an auto attendant :-)

Besides answering and forwarding calls, a receptionist might also face more complex workflows. For example, during office hours the receptionist might join a huntgroup, but at night all calls must be forwarded to a central call center in a different location. The receptionist will greatly appreciate being able to control such workflows directly from his phone, especially if he needs to invoke them several times a day.

For such complex workflows, Yealink’s T4x phones support custom on-display menus. In Yealink jargon, this is called Yealink XML Browser. Simply put, you can show custom menus on the phone’s display and ask the receptionist for input. For example, you could ask the receptionist to select which huntgroup to join or enter a phone number to forward calls to. Another common use-case is searching in a company-wide address book. It is important to note that the XML Browser only works on the phone’s main display, but not on any possible extensions attached to it.

Yealink T4x phones support 2 basic ways to trigger the XML Browser and show a custom menu to the receptionist:

  1. The receptionist presses a button on the phone to bring up the XML menu.
  2. A server pushes an XML menu to the phone, which displays it to the user.

So XML menus can be triggered either by the user or by some central service. In the second case, menus can either be pushed directly to the phone or be delivered as part of a SIP NOTIFY message.

From a technical point of view, the phone performs an HTTP GET request when the user invokes a menu via a button. Any subsequent menu is also retrieved by the XML Browser via an HTTP GET request. In the case of a central push, the server sends an HTTP POST request to the phone.
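To make the button-triggered case more concrete, here is a minimal sketch of a server answering the phone's HTTP GET request with an XML menu, using only Python's standard library. The host name `menus.example.com`, the `/menu` path and the helper names are made up for illustration. The XML element names follow my recollection of Yealink's developer guide and should be checked against it before use.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical menu document. The element names (YealinkIPPhoneTextMenu,
# Title, MenuItem, Prompt, URI) are assumptions to verify against
# Yealink's XML Browser developer guide.
MENU_XML = """<?xml version="1.0" encoding="UTF-8"?>
<YealinkIPPhoneTextMenu>
  <Title>Reception</Title>
  <MenuItem>
    <Prompt>Join huntgroup</Prompt>
    <URI>https://menus.example.com/huntgroup</URI>
  </MenuItem>
  <MenuItem>
    <Prompt>Forward calls</Prompt>
    <URI>https://menus.example.com/forward</URI>
  </MenuItem>
</YealinkIPPhoneTextMenu>
"""


def build_response(path):
    """Return (status, content_type, body) for the path of a GET request."""
    # Ignore any query string when matching the (hypothetical) menu path.
    if path.split("?", 1)[0] == "/menu":
        return 200, "text/xml; charset=utf-8", MENU_XML.encode("utf-8")
    return 404, "text/plain; charset=utf-8", b"unknown menu"


class MenuHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The phone fetches every menu page with a plain GET request.
        status, content_type, body = build_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8080):
    """Create the HTTP server without starting it."""
    return HTTPServer((host, port), MenuHandler)
```

Calling `make_server().serve_forever()` starts the server. For a real deployment you would put it behind HTTPS and point the phone's button at the `/menu` URL.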

One might wonder if Yealink XML Browser is similar to a normal web browser like Google Chrome or Mozilla Firefox. Both types of browser can render some markup, ask the user for input and request new pages from a URL. However, modern web browsers can do much more than simply display web pages. They are complete application platforms with runtime environments. Besides pure HTML rendering, you also get JavaScript, stylesheets, local storage, and many web APIs like location lookup and WebRTC. None of that is supported by Yealink XML Browser.

So yes, Yealink XML Browser is like a web browser, but like a web browser from 1995. It can only render menus or execute some very basic commands like changing the phone’s configuration. It can’t alter the menu itself or perform any kind of computation on the phone. All logic must live on the server providing the XML markup to the phone.

Programming model

Yealink XML Browser has many limitations, which make it simple to set up and get started with, but also capable of handling only semi-complex use-cases. Let’s take a closer look at the programming model. In a nutshell, applications for the Yealink XML Browser are a set of linked XML documents.

Figure: Yealink XML Browser flow

Each menu workflow might consist of several pages. Each page is described by an XML document. You can transition from one document to the next via URLs, i.e. the documents are hyperlinked. Each XML document can only contain a single XML object. Yealink has defined nine different XML objects. Six of those XML objects define a screen element like a selection menu, whereas the remaining three represent pre-defined tasks, e.g. changing the configuration of the phone.

As each XML document can only contain a single XML object, you can’t mix them to build more complex menus. Instead, you need to put different objects on subsequent menu pages and link them via URLs.

The input provided by the user is included in the request made to “leave” the current XML document. Besides the invoked URL, you can use this input in your business logic to generate the next document to return. For example, on a first menu page the user selects which kind of call forwarding to activate, using a TextMenu object. Depending on the selection, the next menu page might ask for a phone number using an InputScreen object, or offer a choice among several possible targets, again using a TextMenu object. The phone itself only renders the XML documents retrieved from a URL; it can’t do any kind of processing.
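The call forwarding example can be sketched as a small server-side function that maps each requested URL to the next XML document. This is only an illustration under a few assumptions. The base URL, the paths and the target numbers are invented. The element names and the `type` attribute come from my recollection of Yealink's developer guide. I also assume that the phone appends the entered value to the InputScreen URL as a query parameter named after the `Parameter` element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

BASE = "https://menus.example.com"  # hypothetical menu server


def text_menu(title, items):
    """Build a TextMenu document from (prompt, uri) pairs."""
    root = ET.Element("YealinkIPPhoneTextMenu")  # assumed element name
    ET.SubElement(root, "Title").text = title
    for prompt, uri in items:
        item = ET.SubElement(root, "MenuItem")
        ET.SubElement(item, "Prompt").text = prompt
        ET.SubElement(item, "URI").text = uri
    return ET.tostring(root, encoding="unicode")


def input_screen(title, prompt, url, parameter):
    """Build an InputScreen document asking for a number."""
    root = ET.Element("YealinkIPPhoneInputScreen", {"type": "number"})
    ET.SubElement(root, "Title").text = title
    ET.SubElement(root, "Prompt").text = prompt
    ET.SubElement(root, "URL").text = url
    ET.SubElement(root, "Parameter").text = parameter
    return ET.tostring(root, encoding="unicode")


def next_document(request_url):
    """Choose the next page from the invoked URL and the user's input."""
    parts = urlsplit(request_url)
    query = parse_qs(parts.query)
    if parts.path == "/forward":
        # First page: which kind of forwarding target?
        return text_menu("Call forwarding", [
            ("To a phone number", BASE + "/forward/number"),
            ("To a known target", BASE + "/forward/target"),
        ])
    if parts.path == "/forward/number":
        # Second page, variant A: ask for a number.
        return input_screen("Call forwarding", "Forward to:",
                            BASE + "/forward/set", "number")
    if parts.path == "/forward/target":
        # Second page, variant B: select among fixed targets.
        return text_menu("Forward to", [
            ("Call center", BASE + "/forward/set?number=1000"),
            ("Night desk", BASE + "/forward/set?number=1001"),
        ])
    if parts.path == "/forward/set":
        number = query.get("number", [""])[0]
        # Real business logic (validation, PBX call) would go here.
        return text_menu("Forwarding to " + number,
                         [("Back", BASE + "/forward")])
    return text_menu("Unknown page", [("Back", BASE + "/forward")])
```

Every page is a single document with a single root object, and all branching happens in `next_document` on the server.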

One might wonder how to secure XML menus. The phone itself is most probably located in an insecure network and sends its requests over the public Internet. I can’t give you a full answer to this question because our projects rely on some undocumented features for security.

The first measure in securing your XML menus is to only use HTTPS, i.e. encrypted HTTP requests. Yealink phones are able to validate SSL certificates. This already ensures that the message content itself can’t be inspected in transit.

However, how do you know which phone and user are requesting an XML menu? Here, we rely on an undocumented feature that sends HTTP Basic authentication headers along with each request. I guess Yealink is happy to share the details with you or your customer. Maybe this feature will also become part of the official firmware soon.
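On the server side, checking such a header could look roughly like the sketch below. It only covers standard HTTP Basic authentication as it arrives in the `Authorization` header and says nothing about how the phone is configured to send it. The credential store `KNOWN_PHONES` and its contents are invented for illustration.

```python
import base64
import binascii
import hmac

# Hypothetical credential store: one user name and password per phone.
KNOWN_PHONES = {"phone-reception": "s3cret"}


def parse_basic_auth(header_value):
    """Return (user, password) from an Authorization header, or None."""
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True)
        text = decoded.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # The user name ends at the first colon; the password may contain more.
    user, sep, password = text.partition(":")
    if not sep:
        return None
    return user, password


def authenticate(header_value):
    """Return the phone's user name if the credentials match, else None."""
    creds = parse_basic_auth(header_value)
    if creds is None:
        return None
    user, password = creds
    expected = KNOWN_PHONES.get(user)
    if expected is None:
        return None
    # Constant-time comparison to avoid leaking information via timing.
    if hmac.compare_digest(password.encode("utf-8"),
                           expected.encode("utf-8")):
        return user
    return None
```

The returned user name can then select which menus to generate for that phone. Basic authentication only makes sense over HTTPS, since the credentials are merely encoded, not encrypted.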

On the phone itself, you can and should limit which servers are allowed to send an XML document to your phone. Otherwise, any server able to guess the phone’s IP address could send an XML document to it. Make sure to configure this properly, or disable the push mechanism completely if it is not needed.

Finally, you have to harden your business logic. Do not trust any input you receive; validate it. Also assume that you might receive malformed input from an attacker who tries to inject code for remote execution.
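As a rough sketch of what such input checking might look like, the following functions accept only values from a fixed allow-list or values matching a strict pattern, and reject everything else. The forwarding modes and the accepted number format are assumptions for illustration; your own rules depend on your PBX and dial plan.

```python
import re

# Hypothetical set of forwarding modes offered by the menu.
ALLOWED_FORWARD_MODES = {"always", "busy", "no_answer"}

# Assumed format: optional leading "+", then 3 to 15 ASCII digits.
PHONE_NUMBER = re.compile(r"\+?[0-9]{3,15}")


def validate_mode(raw):
    """Accept a forwarding mode only if it is on the allow-list."""
    if raw in ALLOWED_FORWARD_MODES:
        return raw
    raise ValueError("unknown forwarding mode")


def validate_number(raw):
    """Accept a phone number only if the whole value matches the pattern."""
    cleaned = raw.strip()
    if PHONE_NUMBER.fullmatch(cleaned):
        return cleaned
    raise ValueError("invalid phone number")
```

Using `fullmatch` rather than `search` matters here: a value such as `1000; rm -rf /` contains a valid number but must still be rejected as a whole.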


Yealink XML Browser is a powerful tool to give receptionists and users direct access to complex workflows on their phone. The XML Browser itself only renders XML documents retrieved from some server. The server must implement the logic to generate the correct XML documents based on the phone’s identity and the user’s input. Special attention must be paid to securing the whole setup, so that the phone is not opened up to attackers sending their own XML documents to it. In a future post, we will walk you through an example of programming your own first menu.
