Automated tests using Genesys Cloud's WebRTC softphone
Manually testing a UI that relies on Genesys Cloud's Softphone and Notifications API can be a time-consuming and fragile process. It requires playing both customer and agent roles to simulate specific conversations - at least that's what I found. To save time, and increase confidence in what I was building, I automated these tests. At the time I posted about this on LinkedIn and got so much interest that I thought I’d write a blog post about how the tests work.
My end-to-end tests have three stages:
Automate a call to a phone number set up in a Genesys Cloud org and simulate a customer speaking
Answer the call in the softphone and simulate an agent speaking
Assert the feature I’ve developed behaves as expected
If the assertions pass, then I know my feature is correctly integrated with Genesys Cloud's Notifications API, Embedded Framework and Softphone. This reduced feedback loop allows me to develop the UI feature much faster, and with greater confidence, than if I relied solely on manual testing.
Tools I use
Below are the tools I use for my tests, and what each is used for:
Puppeteer - Opening the website containing my feature and giving me control of the page to perform steps 2 and 3
MediaStream API Mocker - Overriding the MediaStream API so I can capture the audio the agent would hear and stream audio simulating the agent speaking
Softphone UI POM - Simplifying the control of Genesys Cloud’s Softphone in the website, making it easy to answer a call the way an agent would and ensuring the same events are fired from the Embedded Framework as you’d expect in the real scenario
IVR Tester - Phones the agent and pretends to be the customer
How the tests work
Opening the website under test
The website containing my feature, and the softphone within it, are controlled via a Puppeteer test script. The first step of this script is to launch the browser and point it at my website:
import puppeteer from "puppeteer";
const browser = await puppeteer.launch({
  headless: false,
  userDataDir: "./user_data",
  defaultViewport: null,
  args: [
    "--use-fake-device-for-media-stream", // fake mic/camera, so no device permission prompts
    "--disable-features=site-per-process",
    "--allow-file-access",
    "--no-sandbox",
    "--autoplay-policy=no-user-gesture-required",
  ],
});
const page = await browser.newPage();
await page.goto("http://localhost:8080");
Two of the launch options above ensure I can manually perform the browser-based auth required by the softphone, and persist the token for future tests:
headless: false,
userDataDir: "./user_data",
Although I prefer to log in manually for the first test, and have the token reused for subsequent tests until it expires, you could automate the login process too.
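If you do want to automate it, a rough sketch might look like the following - note the login URL is region-specific and the selectors are illustrative placeholders rather than Genesys Cloud's actual login form markup:
// Rough sketch of automating the login (selectors are hypothetical placeholders)
const loginPage = await browser.newPage();
await loginPage.goto("https://login.mypurecloud.com"); // use your region's login URL
await loginPage.type("#email", process.env.GENESYS_USERNAME); // hypothetical selector
await loginPage.type("#password", process.env.GENESYS_PASSWORD); // hypothetical selector
await Promise.all([
  loginPage.waitForNavigation(),
  loginPage.click('button[type="submit"]'), // hypothetical selector
]);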
Impersonating an agent over the softphone
The confidence that my feature will work comes from the fact that the tests are as realistic as possible, and one facet of this is simulating the agent speaking over the softphone.
To achieve this I override the browser's MediaStream API, which is the mechanism through which the softphone receives audio from the agent's headset. This allows me to:
Capture the audio coming from the softphone
Stream my own audio that goes to the softphone
Overriding the MediaStream API
There is an option available when launching the browser to specify an audio file for it to play over a fake media device (below); however, this file is simply played on repeat, and I want to control what is said and when.
--use-file-for-fake-audio-capture=<PATH-TO-FILE>
The solution I use is Puppeteer's .evaluate(...) function. It allows me to run code inside any frame in the browser, which means I can run a script inside the softphone's iframe that replaces the MediaStream API with a mock. I can then control the audio coming in and out of this mock:
// 1. Read in the JS code for mocking the MediaStream API
//    (fetched here from the gist; you could equally read a local copy with readFileSync)
const mockMediaDevicesScript = await (
  await fetch(
    "https://gist.githubusercontent.com/SketchingDev/00128173c26dae841a5057803fa4503a/raw/fea356b4117c6981693e8ac9eea2f3c52261ac45/mocked_media_stream.bundle.js",
  )
).text();
// 2. Get the iframe for the WebRTC phone
const webRtcFrame = await page.waitForFrame((frame) =>
  frame.url().includes("/crm/webrtc.html"),
);
// 3. Run the script within the WebRTC frame to replace the MediaStream API with the mock
await webRtcFrame.evaluate(mockMediaDevicesScript);
// 4. Call the mock within the iframe whenever you want to play an audio file over the MediaStream
//    (`filename` is an audio file served by the test's local web server)
await webRtcFrame.evaluate((fileNameIn) => {
  window.mockMediaDevice.playAudio(fileNameIn);
}, `http://localhost:8080/${filename}`);
See my full guide on mocking the MediaStream API.
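The full implementation lives in the gist referenced above, but to give a rough idea of the technique, here is a minimal sketch of the streaming side (my own simplification, not the gist's actual code): getUserMedia is replaced so the "microphone" the softphone receives is a stream the test controls.
// Minimal sketch of the idea behind the mock (simplified; not the gist's implementation)
(() => {
  const audioContext = new AudioContext();
  const destination = audioContext.createMediaStreamDestination();
  // The softphone now receives this controllable stream instead of the real microphone
  navigator.mediaDevices.getUserMedia = async () => destination.stream;
  // Helper mirroring the playAudio(...) call used in the snippet above
  window.mockMediaDevice = {
    playAudio: async (url) => {
      const audio = new Audio(url);
      audio.crossOrigin = "anonymous";
      audioContext.createMediaElementSource(audio).connect(destination);
      await audio.play();
    },
  };
})();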
Phoning as the customer
IVR Tester is an open-source project I built that uses Twilio to automate a customer calling a phone number. The code (based on an ongoing project rewrite) to call the agent’s queue looks like:
const result = await ivrTester.run(
  { from: "0123 456 789", to: process.env.GENESYS_PHONE_NUMBER },
  openAiWhisperChat("You are a customer wanting to ...."),
);
I’ve not gone into detail on this part as I would like to dedicate a blog article to demonstrating how to make AI-driven calls for testing. Subscribe, or follow me on LinkedIn to know when I post the article.
Answering as the agent
Answering a call via the softphone is done by simulating clicks within the softphone's UI. Here is the code, using a Page Object Model I created, that waits for a call and then clicks the Answer button:
// 1. Find the softphone's iframe
const frame = await page.waitForFrame((frame) =>
  frame.url().includes("/crm/embeddableFramework.html"),
);
// 2. Pass that into the softphone helper
const softphone = await webRtcSoftphone(frame);
// 3. Wait at most 30 seconds for a call, then answer it
await softphone.waitForCallThenAnswer(30000);
I have shared the code for the Page Object Model on GitHub.
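To give a flavour of what the helper does, here is a heavily simplified sketch - the selector below is an illustrative placeholder rather than the softphone's real markup, so treat the GitHub repository as the source of truth:
// Heavily simplified sketch of the Page Object Model's shape (selector is hypothetical)
const webRtcSoftphone = async (frame) => ({
  waitForCallThenAnswer: async (timeoutMs) => {
    const answerButton = await frame.waitForSelector(
      'button[data-purpose="answer"]', // placeholder; not the real softphone markup
      { timeout: timeoutMs },
    );
    await answerButton.click();
  },
});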
Listening to the simulated customer
It would have been sufficient for my purposes to have the simulated customer and agent talk over one another with phrases my feature was designed to pick up on. However, where would the fun be in that? Instead, I leveraged Genesys Cloud's transcription topic to hear what the customer said, and used OpenAI's APIs to generate a sensible reply for the agent, allowing the two to hold a conversation.
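A rough sketch of the reply-generation step, assuming the customer's utterance has already been pulled from the transcription topic (I've omitted the notification-channel plumbing and payload shape), might look like this - the model name and prompt are illustrative:
// Rough sketch: turn a transcribed customer utterance into an agent reply
import OpenAI from "openai";
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const generateAgentReply = async (customerUtterance) => {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model choice
    messages: [
      { role: "system", content: "You are a helpful contact-centre agent." },
      { role: "user", content: customerUtterance },
    ],
  });
  return completion.choices[0].message.content;
};
// The reply can then be converted to audio (e.g. with a text-to-speech service) and
// streamed to the softphone via window.mockMediaDevice.playAudio(...)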
Asserting the UI behaviour
As the test conversation progressed, I checked my feature was behaving as expected. Here's an example of some of the code checking that the customer's name was correctly extracted:
const customerFullName = await page.$eval(
  "input#staticFullName",
  (input) => input.value,
);
assert.strictEqual(customerFullName, "Mr Test");
It is possible to integrate Puppeteer with Jest, allowing you to take advantage of Jest's existing assertions.
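For example, assuming Puppeteer is wired into Jest (e.g. via the jest-puppeteer preset, which exposes a global page), the same check could be written as:
// The same assertion expressed as a Jest test (assumes the jest-puppeteer preset)
test("extracts the customer's full name", async () => {
  const customerFullName = await page.$eval(
    "input#staticFullName",
    (input) => input.value,
  );
  expect(customerFullName).toBe("Mr Test");
});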
Conclusion
Hopefully breaking down each part of my automated tests has shown how easy they are to set up. With these tests I can quickly iterate on my feature with confidence that it is behaving as expected.
It’s worth pointing out that end-to-end tests should be used sparingly, as they take longer to run and usually become flaky over time. The testing pyramid offers a sensible breakdown of the ratio of each type of test.
Undoubtedly there are many improvements to be made, and if you find any I’d love to hear about them!