Recently we ran an internal experiment attempting to solve some typical challenges in modern offices. We also wanted to evaluate some emerging technologies and see how applicable they might be.
We picked the following two problems to focus our attention on.
Problem #1 - Hot desk utilisation
Many modern offices use a hot desk policy. This can present problems when there are fewer seats than people, typical for organisations that support working from home or have people who are regularly out of the office.
As an employee, you want to know if there are desks available before you head to the office and waste time looking for a seat.
Problem #2 - The visitor experience
Visiting modern offices involves checking in with a concierge, being given a temporary pass, and then being expected to find your way to the person you want to visit.
It's easy to get lost, and it's not the best experience for new visitors.
We want our visitors to feel welcomed, valued, and amazed by the use of technology in streamlining their visit.
Tackling Hot Desking
To solve the hot desk occupancy problem, we spiked two different technologies: Bluetooth beacons and Wi-Fi probe requests.
While we could identify where people were using a mobile app and a beacon, we had concerns about potential privacy issues. Since privacy is about the appropriate use of technology, not just whether the technology can do something, we decided to stop pursuing this particular experiment and focus our time elsewhere.
It's a good reminder that we have a responsibility as developers to think of ways our technology might be misused and whether it can be turned into something it was never intended for.
Helping our Visitors
For our second problem, the visitor experience, we used a mixture of different services to compose a solution:
- Facial recognition
- Bots and natural language services
- Speech recognition
- Various internal system APIs
We ran a one-week proof of concept (PoC) to see what we could achieve.
Our proof of concept was a success and we were about to turn it into a full application, but at that point we had people at AWS re:Invent in the US, and they told us about the new Amazon Virtual Concierge. We just had to give it a try!
The Virtual Concierge
The virtual concierge offers services equivalent to those we had used from Azure: Rekognition, Lex, Polly, and other supporting services. What it adds on top is the "Virtual Host", a 3D avatar that creates a more engaging, human-friendly interface for our visitors.
We began with the virtual concierge starter pack and built on top of the services it provides:
- Host component
- Speech component
- Point of interest (so the host can maintain eye contact wherever the visitor moves)
- Amazon DynamoDB: to keep a database of people, which is queried when a face is detected
- Amazon S3: to store any required asset
- Amazon Lex: to filter a user's speech into different requests
- Amazon Polly: text-to-speech service
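As a rough sketch of how the DynamoDB lookup fits together: Rekognition's SearchFacesByImage returns a list of FaceMatches, each with a Similarity score and the ExternalImageId assigned when the face was indexed, and that id can serve as the key into the people table. The threshold and key names below are assumptions, not our production values:

```javascript
// Sketch of the face-match → person lookup step. The response shape
// (FaceMatches, Similarity, Face.ExternalImageId) follows the Rekognition
// SearchFacesByImage API; the table key name is an assumption.

// Pick the most similar face match above a confidence threshold.
function bestMatch(faceMatches, threshold = 90) {
  const candidates = (faceMatches || []).filter(
    (m) => m.Similarity >= threshold
  );
  if (candidates.length === 0) return null;
  return candidates.reduce((a, b) => (a.Similarity >= b.Similarity ? a : b));
}

// Turn a Rekognition response into a key for the people table.
function personKeyFor(searchResponse) {
  const match = bestMatch(searchResponse.FaceMatches);
  // ExternalImageId was set when the face was indexed into the collection.
  return match ? { personId: match.Face.ExternalImageId } : null;
}

// Example with a canned response:
const response = {
  FaceMatches: [
    { Similarity: 99.2, Face: { ExternalImageId: "staff-042" } },
    { Similarity: 87.5, Face: { ExternalImageId: "staff-007" } },
  ],
};
console.log(personKeyFor(response)); // { personId: 'staff-042' }
```

The resulting key would be handed to a DynamoDB get call to fetch the person's record.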
Structure of the application and user flows
Where a standard application has a series of screens, the virtual concierge operates as a series of "scenes" connected by user flows and actions.
We restricted the PoC to the following scenes:
- Welcome: Our introduction scene, spoken by the virtual host. This is the launch point of all other scenes.
- Visitors: Where some options are shown to visitors and they can choose from them.
- Staff Verification: For showing our visitor the people with the name they're looking for so they can choose the right person. Helpful when there are two people with the same name, or when our visitor doesn't know a person's full name.
- Notification: Where we ask for our visitor's name and notify the person they are here to visit that they've arrived.
- Staff Welcome: Where we detect staff via facial recognition and welcome them back (and add fun messages like happy birthday, happy anniversary, etc.).
- Information: Where anybody can get information about what options are available and how to navigate within the virtual concierge.
- Idle: When the visitor or staff scenes are closed, we provide suggestions on how to further interact with the virtual concierge.
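The scene transitions above boil down to an intent-to-scene router. A minimal illustration, where the intent names are assumptions for the sake of the sketch (the real routing lives in the Sumerian state machine and our Lex bot):

```javascript
// Illustrative sketch of routing detected intents to the scenes listed above.
// Intent names here are assumptions, not our actual Lex configuration.

const SCENES = {
  WELCOME: "Welcome",
  VISITORS: "Visitors",
  VERIFICATION: "Staff Verification",
  NOTIFICATION: "Notification",
  INFORMATION: "Information",
  IDLE: "Idle",
};

// Map a Lex-style intent name to the next scene.
function nextScene(intentName) {
  switch (intentName) {
    case "VisitStaff":
      return SCENES.VERIFICATION; // "I am here to see <name>"
    case "GetInformation":
      return SCENES.INFORMATION;
    case "StartConversation":
      return SCENES.VISITORS;
    default:
      return SCENES.IDLE; // fall back to the idle suggestions
  }
}

console.log(nextScene("VisitStaff")); // Staff Verification
```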
We have implemented two user flows so far, Visitor and Staff (with plans to expand on these later).
When a staff member starts a conversation with the Virtual Host we use face detection and the AWS Rekognition API to discover who that person is and welcome them back.
For visitors, a more detailed scenario is implemented. They start interacting with the Virtual Host by saying "Hi, Fiona" (the name of the virtual host can be customised), or just pressing the start button.
From there, further instructions are provided which will be read to them. Visitors can then speak their intentions. For example:
"I am here to see Yaser"
Once we detect their intent, we can then transfer them to the verification scene where we show them people with the name they mentioned.
The default name detection is unreliable, so we provide Lex with specific names and their synonyms to search against. We also show our visitor a picture of each person to help them better identify who they are visiting.
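The synonym list we supply follows the shape of the Lex (V1) model-building PutSlotType request. The names and synonyms below are placeholders, not our real staff list:

```javascript
// Sketch of the slot type we feed to Lex so staff names resolve reliably.
// Mirrors the Lex (V1) PutSlotType request shape; values are placeholders.
const staffNameSlotType = {
  name: "StaffName",
  // Resolve spoken synonyms back to the canonical value.
  valueSelectionStrategy: "TOP_RESOLUTION",
  enumerationValues: [
    { value: "Yaser", synonyms: ["Yasser", "Yasir"] },
    { value: "Alex", synonyms: ["Alexander", "Al"] },
  ],
};
// This object would be passed to the Lex model-building API, e.g.
// lexmodelbuildingservice.putSlotType(staffNameSlotType, callback)
```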
Once our visitor confirms who they're meeting, we redirect to the notification scene, where they enter their name and we notify the person they're seeing that they have arrived.
Behind the scenes
The web application has a main script containing a web worker. The web worker handles all the background activities required for the application to operate: face detection, face recognition, voice recording, and communication between the main thread and the web worker.
As an example, when you press the mic button and speak, the audio is captured in an event handler, encoded, and sent to Amazon Lex to be converted to text and matched to an intent.
Similarly, when someone stands in front of the camera, a method in the worker is called; it tries to detect a face and, if one is found, sends a picture to the Amazon Rekognition service to see if that person is known.
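A minimal sketch of that worker-side message handling, with assumed message type names; the real service calls would happen where the comments indicate:

```javascript
// Minimal sketch of the main-thread ↔ worker message protocol.
// Message type names are assumptions for illustration.

// Inside the worker: dispatch incoming messages to the right background task.
function handleMessage(msg) {
  switch (msg.type) {
    case "AUDIO_CAPTURED":
      // In the real app this would send msg.audio to the speech service
      // and post the recognised text back to the main thread.
      return { action: "recognise-speech", bytes: msg.audio.length };
    case "FRAME_CAPTURED":
      // Likewise, this would run face detection on msg.frame and, on a hit,
      // forward the picture to Rekognition.
      return { action: "detect-face", bytes: msg.frame.length };
    default:
      return { action: "ignore" };
  }
}

// On the main thread the wiring would look roughly like:
//   const worker = new Worker("worker.js");
//   micButton.onclick = () => worker.postMessage({ type: "AUDIO_CAPTURED", audio });

console.log(handleMessage({ type: "AUDIO_CAPTURED", audio: new Uint8Array(3) }));
// { action: 'recognise-speech', bytes: 3 }
```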
Multiple services are used behind the scenes. At the core of this experience is the ability to converse with the Sumerian Host, enabled by the Sumerian Dialogue (chatbot) and Speech (text-to-speech) components, as well as the animation system that syncs gestures and lip movement with the speech. We use the Host component's Point of Interest system in conjunction with OpenCV's face detection to follow the location of the user's face relative to the webcam screen space.
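The face-following step boils down to mapping the detected bounding box into a normalised look-at target for the Point of Interest system. A sketch of just the coordinate maths, assuming a [-1, 1] convention centred on the webcam frame (the actual Sumerian target format may differ):

```javascript
// Sketch: map an OpenCV-style face bounding box (pixel coordinates)
// to a normalised look-at target. Assumes x and y in [-1, 1] with
// (0, 0) at the centre of the frame and positive y pointing up.

function faceToLookTarget(box, frameWidth, frameHeight) {
  // Centre of the detected face in pixels.
  const cx = box.x + box.width / 2;
  const cy = box.y + box.height / 2;
  // Normalise to [-1, 1]; flip y so "up" is positive.
  return {
    x: (cx / frameWidth) * 2 - 1,
    y: 1 - (cy / frameHeight) * 2,
  };
}

// A face centred in a 640x480 frame maps to the origin:
console.log(faceToLookTarget({ x: 280, y: 200, width: 80, height: 80 }, 640, 480));
// { x: 0, y: 0 }
```

Each webcam frame, the worker would feed this target to the host's Point of Interest component so the avatar keeps eye contact as the visitor moves.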
The team was pleased with the overall experience of using the Amazon Virtual Concierge, though we found a few issues that are worth mentioning.
- Amazon Sumerian doesn't support collaboration. Only one person can edit a particular scene at a time.
- You cannot keep the source code in source control. At most you can export the whole scene as a zip file and share it.
- Even simple scenes take time to compose.
We had a great opportunity to experiment with the virtual concierge and create an unforgettable user experience.
As a team, we also loved being able to use some new AWS services and integrate them to solve a genuine problem rather than just learning the theory of how something works.
This proof of concept wouldn't have been possible without the help of these amazing people:
- Alex Mackey
- Donna Spencer
- Geoff Polzin
- Satvik Kumar
- Curtis Lusmore
- Prashanth Nagaraj