Mixed Reality - Designing for HoloLens 2

In Part 1 I discussed our first impressions of HoloLens 2 at Microsoft's MR Partner Hack and some of the new features you can look forward to. In this post, we delve deeper into our experience trying to design for HoloLens 2 and important things to consider when transitioning from the old hardware to the new one.

Designing for HoloLens 2

A significant aspect of the workshop was learning about the new interaction models that are unlocked by HL2's new features. These directly impact the design of new experiences. While a straight port of existing applications to HL2 should not be overly complex, to truly take advantage of these interactions we really need to rethink our designs holistically.

Far and Near Interactions

With full hand gesture detection, HoloLens now fully supports two modes of interaction. Far interaction, which will be more familiar to existing users, involves highlighting a UI element using the gaze input or the newer hand/finger rays and manipulating it by tapping or moving your hand. The near interaction, unique to HL2, involves virtually pressing a button in real space, grabbing and moving an object, dragging a slider, or similar interaction methods.

Most of the controls that have been provided in MRTK v2 support both modes of interaction. Significant effort has gone into making near interactions obvious to try and bridge the gap of not having haptic feedback. For example, the button surface is animated when it's being pushed in and the position of your finger is highlighted on the surface as it gets closer to the button.

The key takeaway for me was that as exciting as near interactions are, it's unlikely they will serve all situations. Far interactions are still going to play an important role. In our application, while it makes sense for some of the UI elements to be placed nearer to the user and be interacted with directly, other buttons need to be closer to the crop field. An example of this is the buttons that change the crop density in each part of the field. Since the user typically views the whole field from a distance, far interaction with these buttons is a better choice here.

On that note, one of several interaction paradigms that Microsoft is experimenting with is the concept of "pulling" UI elements towards yourself so that you can perform near interaction with them. For example, we were shown a kitchen design where a slider on the bench lets you change its colour. When standing afar, clicking on the slider pulls it right in front of you so you can manipulate it directly, rather than trying to tinker with a small slider from afar or having to walk up to it. It's then dismissed and placed back on the bench.

Allowing for multiple interaction types is generally a good practice beyond just near and far interactions, for instance adding voice commands. MRTK v2 makes this quite simple to do.

Leverage the Head

With all the new features in HL2 it's easy to fall into the trap of thinking new is better. The hand ray casts used for far interactions like clicking buttons can in fact feel slower and less accurate than the HL1 gaze input and air-tap method, if you have gotten used to that. We found that in our application, because we often lay the crop field hologram down on the ground and point down at it, our hand could get clipped by the bottom of the view rectangle and therefore not track.

The takeaway there is nothing new: consider your full set of options, build, test, and measure. These types of risks can be avoided with proper experimentation and prototyping during the ideation phase of design.

Hand Menus

The addition of detailed hand tracking intuitively lends itself to what we have all been waiting for: menus that appear around your hand! In fact, Microsoft presented a dedicated session about hand-anchored menus. We also spent a good amount of time experimenting with this concept.

In our application at specific points during the simulation, the user is asked to estimate how much yield they expect from their crop and to sign a supplier contract. In HL1 this was implemented using a panel with 2 buttons to increase/decrease the estimate and a third button to sign the contract as well as some informational text. We decided to convert this to a "paper form" that appears in your hand where you can change the estimate using a slider. These are the before and after images:

Here are a few learnings from this experiment:
It's cool! It's very cool!! That doesn't necessarily mean it's the right interaction for your use case, but if you are trying to impress someone this is exactly the kind of thing you want to show them.
As I had mentioned earlier, one of the shortcomings of the hand tracking technology is that it's based on image depth so one hand covering the other will break the tracking. It's important to consider where the interactable elements should be placed. Microsoft showed us various experiments with placing buttons above the forearm, beyond the fingertips, and next to the palm. The last one seemed to have the best results for them and in our case certainly fit the "paper form" paradigm well.
On the subject of paper forms, we had a really interesting conversation about user expectations and mental models: I initially thought of the form as being like physical paper that should therefore lie flat on the palm. When one of my colleagues tried it, he felt it was odd and that the UI should sit upright on the hand (a billboarding effect). The takeaway for me is that the design needs to make clear which expectation it is meeting. If the intention is for the UI or the object to be a physical one (like a notebook), it should be accurately modelled like one. The visual cues here were not clear.
MRTK v2 really makes this stuff quite simple. "Solvers" are scripts that can track parts of the hand like individual fingers or the palm. In fact, on the second day of the workshop a new version was released which included a Solver for detecting a "Palm Up", making it easy for us to display the menu only when needed.
We did run into some issues with the accuracy of the near interactions, especially the slider. Depending on the size of the slider, releasing the notch would sometimes not register immediately, so it moved slightly when you let go. We also found that in certain situations the "far interaction" for the slider would trigger and it moved even though it wasn't grabbed. At the time of our experimentation, there was no way to easily disable the far interaction on the slider. Having said all that, it feels like all these issues can and will be fixed at the software level, either in MRTK v2 or your own code.
Not everyone is right-handed. Or has two hands for that matter.
Lastly, and perhaps most importantly, there is the issue of fatigue. Holding your hand up for any length of time can quickly become tiring and that needs to be accounted for. I think this type of interaction is useful for top-level tasks (such as navigation) that are important but are performed infrequently and in short bursts. There are also smaller optimisations to consider: in our case we found that having the form aligned vertically or horizontally to your palm was uncomfortable for the shoulder and wrist. We settled for having it aligned at a 45-degree angle so that your arm stays in a natural position and your wrist stays straight.
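
To make the "show the menu only when the palm is up" idea a little more concrete, here is a minimal sketch of one way such a check could work. It is not how the MRTK Solver is implemented; the function name, the tuple-based vectors and the 40-degree threshold are all assumptions made purely for illustration. The idea is simply to compare the palm's normal with the direction from the palm to the head.

```python
import math

def is_palm_facing_user(palm_normal, palm_pos, head_pos, max_angle_deg=40.0):
    """Return True when the palm normal points roughly towards the head.

    Hypothetical helper: vectors are (x, y, z) tuples in metres and the
    threshold is an illustrative guess, not a value taken from MRTK.
    """
    # Direction from the palm to the user's head.
    to_head = tuple(h - p for h, p in zip(head_pos, palm_pos))

    dot = sum(a * b for a, b in zip(palm_normal, to_head))
    norm = math.sqrt(sum(a * a for a in palm_normal)) * math.sqrt(
        sum(b * b for b in to_head)
    )
    if norm == 0.0:
        # Degenerate input (zero-length vector): treat as "not facing".
        return False

    # Clamp to guard against tiny floating point overshoots before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg
```

A wider threshold makes the menu easier to summon but more likely to pop up by accident, which is exactly the kind of trade-off that needs testing on the device.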

The "Belt Menu"

Croptimum has some high-level actions (buttons) that currently sit statically at the top of the crop field hologram. These include "Plant" and "Harvest" actions. Since they didn't strictly relate to any one part of the crop field, we decided to experiment with having them follow the user in some way. Speaking to two Microsoft designers we got two slightly different suggestions. One was to "leash" the buttons to the user's view fully so that if they look up at the crop field, the buttons float just below their vision. You know they are there, but they don't cover your view (this is how settings windows work in HoloLens, for example). The second option, which is what we opted for, is to leash the buttons only to the head's rotation about the vertical axis (its yaw). That is, the buttons always appear around your waist but follow the direction you are looking in. We ultimately found it a lot less distracting to have this menu in a more fixed position vertically. In general, it works quite nicely but we didn't get time to solve the problem of teaching first-time users that this UI exists.
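
As a rough illustration of the second option, the sketch below computes where such a menu could sit. It is not our actual implementation: the function name, the y-up coordinate system in metres, and the default offsets are assumptions for the example. The key step is flattening the head's forward vector onto the horizontal plane so that looking up or down does not move the menu.

```python
import math

def belt_menu_position(head_pos, head_forward, waist_drop=0.6, distance=0.5):
    """Place a menu in front of the user at waist height, following yaw only.

    Hypothetical helper: positions are (x, y, z) tuples with y pointing up,
    and the default offsets (in metres) are illustrative guesses.
    """
    # Drop the vertical component so head pitch is ignored.
    fx, fz = head_forward[0], head_forward[2]
    length = math.hypot(fx, fz)
    if length < 1e-6:
        # Looking straight up or down gives no usable heading; the caller
        # would normally keep the previous menu position in this case.
        return None
    fx, fz = fx / length, fz / length

    return (
        head_pos[0] + fx * distance,
        head_pos[1] - waist_drop,  # fixed height below the head
        head_pos[2] + fz * distance,
    )
```

In practice you would probably also smooth the movement over time so the menu trails the head rather than snapping to every small turn.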

I should mention that both designers suggested building a UI that the user can explicitly move and pin to a location. I have mixed thoughts about this approach. I have frequently had challenges with a HoloLens application window being "lost" and literally having to try and find where it is so I can shut it down (limited FoV doesn't help). With HoloLens 2 I occasionally found myself awkwardly trying to push a close button on the application window that had accidentally been placed just behind a monitor or table. In short, I think the freedom of manually placing UI around the place is nice but you really need to think about its accessibility especially in a busy environment. Again, "multiple interaction models" can be helpful here.

Impact of Increased FoV on Design

Technology improvements can sometimes impact your design in subtle and counterintuitive ways. On the last day of the workshop we realised something important: when using Croptimum on HoloLens 1, because the FoV is quite small, we tend to treat the experience as essentially two disjoint parts. One is the crop field itself with the associated UI and buttons, and the second is the soil and plant close-up holograms which sit to the side, as well as the main simulation buttons and the date which hovers above the crop field. We sit or stand quite close to the field and only look at those secondary holograms as needed. With HoloLens 2, because the FoV is larger, almost all of the experience fits in the view. Almost! What that meant is that we would generally stand further back from the whole experience to see everything in one view. In turn, the buttons and text were now a bit too small for comfort. We would need to tune all of those things.
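
The underlying geometry is simple enough to sketch. What matters for legibility is the angle an element covers in your view, and that shrinks as you step back. The helper names below are hypothetical and the numbers in the usage note are made up for illustration; they are not measurements from Croptimum.

```python
import math

def angular_size_deg(size_m, distance_m):
    """Visual angle (in degrees) of an element of a given size at a distance."""
    if size_m < 0 or distance_m <= 0:
        raise ValueError("size must be non-negative and distance positive")
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

def scale_to_keep_angular_size(old_distance_m, new_distance_m):
    """Scale factor that keeps an element's visual angle unchanged.

    Keeping the angle constant means keeping size / distance constant,
    so the factor is just the ratio of the two viewing distances.
    """
    if old_distance_m <= 0 or new_distance_m <= 0:
        raise ValueError("distances must be positive")
    return new_distance_m / old_distance_m
```

For example, if a 5 cm button was designed to be viewed from 1 m and users now stand 2 m away, `scale_to_keep_angular_size(1.0, 2.0)` suggests doubling its size to keep it equally readable.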

Development Experience

MRTK v2 is a significant overhaul, but depending on how much of MRTK v1 you were opting into, the basic migration process is not too painful. A couple of the more Windows-dependent features (such as Text-to-Speech) have been removed, but otherwise most of the interfaces and features have a 1-to-1 equivalent. We also found that while the documentation is still very much a work in progress, the sample scenes fill the gaps quite nicely. Performance has also received more attention, given the importance of a steady 60 fps in creating a convincing experience. Dev builds now include, by default, a frame rate visualiser that lets you quickly see what parts of your experience require attention.

We did note a few quirks and potential bugs which were not easy to debug, such as the earlier issues with sliders, or odd behaviours with "global buttons". What was impressive, though, is Microsoft's dedication to quickly and transparently tracking and fixing issues on GitHub. They are very active and new builds are being released very frequently.

Perhaps the biggest issue around development has been, and still is, the feedback loop. Because the full cycle of building the C++ project and deploying it to the device can take minutes, you really want to be developing as much as possible in Unity itself, but that's not always possible. The new possibilities with hand and eye tracking make this even more difficult. To be fair, Microsoft has put a lot of effort into having "test hands" in Unity, with shortcut keys to move and rotate them and even mimic specific gestures, but ultimately I found them difficult to work with for anything other than the most basic cases.

Lots More

While in this post I focused on things we tried hands-on, there are several more features and additions to the ecosystem that we didn't get around to, or that are not yet available. They're worth a quick mention:

Eye Tracking is a key feature of HoloLens 2. There are lots of design guidelines around this but unfortunately the examples that were shared with us were not much more than "automatically scrolling text" or "highlight object", which we have seen before on some phones for example. I'm curious about where else this can be applied in a 3D environment. Our team discussed, for example, having the "hand menu" appear only when you look at your hand. Care is needed though: eyes are a lot more erratic than we feel they are, thanks to our "brain smoothing".
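
One common way to cope with jittery gaze is to require the eyes to dwell on a target for a short time, and to forgive very brief glances away. The sketch below illustrates that idea for the "look at your hand to show the menu" case. It is an assumption-laden example rather than anything provided by MRTK: the class name, the timings and the boolean "gaze is on target" input are all hypothetical.

```python
class DwellTrigger:
    """Activates once gaze has rested on a target for `dwell_s` seconds.

    Hypothetical helper: short dropouts (up to `grace_s` seconds) are
    ignored so that erratic eye movement does not reset the timer.
    """

    def __init__(self, dwell_s=0.4, grace_s=0.1):
        self.dwell_s = dwell_s
        self.grace_s = grace_s
        self._start = None     # time the current dwell began
        self._last_hit = None  # last time the gaze was on the target

    def update(self, t, on_target):
        """Feed one gaze sample (time in seconds); returns True while active."""
        if on_target:
            if self._start is None:
                self._start = t
            self._last_hit = t
        elif self._last_hit is None or t - self._last_hit > self.grace_s:
            # Gaze has been away for too long: start over.
            self._start = None
            self._last_hit = None
        return self._start is not None and t - self._start >= self.dwell_s
```

You would call `update` once per frame with the current time and whether the gaze ray hits the hand, and show the menu while it returns True. Both timings would need tuning on the device.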

Shared Experiences were discussed as well. There was mention of experiences that can be shared across a host of devices, from HoloLens down to a phone, which I think is a powerful concept. Microsoft's Azure Spatial Anchors platform can help enable these scenarios.

Finally, Remote Rendering is something that I'm cautiously excited about. This isn't available for testing just yet but Microsoft claims they are able to render up to 80M triangles at 60 FPS (latency will depend on where the rendering servers are). We have worked with more than one or two customers who have existing high poly models that they want to incorporate into a HoloLens experience. Usually we need to perform the non-trivial task of simplifying or remodelling them, and while that will ultimately always yield the best results, I think having the option to render existing high poly models will give a massive boost in terms of quickly creating a proof of concept before sinking too much time into a project. What's interesting is that, from what we are told, the streaming is done via a Unity plugin on a per-model basis. So while a high poly model will seamlessly be sent for remote rendering, the UI and simple models will still be rendered locally, so technically the interactivity should be less impacted.

Overall the workshop was a fantastic opportunity to get an idea of what's coming and start thinking about how we can migrate existing experiences and build new ones. I can't wait to see what else we can come up with.