U.S. Patent No. 10,714,111: Enhanced adaptive audio rendering techniques

Issued: July 14, 2020, to Microsoft Technology Licensing LLC
Filed: January 16, 2019 (claiming priority to March 30, 2016)


U.S. Patent No. 10,714,111 (the ’111 patent) relates to application programming interfaces (APIs) that enable a system to select and use an audio spatialization technology. The ’111 patent describes a computing device that receives contextual data indicating how many audio objects an encoder in communication with the device can handle, selects a spatialization technology whose threshold number of audio objects corresponds to that number, and causes the encoder to generate a rendered output signal and communicate it to one or more speakers. As described in the patent, such a system can also coordinate spatial audio from several applications running at once. The ’111 patent could simplify spatial audio for video games, among other things, especially in instances where multiple applications are running spatial audio through multiple channels.



The techniques disclosed herein provide application programming interfaces (APIs) for enabling a system to select a spatialization technology. The APIs also enable a system to balance resources by allocating audio objects to a number of applications executing on a computer system. The system coordinates the audio objects between applications and each application can control the number of objects they individually generate. In some configurations, the system can also fold audio objects across different applications. Different spatialization technologies can be selected based on an analysis of contextual data and policy data. For instance, when a new headphone system is plugged in, the system may switch from Dolby Atmos to the Microsoft HoloLens HRTF spatialization technology. The system can dynamically control a number of generated audio objects and dynamically change a utilized spatialization technology based on changes to a computing environment.
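
As a rough illustration of the resource balancing described in the abstract, the sketch below divides a fixed budget of audio objects among several applications. The patent does not disclose a particular allocation policy in the text quoted here; the fair-share rule, the function name allocate_objects, and the application names are assumptions made only for this example.

```python
def allocate_objects(requests, capacity):
    """Split `capacity` audio objects among applications.

    Hypothetical fair-share policy (an assumption, not taken from the patent):
    each round, every application still waiting gets an equal share of what
    remains, capped at what it actually asked for.
    """
    allocation = {app: 0 for app in requests}
    remaining = capacity
    # Applications that still want more objects than they have been granted.
    pending = {app: n for app, n in requests.items() if n > 0}

    while pending and remaining > 0:
        share = max(remaining // len(pending), 1)
        for app in sorted(pending):  # sorted for a deterministic order
            grant = min(share, pending[app], remaining)
            allocation[app] += grant
            pending[app] -= grant
            remaining -= grant
            if remaining == 0:
                break
        pending = {app: n for app, n in pending.items() if n > 0}

    return allocation


# Three hypothetical applications sharing an encoder that handles 32 objects.
result = allocate_objects({"game": 20, "chat": 4, "music": 10}, capacity=32)
print(result)
```

In this sketch the smaller requests are satisfied in full and the largest requester absorbs the shortfall. A real system could just as well weight applications by priority or by the policy data the abstract mentions.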


Illustrative Claim:

  1. A computing device, comprising: a processor; a computer-readable storage medium in communication with the processor, the computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by the processor, cause the processor to: receive contextual data indicating a number of audio objects associated with capabilities of an encoder in communication with the computing device; select a spatialization technology from a plurality of spatialization technologies, wherein individual spatialization technologies of the plurality of spatialization technologies are each associated with a threshold number of audio objects, wherein the selected spatialization technology is associated with the threshold number of objects that correlates with the number of audio objects associated with capabilities of the encoder; cause the encoder to generate a rendered output signal based on an input signal comprising object-based audio and channel-based audio processed by the selected spatialization technology; and cause a communication of the rendered output signal from the encoder to one or more speakers of an endpoint device.
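
The selection step recited in claim 1 can be pictured with a short sketch. It is an illustration under stated assumptions, not the patented implementation: the claim says only that the selected technology's threshold "correlates with" the encoder's object count, and the sketch reads that as "the largest threshold that does not exceed the count." The class, the function name, the technology names, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatializationTechnology:
    name: str               # placeholder label, not a real product
    threshold_objects: int  # threshold number of audio objects (made-up values)


def select_technology(technologies, encoder_object_count):
    """Pick the technology whose threshold best matches the encoder.

    Assumed interpretation of "correlates": choose the highest threshold that
    the encoder's reported object count can still satisfy. Returns None when
    no technology fits.
    """
    eligible = [
        t for t in technologies if t.threshold_objects <= encoder_object_count
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda t: t.threshold_objects)


# Contextual data: the encoder reports that it can handle 40 audio objects.
available = [
    SpatializationTechnology("technology_a", 16),
    SpatializationTechnology("technology_b", 32),
    SpatializationTechnology("technology_c", 112),
]
chosen = select_technology(available, encoder_object_count=40)
print(chosen.name if chosen else "no suitable technology")
```

The remaining claim steps (having the encoder render the object-based and channel-based input and communicating the output to the speakers of an endpoint device) depend on the audio stack in use and are left out of the sketch.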