VoxWorld is a multimodal simulation platform used to build interactive intelligent systems capable of situational understanding. Unlike unimodal natural language processing or computer vision frameworks, VoxWorld uses a multimodal semantics to assemble contextual information from multiple modalities in real time.
This allows VoxWorld to support intelligent agents that can not only hear you, but also see features of the environment, such as objects of interest or human gestures, and understand the semantics of what they perceive, using a “voxicon” of real-world knowledge. VoxWorld supports agents that operate wholly in the physical world, wholly in a virtual world, or in mixed reality environments shared with humans.
What is a voxicon?
A voxicon is a lexicon of "visual object concepts," or voxemes. A voxeme is a visual instantiation of a lexical item, used for generating multimodal simulations of motion events in a three-dimensional environment.
VoxWorld is based on the modeling language VoxML, which is used to map natural language expressions into real-time visualizations using real-world semantic knowledge of objects and events. Our approach adds a focus on motion and dynamics to multimodal visual grounding, creating an environment for situated natural language understanding.
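As a rough illustration of the kind of information a voxeme pairs with a lexical item, the sketch below models a single entry as a plain data structure with geometric and affordance attributes. This is not the actual VoxML schema; every field name and value here is an assumption chosen for illustration.

```python
# Hypothetical voxeme entry for "cup": a lexical predicate linked to
# geometric structure and the actions the object affords.
# Field names are illustrative, not the real VoxML encoding.
voxeme_cup = {
    "lex": {"pred": "cup", "type": "physobj"},
    "type": {
        "head": "cylindroid",            # approximate geometric primitive
        "components": ["body", "handle"],
        "concavity": "concave",          # concavity licenses containment
    },
    "affordances": ["grasp", "lift", "fill", "contain"],
}

def affords(voxeme, action):
    """Check whether a voxeme's affordance list licenses an action."""
    return action in voxeme.get("affordances", [])

print(affords(voxeme_cup, "fill"))   # a cup can be filled
print(affords(voxeme_cup, "fold"))   # but not folded
```

An agent reasoning over such entries could use the geometric attributes to render the object in a simulation and the affordance list to decide which events ("fill the cup") are semantically coherent.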
Who Are We?
VoxWorld was created by the Lab for Linguistics and Computation at Brandeis University, and originated as DARPA-funded research. In conjunction with other academic labs and private institutions, we have used VoxWorld to conduct research on event semantics, spatial reasoning, and referring expressions and have applied the technology to the medical and robotics domains, among others.
Our research has been featured in top AI and NLP venues, including AAAI, COLING, LREC, *ACL, NeurIPS, and IEEE conferences. VoxWorld is now used by research groups across multiple sites, and one of our goals is to make it available as a platform for users to easily develop custom agents and behaviors for research and scientific applications.