
Published in NEWSLETTER # 60

SPEECHREADING BY HUMANS AND MACHINES: MODELS, SYSTEMS AND APPLICATIONS
by Dr. G. Stork and Dr. E. Hennecke, Ricoh California Research Center, Menlo Park/CA (U.S.A.)

This volume (NATO ASI SERIES F150) is a product of the NATO Advanced Study Institute held at the Chateau de Bonas (France), in 1995 - the first interdisciplinary meeting devoted to the subject of speechreading (audio-visual speech recognition or "lipreading"). The workshop brought together representatives from the leading international laboratories studying how image information is used by humans and can be used by machines for improved speech recognition and understanding.

Part I discusses speechreading by humans, including learning and the role of linguistic and temporal constraints. Neural and psychological models of sensory integration are compared and contrasted, with particular attention to intriguing bi-modal phenomena such as the McGurk illusion. Preliminary brain scan studies show where acoustic and visual information may be integrated for perception; careful experiments using computer graphics techniques show which components of the face are best speechread.

Part II reports on speechreading by machines, including progress in hardware design, image processing (e.g., face finding, lip-, chin- and tongue-tracking methods), sensory integration, and learning and recognition algorithms. The information provided by the visual signal is complementary to that of the acoustic signal, and hence is most useful for utterances that are poorly recognized by acoustic means. Several groups report recognition accuracy higher than that of state-of-the-art acoustic-only recognizers, particularly in noisy environments. As such, automatic speechreading promises to broaden the range of applicability of automatic speech recognition to numerous real-world situations such as noisy offices, bank ATMs, automobiles and airplane cockpits.

Part III contains supporting material, such as the most complete bibliography on speechreading ever assembled and an extensive list of patents. Summaries of the workshop's lively interdisciplinary panel discussions - on topics such as databases, applications, and human-interface issues - complete the volume.

Readers in psychology, neurophysiology, machine speech recognition, computer vision, and human interface design will find a great deal in this book. The mix of tutorial/overview chapters with current research papers, and of fundamental science with applied technology, makes this an indispensable resource; the book is a landmark in the rapidly expanding field of speechreading.
Reference books: D69, F113, F136, F150
