Notice: This mailing list archive has been frozen since May 2001, i.e. it will stay online but will not be updated.
"3D Sound", and relevance to 3D imaging
- From: P3D John W Roberts <roberts@xxxxxxxxxxxxxxxxx>
- Subject: "3D Sound", and relevance to 3D imaging
- Date: Wed, 1 May 96 02:56:22 EDT
>Date: Tue, 30 Apr 1996 09:28:38 -0500
>From: P3D Eric Goldstein <egoldste@xxxxxxxxxx>
>Subject: Re: Re Holophonics that old chesnut
>Even assuming that this "new" recording technique can provide additional
>spatial cues, there is no reason to believe that our auditory mechanism is
>capable of discriminating them in any meaningful way.
>We had a thread a month or so ago concerning sound reflections and echo
>location... Best information is that under normal circumstances, reflections
>arriving in under 35 milliseconds (or 60 feet!) are indistinguishable as
>separate sound stimuli, and that real-time cues (i.e. head movement, amplitude
>differences) are an essential part of sound location close in. You're not
>going to gain those cues through headphones, which is what the holophonics
>folks recommend!
Eric kindly emailed me some excerpts from references, particularly:
"Fundamentals of Musical Acoustics," Arthur H. Benade, Oxford
University Press, 1976
and I was unable to send him a satisfactory reply, since I never did find
the Scientific American article I had originally referred to. I recently came
across a book, "Handbook of Perception, Volume IV: Hearing", edited by
Carterette and Friedman, Academic Press, 1978 (QP461.H38, ISBN 0-12-161904-4).
This is a ~700-page book filled with highly detailed descriptions and
measurements of human hearing, which gives some idea of how complex hearing
is. Of particular interest in this context is chapter 10, "Binaural Phenomena",
by Durlach and Colburn.
I don't dispute Eric's statement, but I think he and I are talking about
different phenomena. The apparent merging of sounds that arrive less than
35ms apart into a single sound is called the precedence effect. Rather than
being a limitation, it may be an active mechanism for suppressing multiple
echoes, which is necessary for accurately locating and "tuning in" on
sources in an environment where echoes are possible, i.e. a normal room.
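As a rough illustration of the scale involved, an arrival delay can be
converted into the extra distance an echo has travelled. This is only a
sketch: the speed of sound (343 m/s, dry air at about 20 C) and the function
names are assumptions, not taken from the references above.

```python
# Sketch: relate an echo's arrival delay to its extra path length.
# Assumed value: speed of sound in dry air at about 20 C.
SPEED_OF_SOUND_M_S = 343.0
METERS_PER_FOOT = 0.3048


def delay_to_path_m(delay_ms, c=SPEED_OF_SOUND_M_S):
    """Extra path length (meters) travelled by a sound arriving delay_ms late."""
    return delay_ms / 1000.0 * c


def path_to_delay_ms(path_m, c=SPEED_OF_SOUND_M_S):
    """Arrival delay (milliseconds) for a given extra path length in meters."""
    return path_m / c * 1000.0


if __name__ == "__main__":
    path_m = delay_to_path_m(35.0)
    print(f"35 ms of delay ~ {path_m:.1f} m ({path_m / METERS_PER_FOOT:.0f} ft)")
    print(f"1 m of extra path ~ {path_to_delay_ms(1.0):.1f} ms of delay")
```

Under these assumptions a 35ms delay corresponds to about 12 m of extra
path, so most reflections in an ordinary room fall inside the precedence
window.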
The phenomenon I have been referring to is "spatial resolution", in which
the listener tries to identify the position of a source (azimuth angle being
the first-order concern). It turns out that spatial resolution (which can be
as good as 1 degree in some cases) involves multiple mechanisms and is
dependent on many parameters, including time difference between the two ears,
volume difference, frequency, absolute volume, "envelope", etc. The results
of multiple experiments are given in the book, including some performed in
anechoic chambers and with the head fixed in position, as well as cases
where the sound is provided through headphones. If the volume to the two
ears is kept the same, then the azimuth position of the source (1-second
tone bursts at 50dB) can be determined by the difference in interaural
arrival time, with sensitivity a function of frequency. Looking at one of
the graphs, the "just noticeable difference" is ~55us for a ~120Hz tone.
The minimum distinguishable time difference drops to ~15us for a 1000Hz
tone, then rises rapidly, with time difference / interaural phase impossible
to distinguish above ~1400Hz. For certain other types of sounds such as
pulsed noise, interaural time sensitivity can be as small as 6-8us. Over a
wide range of frequencies, sensitivity to volume differences between the two
ears appears to be about 0.4 to 1.2dB. The indication seems to be that for
spatial resolution of sounds, time differences between the two ears appear
to predominate for frequencies below ~1400 Hz, and volume differences
predominate above that frequency range, though the exact nature of the
sound source is very important.
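To get a feel for what these time differences mean as angles, the sketch
below converts an interaural time "just noticeable difference" into an
approximate azimuth shift from straight ahead. It uses the Woodworth
spherical-head approximation, which is a swapped-in textbook model and not
something taken from the Handbook; the head radius (8.75 cm) and speed of
sound (343 m/s) are assumed values, and the function names are hypothetical.

```python
import math

# Assumed values, not from the Handbook: speed of sound in air at about 20 C
# and a commonly used average head radius.
SPEED_OF_SOUND_M_S = 343.0
HEAD_RADIUS_M = 0.0875


def itd_seconds(azimuth_deg, r=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Woodworth spherical-head estimate of interaural time difference.

    Intended for azimuths from 0 (straight ahead) to 90 degrees (to one side).
    """
    theta = math.radians(azimuth_deg)
    return (r / c) * (theta + math.sin(theta))


def azimuth_jnd_deg(itd_jnd_us, r=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Approximate azimuth shift from straight ahead that produces a given ITD.

    Uses the small-angle form of the model above: ITD ~ 2 * r * theta / c.
    """
    theta = (itd_jnd_us * 1e-6) * c / (2.0 * r)
    return math.degrees(theta)


if __name__ == "__main__":
    print(f"ITD at 90 degrees: {itd_seconds(90.0) * 1e6:.0f} us")
    for jnd_us in (55.0, 15.0, 8.0, 6.0):
        print(f"{jnd_us:4.0f} us -> about {azimuth_jnd_deg(jnd_us):.1f} degrees")
```

With these assumptions, the ~15us figure works out to a bit under 2 degrees
and the 6-8us figures to a bit under 1 degree, which is in line with the
"as good as 1 degree" resolution mentioned above.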
The source of the original discussion was the many ways humans have of
perceiving depth and the relative positions of things around them, the
ways multiple cues are combined, and the difficulty of recreating a realistic
3D experience. Certainly the other cues Eric describes would be factors in
a typical real-world situation.
One would not normally expect sound to be a necessary part of a presentation
of 3D still pictures, but for moving 3D pictures, 3D sound could add a
significant "dimension" to the experience. The fact that human hearing uses
many different cues to determine position and depth implies that a hierarchy
of techniques could be employed, from simple methods that handle one or two
cues, on up to advanced techniques that handle many cues. For an example of
the latter, imagine a VR headset recreating a virtual world in which not
only do 3D objects shift around as you move your head, but also the sound
sources track the visual points of origin. You could even get the appropriate
echoes from "virtual walls", and have sounds that you produce bounce around
in the virtual environment. I heard a lecture two weeks ago in which it was
commented that, to a typical audience, good sound quality in a video production
greatly enhances the perception of image quality. By the same token, making the
3D sound effects as realistic as possible in a 3D movie or VR environment would
likely make the whole experience seem much more realistic than images alone
could.
A few more comments on the general subject:
- It has seemed to me in the past that among the most difficult sound
sources to locate without moving around are continuous high-frequency
tones. That would be consistent with the above description.
- Humans have some ability with certain types of sounds to get elevation
information - I'm not sure how they do that.
- I believe I read somewhere that the convolutions of the outer ear are
thought to do some preprocessing of incoming sound - perhaps changing
the phase relationships of incoming frequencies. This may account in
part for some of the abilities of human hearing, and may complicate the
recreation of 3D sound.
- Mixes of multiple frequencies and changes in the sound source (volume,
phase, frequency composition, etc.) over time can be important for some
abilities.
- I came across several articles on the use of background acoustic noise
to perform 3D imaging in underwater environments.
- For those of us with twin-camera 3D rigs, the precedence effect can be
very important. I listen to the shutter clicks of the two cameras to
judge how good the synchronization was. However, I sometimes detect a
difference by ear, when the results of the photo indicate that the
synchronization was much better than 35ms. I would guess that with the
cameras so close to my two ears, binaural time differentiation may also
somehow be playing a part.
- For an example of some of the things mammalian auditory systems are
theoretically capable of, see "Biosonar and Neural Computation in Bats"
starting on page 60 of the June 1990 issue of Scientific American.
Of course humans use a different frequency range, and aren't as adept
at generating sonar pulses as bats, but many of the principles are
likely to apply to some extent.
- I don't have any opinion on "Holophonics", in the absence of further
information. Human hearing appears to consist of dozens to possibly
hundreds of specific capabilities, but I don't know if any of them
apply to this.
- In reference to Eric's point on adapting to an echo-filled environment,
I find that if I hear a sound somewhere out of sight (i.e. several
rooms away and around multiple corners in a house), I build up a mental
model of where I think the sound is "really" coming from, but my spatial
resolution also comes up with a specific direction where the sound "seems"
to be coming from.
- Having had a brief exposure to the many capabilities of human hearing, I
shudder to think what 90% of the population is doing to its hearing.
Most people wouldn't look into a laser or stare at the sun for long
periods of time, but they don't seem concerned about doing analogous
things to their hearing.
- Echoes provide some depth information, even over distances as small
as a few feet.
- Anybody know if there's a stereo telephone conferencing system on the
market? Many of the mechanisms in human hearing to separate sounds of
interest from background noise depend on a true binaural input.
- In common usage, the term "stereo" is applied to both sound and visual
systems. Similarly, there seem to be many points in common between
the fields of 3D visual imaging and "3D sound".
John Roberts