From the December 1990 meeting, BASS Vol. 19 No. 3
Preliminary Discussion by E. Brad Meyer
Brad Meyer began with a transparency detailing the steps through which a piece of classical music reaches
the ears of the audiophile. First of course is the composer. Next come the players (and their instruments) and the conductor, who realize the composer's
ideas in acoustic form. The sound goes out through the air into a hall, and is changed by microphones into an electrical signal. These electrical signals
go through a mixer, and thence into a master recorder. Then comes the editing, and perhaps processing, reverb, and remixing. The result is transferred
to some consumer music carrier--LP once upon a time, today analog cassette or CD. Next comes the home player, followed by connecting wires, preamp and
a power amp, speaker wires and loudspeakers, where the signal is transformed back into sound. Finally there is the listening room. Mark Fishman noted,
to laughter, that Meyer had left out the influence of the power plant and the ac lines.
Meyer drew a box around the links in the chain over which the audiophile has control: the sequence from
the music carrier through the listening room. Within this box lies the subject matter for consumer audio publications.
Tonight, however, Meyer was going to focus his attention on the very end of the chain: the listener. Many
things affect how a listener perceives the sound. Among them are hearing limits, experience, fatigue, mood (one's own and that of others), and pharmacological
substances both medical and recreational. Ambient lighting affects mood and is thus a factor, as Meyer discovered early on in his audio pursuits. Lighting
just the speakers and leaving the rest of the room dark makes the sound more vivid and dramatic. Meyer suggested that people try this, and also darkening
the whole room.
Listener hearing acuity of course is a major factor. Meyer mentioned that one now can get tested out to
20 kHz instead of just the standard 8 kHz [see the May 1991 meeting summary, in v19/1--PSH]. Meyer had had his ears' hearing thresholds (not the same
thing as frequency response) measured and found that he has measurable loss in low-level detection at 12 kHz and a lot more above that. He found it
a sobering experience, as have many of us.
Meyer pointed out that the ear's equal-loudness curves tend to bunch at the frequency extremes. This means
that once the highest and lowest sounds are above the hearing threshold, a small change in level will sound louder than a similar change in the mid-band.
This became painfully obvious during his high-frequency hearing tests. For example, at 18 kHz Meyer's threshold is 106 dB SPL. At 104 dB he cannot hear
it at all, and yet at 106 dB he yanked the phones from his head. Fishman quoted Bob Berkovitz as saying that if the sound is not audible it does not
damage the ear even if its level is quite high.
Returning to the playback chain, Meyer went on to say that typically the audiophile can affect only a small
part of it--the playback system and the listening room (both acoustically and how it may be made to influence mood). There is no control over the recording
process, although Meyer suggested that those who have the opportunity to do live recording really should try it--it is dismaying how much influence
microphone choice and placement have on the recorded sound.
Meyer speculated that so much attention has been paid by audiophiles to trivial aspects of the playback
chain such as the cables and ac power because the advent of the CD has eliminated the audible distortion introduced in the process of getting the signals
from the master tape to the playback preamp, hitherto an area ripe for great fussiness. Things are a lot less interesting now for those looking for
controllable detail.
Mark Fishman brought up an interesting comment from J. Gordon Holt on memory: Holt now has better memory
than hearing. His memory now hampers his enjoyment of many musical performances because he misses the sheen of the violin and the delicacy of the cymbals
and triangle, which he remembers but no longer hears. The discrepancy bothers him. David Moran suggested that Holt might find it helpful to employ wider-dispersion
tweeters and, theoretically, some judicious equalization, to get more audible treble into the reverberant field.
Alvin Foster reported a more cheerful result, saying that his own memory helps add sheen to the strings
rather than detracting from his current listening enjoyment. Dan Banquer commented that, as a musician, he has always felt that nothing is like being
in the middle of the music. No matter how many millions of dollars of equipment one has, it cannot recreate the experience of performing. Meyer added
that he has a BSO violinist friend who complains that the BSO broadcasts do not have enough string sound. Meyer asked him how often he has listened
to the BSO from out in the audience. [This again poses the question of what "viewpoint" the sound should be created for and/or played back
from--PSH.]
The ABX Comparator
Historically, it was an interest in tracking down the source of perceived differences in the playback chain
that led to the construction (by David Clark and associates) of ABX boxes like the one Meyer was to demonstrate at this meeting. During the next portion
of the evening he introduced the ABX box and played with it a bit to show how it worked. The system assembled for the meeting comprised an Apt preamp,
Audio Dynamics power amp (Japanese, class AB, bipolar), the Allison 205 3-piece satellite/woofer system (lightly equalized with a dbx 10/20 to boost
the low bass and help ameliorate a presence wrinkle), and an AR turntable fitted with a JH Formula Four arm and Stanton cartridge.
The ABX comparator switches between two sources. The box has three buttons on the remote and three LEDs
on the front panel, labeled A, B, and X (hence the product name). There is another pair of buttons, labeled Down and Up, which change the numeric display
on the unit. When the box is powered on, it generates 100 random assignments of X to either A or B, one for each possible displayed number on a two-digit
readout (00 to 99). A Reset button on the main control unit returns the sequence to test number 01. Pushing A connects source A to the output, and likewise
for button B. Pushing X connects the box-selected source, which is either A or B. Neither the operator of the box nor the listeners have any notion
of which source is X until the answers are read out at the end of the test. This kind of test is called double-blind, as neither the tester nor the
tested knows the answers.
During the test the subjects (or the tester) switch among A, B, and X and then mark on an answer sheet
whether X is A or B. The test is repeated for a series of separate trials. At the end of a series, pushing the Answer button reveals the identities
of X for all trials. In the answer mode, X is on together with the selected source--if X was A for trial number 01, for example, the LEDs for X and
A will both be lit.
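
The switching logic described above can be sketched in a few lines of Python. This is only a toy model of the behavior reported at the meeting, not the real unit's firmware; the names AbxBox, route, answers, and score are hypothetical, and the seed values are arbitrary.

```python
import random


class AbxBox:
    """Toy model of the comparator's logic (hypothetical, not the real unit)."""

    def __init__(self, n_trials=100, seed=None):
        rng = random.Random(seed)
        # At power-on the box fixes X as A or B for every trial number.
        self._hidden = [rng.choice("AB") for _ in range(n_trials)]

    def route(self, trial, button):
        """Return which source reaches the output for a button press."""
        if button in ("A", "B"):
            return button
        if button == "X":
            return self._hidden[trial]
        raise ValueError("button must be 'A', 'B', or 'X'")

    def answers(self):
        """Answer mode: reveal the identity of X for every trial."""
        return list(self._hidden)


def score(box, sheet):
    """Count answer-sheet entries that match the revealed identities."""
    return sum(guess == truth for guess, truth in zip(sheet, box.answers()))


if __name__ == "__main__":
    box = AbxBox(seed=1990)
    # A listener who cannot hear any difference can only guess.
    guesser = random.Random(7)
    sheet = [guesser.choice("AB") for _ in range(16)]
    print(score(box, sheet), "of", len(sheet), "correct")
```

Because the assignments are fixed when the box object is created and are read only through answers(), neither the "operator" code nor the "listener" code needs to know them during the test, which is the double-blind property in miniature.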
The ABX box is designed to determine how reliably the listener can detect differences. Preconceptions affect
perception and conclusions [in other words, not only is seeing believing, but believing is also seeing--Ed.], hence the need for single blindness. Double-blind
testing is required because the tester almost invariably (and unpredictably) influences the test subject(s). One of many well-known examples occurred
when a group of psychology students tested many subjects for IQ. The subjects were impartially tested for IQ beforehand, and then sorted into two groups
with similar IQ ranges. The testers were told that group A was exceptionally intelligent while group B was not. For each group, the testers were to
read the same script while administering the test. The result was that the group touted as smart to the test-givers scored statistically significantly
better than the group labeled stupid. Somehow the testers conveyed their expectations about performance while reading the same instructions to the two
groups, and the groups responded to the cues.
Listening
Demonstrations consisted of a range of comparative-listening tests using different musical sources, including
PCM-F1 tapes and LPs, with two different devices inserted into the B path and compared with a straight-wire bypass in the A path. This kind of line-level
comparison is easy to do well; at high, amp/speaker levels, there may be problems. Meyer also has a high-current relay box (an extra-cost option) for
switching amplifiers or loudspeakers. The large relays in this box make a soft clunk that is different for the two sources and is audible in a quiet
room; Meyer has identified X 10 out of 10 times without any signal! While the sound is quiet enough to be masked when any music is playing, testing
hygiene dictates that the relay box be enclosed or otherwise muffled.
Meyer handed out a sheet photocopied from the ABX manual which showed typical level-matching required for
reliable detection of differences between sources with 1/3 octave frequency-response aberrations. When the aberrations span a wider spectrum, level-matching
becomes increasingly critical, dropping to less than 1/3 of a dB especially in the ear-sensitive 2-5kHz region. Acuity (ability to hear difference)
also depends sometimes on how close the level at a given frequency is to the threshold of hearing. At threshold, a small increase in level will make the
sound audible and enable the listener reliably to distinguish A and B when different.
Steve Owades noted that the use of the ABX box does not reduce bias in results due to peer pressure when
the box is used with more than one listener at a time. Visible or audible reactions from surrounding listeners may influence a subject's answer. Such
bias makes the answers dependent--what one listener chooses is influenced by what his or her peers choose. This may invalidate the result for statistical
analysis, which requires that the trials be independent.
The Tests
Meyer first demonstrated the operation of the ABX box by disconnecting the signal feed to the B inputs.
This simpleminded procedure--comparing an audible signal with no signal--has proven helpful in clarifying how the box works for those who, for example,
fail to pick up the point that the assignment of X remains constant for each trial. The 18 subjects present went through the exercise of writing their
answers for X on the sheet. The result: 17 correct answers and one abstention, from someone who deemed the test too obvious to dignify with an answer.
Next Meyer inserted a Technics SH-9010 parametric equalizer in the B loop and set the 3 kHz slider for
a 3 dB boost. The Q knob was set to 0.7 (the broadest setting, for a bandwidth of about two octaves). Playing pink noise through the system makes this
alteration easy to hear, and the group got a score of 18/18 without difficulty. With choral music, whose broad frequency range makes it a good test
for response aberrations, the score was 16/17.
The next test was much tougher: The 9010 was left in the circuit, but with all sliders set to their midpoints.
Unlike some consumer equalizers, the semi-pro Technics has controls that really do what they say (boost, cut, or stay flat), and the response is quite
flat in this condition except for a slight droop in the top octave. To make things more difficult, we heard only the choral music for this trial. The
group got 7/17 correct.
The last two trials were bypass tests of the Sony PCM-F1 digital processor. The F1's video output was looped
back to the input and the processor was set to a gain of 1.0 and connected to input B. The signal source was an LP made by Meyer and Peter Mitchell
of organist James Johnson--the same production whose digital version has been excerpted on the first and second Stereophile test CDs. The LP was made
from an analog master, so we really were comparing an analog source directly with an F1-digitized version. The results on the two trials were 9/15 and
7/15; the total was 16/30, 53% correct.
Analyzing Results
If listeners are really not able to detect any difference between A and B (whatever they believe)--or if
they were to guess--the outcome will tend toward 50 percent correct (and 50 percent incorrect) answers as the sample size increases. When the listeners
can tell the difference easily (as with the pink-noise test of the 3 kHz boost) the result will be all answers correct. When the difference is subtle
and some can detect it reliably while others cannot, the number of correct answers should lie between half and all correct.
[Author's note: The number of correct answers can fall below half if the trials are not independent,
i.e., if someone in the audience is influencing others. Meyer told a story of an AES workshop he and Mitchell gave when the box generated a run of
successive trials in which X was B. Many people selected A on one difficult trial, apparently thinking that it was about time to get an A--which is,
of course, a form of dependence, though dependence on previous trials and not on the other subjects in the room. In an AES preprint by my brother
and me (presented to the BAS several years ago) we suggested a much more complicated distributional function to analyze the data which would help
reduce the effects of dependent trials--PSH.]
Depending on the number of trials, there is a definite number of correct answers beyond which one can
say that the probability of a listener's getting that number by chance is less than five percent. This is what is known as a 95% confidence level. Assuming
independence, with six trials one has to get all six correct to satisfy this criterion. With 24 trials, 17 correct answers is the threshold. The percent
of correct answers needed to qualify for "reliably hearing differences" decreases as the number of independent trials increases.
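
The thresholds quoted above can be checked with a short calculation. The sketch below assumes that each trial is an independent 50/50 guess and uses a one-sided binomial tail; the function names chance_probability and min_correct are illustrative and do not come from the ABX manual.

```python
from math import comb


def chance_probability(n_trials, n_correct):
    """P(at least n_correct right out of n_trials) for a pure guesser."""
    tail = sum(comb(n_trials, k) for k in range(n_correct, n_trials + 1))
    return tail / 2 ** n_trials


def min_correct(n_trials, alpha=0.05):
    """Smallest score a guesser reaches with probability below alpha."""
    for n_correct in range(n_trials + 1):
        if chance_probability(n_trials, n_correct) < alpha:
            return n_correct
    return None  # too few trials for any score to qualify


for n in (6, 24, 100):
    k = min_correct(n)
    print(f"{n} trials: {k} correct needed ({100 * k / n:.0f}%)")
```

Under these assumptions min_correct returns 6 for six trials and 17 for 24 trials, in agreement with the figures given, and the required percentage falls as the number of trials grows.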
Stereophile carried out a double-blind test and then examined the results of only those subjects who got
high scores. They concluded that this group had demonstrated the ability to hear differences. This, however, is statistically invalid: even for randomly
generated answers, in a large group 1 out of 20 subjects would be expected to satisfy the 95% criterion by chance alone. (This group represents the
5% that you're 95% confident that a given subject doesn't fall into.) To ascertain whether there really is a golden-eared group, they should have selected
the high scorers and used them for another series of trials.
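
The selection problem can be illustrated with a small simulation. In this sketch every subject is a pure guesser; the panel size of 1000, the 24 trials per series, and the seed are hypothetical choices for illustration, not Stereophile's actual figures, and the pass mark of 17 is the 24-trial threshold mentioned earlier.

```python
import random


def abx_score(rng, n_trials):
    """Score of a listener who guesses on every trial."""
    return sum(rng.random() < 0.5 for _ in range(n_trials))


def select_and_retest(n_subjects=1000, n_trials=24, pass_mark=17, seed=0):
    """Pick high scorers after the fact, then give them a fresh series."""
    rng = random.Random(seed)
    first = [abx_score(rng, n_trials) for _ in range(n_subjects)]
    # "Golden ears" chosen only because of their first-series score.
    selected = sum(s >= pass_mark for s in first)
    # New, independent series for the selected subjects only.
    repeat = sum(abx_score(rng, n_trials) >= pass_mark for _ in range(selected))
    return selected, repeat


selected, repeat = select_and_retest()
print(selected, "of 1000 guessers passed the first series")
print(repeat, "of those passed again on retest")
```

A few percent of the guessers clear the pass mark in the first series by luck alone, but very few of them do so again on the second, which is why retesting the high scorers is the meaningful check.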
The tests we took showed clear audibility to a confidence level well over 95% for the first three tests,
and null results for the last three. The tests were conducted patiently and fairly, under generally good conditions; for example, there was a minimum
of cross-comment.
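
The evening's scores can be run through the same kind of chance calculation. The sketch below treats every listener's answer as an independent coin flip under the null hypothesis, which is an assumption (Owades's point about peer pressure suggests it may not hold exactly), and the labels are paraphrases of the test descriptions in this summary.

```python
from math import comb


def chance_probability(n_answers, n_correct):
    """P(at least n_correct right) if every answer were a coin flip."""
    tail = sum(comb(n_answers, k) for k in range(n_correct, n_answers + 1))
    return tail / 2 ** n_answers


# (label, correct, answers) as reported in this summary.
results = [
    ("B input disconnected", 17, 17),
    ("3 kHz boost, pink noise", 18, 18),
    ("3 kHz boost, choral", 16, 17),
    ("equalizer set flat, choral", 7, 17),
    ("PCM-F1 bypass, first", 9, 15),
    ("PCM-F1 bypass, second", 7, 15),
]

for label, correct, answers in results:
    p = chance_probability(answers, correct)
    verdict = "audible" if p < 0.05 else "null"
    print(f"{label:28s} {correct:2d}/{answers:2d}  p = {p:.4f}  {verdict}")
```

With these numbers the first three tests fall far below the 5% line and the last three do not, consistent with the summary above.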
Meyer noted that people typically get touchy, even grouchy, when two blind-compared pieces of equipment
are very similar. Some high-end reviewers have said that long-term listening to each piece of equipment produces more reliable
answers than short-period ABX switching, which they feel is less revealing, although
there is good evidence that, to the contrary, quick comparison increases acuity. In any case, contrary to popular misconception, there is no law against
leaving the ABX box in position A for a month, then switching to B the next month, and finally to X during a third month.
[Guest's addendum: Following my own experience, I tried to switch among A, B, and X at moments that would
be the most revealing of differences. Still, these tests were necessarily conducted with fairly rapid switching. Needless to say, the system and
room were familiar to none of us. The conditions were obviously not the best, and finally, as always, a negative result does not conclusively prove
the nonexistence of anything. The test can and should be made more sensitive when possible by using the subject's own system and room, and by repeating
musical selections through both signal paths (a repeated-music test) rather than switching back and forth while a selection is playing (a running-music
test).
The stress of the tests did indeed tell after a while. Even the temperate Poh Ser Hsu was heard to snap
at someone two rows ahead of him to quit moving his head around! While this was a less than ideal test, then, I must point out that claims by writers
like Robert Harley that blind tests necessarily generate such stresses are without foundation--EBM.]