Psychoacoustics Applied to Music Listening and Music Appreciation

The gap between the science of acoustics and the art of music narrows as the two disciplines exchange resources and ideas. Here psychoacoustics is applied to the art of music listening and music appreciation. The functional process is treated chronologically as follows: sound production, sound transmission, sound reception, sound perception. Subjective listening is the emotional and physiological response in the hearing process to the acoustic elements of musical sounds. Objective listening is the intellectual evaluation of the acoustic elements of musical sounds. Music appreciation is a result of the synthesis of subjective and objective listening.

Dr. Ibrahim Elnoshokaty

Najaf Auditorium Project, Iraq



400 Bed General Hospitals Project
Enoshmink Technology & Media Services



Project Acoustical Design :: Auditorium/Theater

Goal: To properly balance absorption and reflection to provide a favorable acoustical environment. One must address both the need to hear and understand speech, and the desire to have a pleasant space for music.

  • Tips/Considerations
    • Recommended reverberation time is 1.0-1.5 seconds (might be higher for some auditoriums).
    • Although the seating area will provide absorption, thereby reducing the reverberation time, you will most likely need to add absorptive materials to the other surfaces within the space.
    • It is vital to control the reflections from the back wall. If you don’t control them, the presentation could reflect off the back wall and “slap back” to the presenter(s). This won’t necessarily impact the audience, but could be disastrous and distracting for the people on stage. Because of this, it’s usually necessary to treat the back wall with an absorptive material. A concave back wall could compound this problem. If you can’t avoid a concave back wall, it’s imperative that it be treated with absorptive material.
    • Splay or use irregular surfaces on the walls to avoid flutter echoes. Parallel reflective surfaces can allow sound to “ricochet” back and forth between the surfaces. This potentially annoying condition is referred to as standing wave or flutter echo. It is avoided by constructing non-parallel surfaces or by adding absorptive materials to the surface(s).
    • Consider faceting the ceiling to help with sound dispersion.
    • Control the reverberation time on the stage. Ideally, the reverberation time in the stage area should be the same as in the house. Since the stage area might have a higher ceiling than the rest of the auditorium, more absorptive materials might be required in this area. Frequently, the back wall of the stage, and possibly one or two of the side walls, is treated with an acoustically absorptive material, typically black in color.
    • Remember the space will be less absorptive when only half full, since the audience itself is absorptive. By using absorptive seating areas, the reverberation time will remain more consistent regardless of the audience size.
    • Noise from the lobby area can be disruptive. Be sure openings such as doorways are properly sealed. Consider a vestibule door system.
    • Persons seated deep under a balcony might experience auditory distortion. To avoid this, the balcony should be no deeper than twice its height. Ideally, the balcony should not be any deeper than its height.
    • Even if everything else is controlled perfectly, the space might not be usable if the background noise (e.g. HVAC system) is too loud. To help protect your design, the NC level should not exceed 20 to 35. When specifying NC, specify an actual rating, such as NC 20, rather than a range, such as NC 20-30. Although specifying a lower number will ensure minimal background noise, it might be cost prohibitive to achieve. Be realistic about the amount of acceptable noise and the project’s budget when specifying an NC level.
    • Beware of potential outdoor noise impacting your space. For example, is your location near a flight path, a railroad or freeway? If so, you might have to pay critical attention to blocking this noise. To do so effectively, you must address not only the STC or isolation quality of the exterior wall, but also for the possibly weaker building elements, such as the

Auditorium sound transmission loss:


Wall 1 and 3


Wall 2 and 4



Wall sound isolation thickness (wall damping concept):

Recommended ANSI level for large auditoriums (for very good speech articulation): 35 dB

Material used: Rock wool

Material density: 80 kg/m3

SPL: 85 dB











Wall 1: 0.0120975198718746 m

Wall 2: 0.0217294161921501 m

Wall 3: 0.0120975198718746 m

Wall 4: 0.0217294161921501 m



0.00532148967971023  m













2- Reverberation time calculation before sound treatment


Reverberation time of the auditorium before sound treatment


Frequencies (Hz)

RT60
















Length:     24.5     m

Width:       13.64     m

Height:      6     m

Volume:   2005.08    m3
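The room dimensions above are enough to sketch a Sabine estimate of the untreated reverberation time. The example below is illustrative only: it uses the standard Sabine approximation RT60 = 0.161 V / A, and the average absorption coefficient `ALPHA_AVG` is a hypothetical placeholder, since the surface materials and their coefficients are not listed here.

```python
def room_volume(length, width, height):
    """Volume of a rectangular room in cubic metres."""
    return length * width * height


def room_surface_area(length, width, height):
    """Total interior surface area (floor, ceiling, four walls) in square metres."""
    return 2.0 * (length * width + length * height + width * height)


def sabine_rt60(volume, surface_area, alpha_avg):
    """Sabine estimate of RT60 in seconds for an average absorption coefficient."""
    total_absorption = surface_area * alpha_avg  # metric sabins
    return 0.161 * volume / total_absorption


# Dimensions taken from the text (metres).
LENGTH, WIDTH, HEIGHT = 24.5, 13.64, 6.0

# Hypothetical average absorption coefficient, NOT from the source.
ALPHA_AVG = 0.2

volume = room_volume(LENGTH, WIDTH, HEIGHT)
area = room_surface_area(LENGTH, WIDTH, HEIGHT)
rt60 = sabine_rt60(volume, area, ALPHA_AVG)

print(f"Volume: {volume:.2f} m3")
print(f"Surface area: {area:.2f} m2")
print(f"Sabine RT60 (assumed alpha = {ALPHA_AVG}): {rt60:.2f} s")
```

In practice the calculation is repeated per octave band with a separate coefficient for each surface, which is why the results are reported against frequency.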














Front Wall




Back Wall


Right Wall


Left Wall
















3- Reverberation time calculation after sound treatment


Enosh Ceiling:


Length: < enosh_ceiling> m

Width: 9 m

Material: 1”


Enosh pallet:


Number: 10

Area: 37.5 m2

Material: 2”









RT60 at different frequencies (after adding Enosh materials):

Frequencies (Hz)

RT60















4-Critical area after sound treatment  





5- Calculation of standing waves:



Frequency   Length (Hz)   Width (Hz)          Height (Hz)
F1          7             12.5733137829912    28.5833333333333
F2          14            25.1466275659824    57.1666666666667
F3          21            37.7199413489736    85.75
F4          28            50.2932551319648    114.333333333333
F5          35            62.866568914956     142.916666666667
F6          42            75.4398826979472    171.5
F7          49            88.0131964809384    200.083333333333
F8          56            100.58651026393     228.666666666667
F9          63            113.159824046921    257.25
F10         70            125.733137829912    285.833333333333
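The tabulated values agree with the axial room-mode formula f_n = n c / (2 L) applied to each room dimension. The sketch below reproduces them, assuming a speed of sound of 343 m/s; that value is inferred from the numbers and is not stated in the report.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed; chosen because it matches the table


def axial_modes(dimension_m, count=10, c=SPEED_OF_SOUND):
    """First `count` axial standing-wave frequencies (Hz) along one dimension."""
    return [n * c / (2.0 * dimension_m) for n in range(1, count + 1)]


# Room dimensions from the report (metres).
length_modes = axial_modes(24.5)
width_modes = axial_modes(13.64)
height_modes = axial_modes(6.0)

print("Mode   Length    Width   Height")
for n, (f_l, f_w, f_h) in enumerate(
    zip(length_modes, width_modes, height_modes), start=1
):
    print(f"F{n:<4} {f_l:7.2f}  {f_w:7.2f}  {f_h:7.2f}")
```

Only axial modes are covered here; tangential and oblique modes would need the full three-index formula.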












6- Shape of the diffuser used to remove standing waves


Length: 1.5

Width:  0.3


   0.0571666666666667                  0.228666666666667                0.228666666666667

   0.0571666666666667                  0


The application of the diffuser will be in the wall panel
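The depths listed above follow the pattern 1, 4, 4, 1, 0 times about 0.0572 m, which is the pattern a quadratic residue (Schroeder) diffuser with N = 5 wells would produce. This is an inference from the numbers, not something the report states. As a sketch under that assumption, with a hypothetical design frequency of 600 Hz and a speed of sound of 343 m/s, the well depths d_n = (n² mod N) · λ / (2N) can be computed as:

```python
SPEED_OF_SOUND = 343.0      # m/s, assumed
DESIGN_FREQUENCY = 600.0    # Hz, hypothetical; chosen because it matches the listed depths
N_WELLS = 5                 # prime number of wells, inferred from the 1, 4, 4, 1, 0 pattern


def qrd_well_depths(n_wells, design_frequency, c=SPEED_OF_SOUND):
    """Well depths (m) of a quadratic residue diffuser for wells n = 0 .. n_wells - 1."""
    wavelength = c / design_frequency
    unit_depth = wavelength / (2.0 * n_wells)
    return [((n * n) % n_wells) * unit_depth for n in range(n_wells)]


depths = qrd_well_depths(N_WELLS, DESIGN_FREQUENCY)
for n, depth in enumerate(depths):
    print(f"well {n}: {depth:.4f} m")
```

The sequence starts at the zero-depth well, so the same five values appear here in a rotated order compared with the listing above.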




7-Codes & Testing :: Sound Transmission Class (STC)

Code: STC rates a partition’s or material’s ability to block airborne sound.

Enforcement: Appendix Chapter 35 of the ’88 and ’91 UBC and Appendix Chapter 12, Division II of the ’94 and ’97 UBC; these provisions will be contained in the forthcoming IBC. Although not all municipalities have adopted this appendix chapter, it is still recognized as an industry standard.

General Information: The Uniform Building Code (UBC) contains requirements for sound isolation for dwelling units in Group-R occupancies (including hotels, motels, apartments, condominiums, monasteries and convents).

UBC requirements for walls: STC rating of 50 (if tested in a laboratory) or 45 (if tested in the field*).

UBC requirements for floor/ceiling assemblies: STC ratings of 50 (if tested in a laboratory) or 45 (if tested in the field*).

* The field test evaluates the dwelling’s actual construction and includes all sound paths.


  • Sound Transmission Class rates a partition’s resistance to airborne sound transfer at the speech frequencies (125-4000 Hz). The higher the number, the better the isolation.

STC Strength: Classifies an assembly’s resistance to airborne sound transmission in a single number.

STC Weakness: This rating only assesses isolation in the speech frequencies and provides no evaluation of the barrier’s ability to block low frequency noise, such as the bass in music or the noise of some mechanical equipment.

Recommended Isolation Level
An assembly rated at STC 50 will satisfy the building code requirement; however, residents could still be subject to awareness, if not understanding, of loud speech. It is typically argued that luxury accommodations require a more stringent design goal (as much as 10 dB better – STC 60). Regardless of what STC is selected, all air gaps and penetrations must be carefully controlled and sealed. Even a small air gap can degrade the isolation integrity of an assembly.



8-Codes & Testing :: Reverberation Time (RT60)

Test: RT60 measures the reverberance within a room.

Related Code: RT60 is soon to be adopted under ADA for classroom acoustic criteria.

General Information: Reverberation Time is the time required, in seconds, for the average sound pressure level in a room to decrease 60 decibels after a source stops generating sound. This test is standard on certain projects, such as “THX” movie theaters and various government buildings. Normally, in the design phase, you must demonstrate (through calculations) that a space will achieve the stipulated reverberation time. Often, measurements are required to verify results.

Strength: Because RT60 is void of variables, unlike many other tests, it is straightforward and clear-cut.

Weakness: RT60 does not account for problematic and potentially annoying reflections. Often, there is still a need for expert analysis.




9-Codes & Testing :: Noise Criteria (NC)

Code: This industry standard (also an ANSI standard) usually pertains to HVAC or mechanical noise impact.

Enforcement: This standard is often required for certain certifications (such as government medical facilities) or included in client specifications/standards (for example, some companies have NC standards that their buildings must meet).

General Information: An NC level is a standard that describes the relative loudness of a space, examining a range of frequencies (rather than simply recording the decibel level). This level illustrates the extent to which noise interferes with speech intelligibility. NC should be considered for any project where excessive noise would be irritating to the users, especially where speech intelligibility is important. There are a few spaces where speech intelligibility is absolutely crucial, including:

For some areas, such as machine shops or kitchens, it is not essential to maintain a particularly low NC level.

NC Level Strength: It is important for design professionals to specify NC ratings to protect their designs (within reason – specifying an acceptable NC level does not have to be a burden on the budget). Doing so speaks to your reputation as a responsible architect or designer and limits your liability.

NC Level Weakness: NC does not account for sound at very low frequencies. In spite of numerous efforts to establish a widely accepted, useful, single-number rating method for evaluating noise in a structure, a variety of techniques exist today. The vast majority of acoustic professionals use the NC standard, but it is still important to be aware of the other acceptable methods that do account for low frequency levels, including (but not limited to):

  • Room Criteria (RC) measures background sound in a building over the frequency range 16 Hz to 4000 Hz. This rating system requires two steps: determining the mid-frequency average level and determining the perceived balance between high and low frequency sound.
  • Balanced Noise Criteria (NCB) is based on the ANSI threshold of audibility for pure-tones and is defined as the range of audibility for continuous sound in a specified field from 16 Hz to 8000 Hz.



1- Wall damping system thickness: 14 to 15 cm

2- Wall treatment using acoustic panels with NRC up to 1.15

3- Ceiling treatment using acoustic panels with NRC up to 0.80

4- Floor treatment using a bright surface to balance the RT of the room

5- Standing waves are detected at 10 frequencies

6- All walls must be diffused using panels of two thicknesses

7- Theater seats must be foam injection not


This study was carried out according to the following American standards:

ASTM E90-2009, “Standard Test Method for Laboratory Measurement of Airborne Sound Transmission Loss of Building Partitions”

ASTM E413-2004, “Classification for Rating Sound Insulation”

Dr. Ibrahim Elnoshokaty

Member, Acoustical Society of America

Member, Acoustical Society of Egypt










Studio sound test

This sound isolation test was carried out in one of the Melody studios after we finished fitting it out.

Speech intelligibility in noise

Speech can be modified to promote intelligibility in noise, but the potential benefits for non-native listeners are difficult to predict due to the additional presence of distortion introduced by speech alteration. The current study compared native and non-native listeners’ keyword scores for simple sentences, unmodified and with six forms of modification. Both groups showed similar patterns of intelligibility change across conditions, with the native cohort benefiting slightly more in stationary noise. This outcome suggests that the change in masked audibility rather than distortion is the dominant factor governing listeners’ responses to speech modification.
1. Introduction

Listeners are frequently required to understand recorded or synthetic speech output under less-than-ideal conditions. One approach to maintaining intelligibility in such environments is to modify the clean speech prior to output (e.g., Skowronski and Harris, 2006 ; Taal et al., 2013 ). Large-scale evaluations have demonstrated gains equivalent to a reduction in speech level of more than 5 dB for participants listening in their first language, at least for English ( Cooke et al., 2013 ). It is of interest to ask whether non-native listeners (NNLs) benefit from speech modifications to the same extent as native listeners (NLs). While the effect of noise on speech perception in NNLs has been researched extensively (see review in García Lecumberri et al., 2010 ), most studies to date have employed unaltered forms of speech. Far less is known about the impact of modified speech on NNLs.
Many speech modification algorithms aim to improve the masked audibility of speech. For instance, Taal et al. (2013) sought the optimal linear filter maximizing an approximation to the Speech Intelligibility Index ( ANSI, 1997 ). If masking release is the main effect of speech modification, previous studies of the effect of noise on NNLs (e.g., Cutler et al., 2004 ) lead to the prediction that this group of listeners will benefit by a similar amount to NLs for speech material with a predictable syntactic structure and limited lexicon. However, a known side-effect of modification is some degree of distortion, and it is also possible that NLs are able to use their richer experience with the phonology of the target language to extract a larger benefit than NNLs.
Earlier studies with altered speech styles provide a mixed picture of their effects on NNLs. Hazan and Simpson (2000) examined the degree of benefit produced by selective amplification of perceptually-salient regions of vowel-consonant-vowel material. Two groups of NNLs with different first languages showed similar intelligibility gains over unprocessed speech as a NL cohort. However, a study using synthetic speech ( Reynolds et al., 1996 ) demonstrated that NNLs suffer larger deficits than NLs for this form of non-standard speech material. Likewise, Lombard speech has been shown to be somewhat less beneficial for NNLs ( Cooke and García Lecumberri, 2012 ).
The current study measured the effect of speech modification on NNLs using a range of algorithms tested in Tang and Cooke (2011) . The six modification techniques tested differ both in their effect on intelligibility and in their degree of disruption to speech quality as predicted by an objective measure. NNLs identified keywords in simple unmodified and modified English sentences presented in stationary and fluctuating maskers. Results are compared with those from a NL cohort of 24 British English participants tested in Tang and Cooke (2011) .

2. Methods

2.1 Listeners
A group of 71 young adult listeners participated in the experiment. All were native monolinguals in Spanish or bilingual in Spanish and Basque, and all were in their second year of studies for the degree of English Philology at the University of the Basque Country, Spain. Of these, six failed to complete some of the conditions and were excluded from subsequent analysis.
2.2 Speech and noise material
Sentences were drawn from the GRID Corpus (Cooke et al., 2006) and consist of 6-word sequences with spoken letter and digit keywords in the fourth and fifth positions, e.g., “lay red at K 4 now,” spoken by 1 of 34 male or female talkers. These so-called “matrix” sentences were chosen in this preliminary study to avoid the involvement of higher-level knowledge, which is known to produce larger NL benefits in noise (García Lecumberri et al., 2010). Sentences were drawn at random from the corpus and presented in stationary (speech shaped noise; SSN) or fluctuating (speech modulated noise; SMN) maskers. The SSN sample approximated the long-term spectrum of the unmodified speech corpus. SMN was derived by modulating the SSN signal with the short-term temporal envelope of randomly concatenated sequences of utterances from the corpus.
2.3 Processing conditions
Speech material was processed by six different modification techniques described in Tang and Cooke (2011) : “SegSNR,” “ChanSNR,” and “LocalSNR” equalized the signal-to-noise ratio (SNR) in each frame, frequency channel, and time-frequency location, respectively; “SelectBoost” amplified masked channels in the frequency range 1800–7500 Hz; “Pausing” introduced a 300 ms pause preceding a word boundary in such a way as to avoid the most intense noise epoch, while “Combined” consisted of Pausing and SelectBoost in sequence. Modifications were applied to clean speech prior to mixing with noise.
The overall root-mean-square (rms) energy was equalized following the modification, and since the Pausing and Combined techniques introduced pauses, the duration of the remaining speech sections was linearly compressed by an equivalent amount.
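The rms equalization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, signals are plain lists of samples, and the time-compression step for the Pausing and Combined techniques is not covered.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def match_rms(modified, reference):
    """Scale `modified` so that its rms energy equals that of `reference`."""
    current = rms(modified)
    if current == 0.0:
        # A silent signal cannot be rescaled; return it unchanged.
        return list(modified)
    gain = rms(reference) / current
    return [x * gain for x in modified]


# Toy signals: the "modified" version is louder than the original.
original = [0.5, -0.5, 0.5, -0.5]
modified = [1.0, -2.0, 1.0, -2.0]
equalized = match_rms(modified, original)

print(rms(original), rms(equalized))
```

Equalizing rms this way ensures that any intelligibility change comes from how energy is redistributed, not from an overall level increase.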
Figure 1 shows waveforms and spectrograms for unprocessed and modified speech for an example utterance. It is evident that the modification techniques differ in the degree of alteration to the signal and its spectro-temporal characteristics. For example, while ChanSNR is equivalent to a constant spectral filter and has little effect on speech quality, both SegSNR and LocalSNR impose rapid variations across time frames and result in significant audible distortions. Table 1 provides an estimate of distortion using the objective speech quality measure PESQ ( Rix et al., 2001 ). For the modifications tested here, values cover the entire PESQ range, from 1 (poor quality) to 4.5 (undistorted speech) relative to the reference unmodified speech signal.

Fig. 1. Original and modified waveforms and spectrograms for the utterance “Set red by O 2 soon.”

Table 1. Mean PESQ values across 50 sentences in each modified speech condition. Standard deviations are given in parentheses.

2.4 Procedure
In Tang and Cooke (2011) , NLs were tested at SNRs of −6 and −9 dB, apart from the modification method LocalSNR, which was mixed at SNRs of 0 and 3 dB due to reduced intelligibility at lower SNRs. In the current study, NNLs were tested at −6 and 0 dB for all conditions apart from LocalSNR, which was presented at 3 and 6 dB. Results are given here for the SNRs that the two listener groups had in common, namely, −6 dB (3 dB for LocalSNR). SNRs were computed over the region where the speech is present.
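Mixing at a target SNR computed only over the speech-bearing region can be sketched as follows. This is an assumption-laden illustration rather than the procedure used in the study: the function names and the explicit `speech_start`/`speech_end` sample indices are hypothetical, and signals are plain lists of equal length.

```python
import math


def mean_power(samples):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db, speech_start, speech_end):
    """Scale `noise` so the SNR over [speech_start, speech_end) equals `snr_db`, then mix.

    Returns the mixture and the gain that was applied to the noise.
    """
    p_speech = mean_power(speech[speech_start:speech_end])
    p_noise = mean_power(noise[speech_start:speech_end])
    # SNR_dB = 10 * log10(p_speech / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    mixture = [s + gain * n for s, n in zip(speech, noise)]
    return mixture, gain


# Toy example: speech occupies samples 2..5, with silence before and after.
speech = [0.0, 0.0, 1.0, -1.0, 1.0, -1.0, 0.0, 0.0]
noise = [0.5, -0.5] * 4
mixture, noise_gain = mix_at_snr(speech, noise, -6.0, 2, 6)

achieved_snr = 10.0 * math.log10(
    mean_power(speech[2:6]) / mean_power([noise_gain * n for n in noise[2:6]])
)
print(f"noise gain: {noise_gain:.3f}, achieved SNR: {achieved_snr:.2f} dB")
```

Restricting the power estimate to the speech region keeps leading and trailing silence from lowering the measured speech level.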
Listeners heard speech in noise in 28 conditions made up of all combinations of the 2 masker types, 2 SNRs, and 7 sentence processing conditions (i.e., 6 modifications plus unmodified speech). Sentences were blocked by condition: within each block the SNR, masker, and modification were constant. Each block consisted of 50 utterances. To avoid sentence subset effects, 28 sets of 50 sentences were generated for each condition (i.e., 784 sets in total) and listeners were assigned to sentence sets using a balanced design which ensured that no listener heard the same sentence more than once, and that each listener heard the same number of sentences in each of the 28 conditions. Condition order was also balanced across listeners, and the order of stimulus presentation within each condition was randomized.
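The condition counts in this design can be checked by enumerating the factor combinations. The sketch below only reproduces the arithmetic; the SNR labels "low" and "high" are placeholders, because the actual SNR values depend on the listener group and on the modification.

```python
from itertools import product

MASKERS = ["SSN", "SMN"]
SNR_LEVELS = ["low", "high"]  # placeholder labels for the two SNRs
PROCESSING = [
    "Unmodified", "SegSNR", "ChanSNR", "LocalSNR",
    "SelectBoost", "Pausing", "Combined",
]
SENTENCES_PER_BLOCK = 50
SETS_PER_CONDITION = 28

# Every masker x SNR x processing combination is one blocked condition.
conditions = list(product(MASKERS, SNR_LEVELS, PROCESSING))

n_conditions = len(conditions)
total_sets = n_conditions * SETS_PER_CONDITION
sentences_per_listener = n_conditions * SENTENCES_PER_BLOCK

print(n_conditions, total_sets, sentences_per_listener)
```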
The experiment took place in a quiet laboratory. Stimuli were delivered under computer control via Plantronics Audio-90 headphones (Plantronics, Santa Cruz, CA). Participants entered letter and number keywords using a computer keyboard. Listeners were familiarized with the task via a short practice session and undertook the main experiment, which required approximately 90 min to complete, over 2 sessions separated by a break.

3. Results

In the unmodified speech condition, NLs (from Tang and Cooke, 2011) identified 63.8% of keywords correctly in stationary noise and 81.1% in fluctuating noise, while NNLs obtained scores of 52.8% and 67.7%, respectively, representing NL benefits of 11.0 and 13.4 percentage points. Figure 2 plots mean percentage keywords correct for the two listener groups for all conditions. It is evident that NL and NNL scores are highly correlated [r = 0.97, p < 0.001], with the best linear fit having a slope close to unity and showing a mean NNL deficit of just over 12 percentage points.

Fig. 2. Mean keyword correct scores for NLs and NNLs in stationary noise (filled symbols) and fluctuating noise (unfilled symbols). Points have been shifted randomly by up to ±0.5 percentage points to avoid overlap. Native data come from Tang and Cooke (2011).

The upper panel of Fig. 3 presents changes in keyword scores, expressed in percentage points, for the six processed speech conditions for both listener groups relative to their respective unmodified speech baselines. Overall, NLs and NNLs show a very similar pattern of gain for each masker. The additional NL gain in stationary noise averaged 5.1 percentage points across modifications and 0.8 percentage points in fluctuating noise. Separate two-factor (modification by listener group) repeated-measures analyses of variance were computed for each masker type. For the SSN masker, gains differ across modifications [F(5, 435) = 363, p < 0.001, η² = 0.66] and listener group [F(1, 87) = 10.2, p < 0.01, η² = 0.06], but the interaction between these factors is not statistically significant [p = 0.22]. For the SMN masker, the effect of modification is again significant [F(5, 435) = 250, p < 0.001, η² = 0.62]. However, the two listener groups have equivalent overall gains [p = 0.48]. The modification by listener group interaction is significant [F(5, 435) = 3.61, p < 0.01, η² = 0.023]. Post hoc comparisons based on a Fisher’s Least Significant Difference value of 2.6 percentage points indicate that the interaction is due to different gains for the LocalSNR modification technique.

Fig. 3. NL and NNL keyword score gains in percentage points (pps; upper) and changes in RTs (lower) over unmodified speech in SSN (left) and SMN (right). Error bars represent ±1 standard error. Native data come from Tang and Cooke (2011).

Figure 3 (lower) plots changes in response times (RTs) relative to unmodified speech. The median RT (measured from stimulus onset) per listener in each condition was used to avoid the influence of very long or short RTs. In the baseline unmodified speech condition, NLs required 2.8 and 2.7 s for the SSN and SMN maskers, while NNLs responded in 3.4 and 3.1 s, respectively. For both maskers there is a significant interaction between nativeness and modification technique [SSN: F(5, 435) = 2.8, p < 0.05, η² = 0.01; SMN: F(5, 435) = 4.9, p < 0.001, η² = 0.03]. The pattern of RT change is complex, and varies both with modification technique and masker type. For NNLs, most of the RT changes across modification methods represent an amplified version of those seen for NLs.

4. Discussion

In common with most previous studies which compared speech-in-noise intelligibility of NL and NNLs (see review in García Lecumberri et al., 2010 ), the non-native group identified fewer keywords correctly in noise than the native cohort. However, both listener groups showed a strikingly similar pattern of intelligibility changes when confronted by modified speech relative to an unmodified speech baseline. This finding is in line with Hazan and Simpson (2000) , whose two NNL groups benefited from speech enhancements to a similar degree to that of a native control group. Unlike Hazan and Simpson (2000) , whose modifications involved selective amplification of regions of phonetic importance, the algorithms tested in the current study were designed to promote masked audibility without regard for speech content, since a wider range of modification strategies are available if the need to identify salient phonetic information is removed. The present study supports the notion that differences in masked audibility across modification techniques affect NLs and NNLs identically. We found little evidence for the hypothesis that NLs are better able to handle distortions to the expected speech pattern resulting from speech modification. While NLs did benefit more (or suffer less) from modifications in the stationary masker, this additional NL benefit of around 5 percentage points was similar for all modifications regardless of the amount of objective distortion each one introduced. For the modulated masker NNLs were more adversely affected in the LocalSNR condition, where it might be argued that distortion played some part. However, the two conditions containing pauses had lower objective speech quality but exhibited no NNL disadvantage. One possibility is that spectro-temporal and pause-based modifications have differential effects on NLs and NNLs.
As expected, listeners responded more rapidly in conditions which produced high intelligibility. For instance, RTs decreased in stationary noise for the LocalSNR and SelectBoost modifications. Here, though, non-native RTs showed larger decreases over their baseline. This may be a ceiling effect: it is possible that at around 2.6 s for SelectBoost NLs were already responding as rapidly as possible. In spite of their larger decrease in RT, NNLs in the same condition remained slower at around 2.85 s. It is less clear why RTs for NNLs were more adversely affected than those for NLs in conditions which exhibited intelligibility reductions in the presence of fluctuating noise. The largest differential effect is seen for SegSNR. This modification redistributes energy across time frames to ensure that each has an equivalent SNR. For fluctuating maskers this has the side-effect of coupling speech modulations to those of the masker. The possibility that NNLs require more processing resources to perform speech separation under these conditions merits further study.
Finally, we note that the aim of this initial study was to establish the effect of masked audibility and distortion in sentences where the value of higher-level linguistic information is minimized. It remains to be seen whether modifications to more complex speech material interact with a listener’s native language status.

5. Conclusions

Changes in intelligibility resulting from modified speech show a similar pattern for NL and NNLs despite differences in the degree of objective speech distortion across modifications. This outcome encourages the deployment of algorithmically-altered forms of speech in applications such as public transport interchanges where they promise to benefit listeners regardless of whether they are listening in their native language.

Acknowledgments

This work has received funding from the European Union 7th Framework Programme under Grant Agreement No. FP7-PEOPLE-2011-290000 (INSPIRE) and the Basque Government under grant Language and Speech (IT311-10).
