Basics of Perceptual Audio Encoding

Slide 1

Essentials of Perceptual Audio Encoding
Craig Lewiston
HST.723 Lab II, 3/23/06

Slide 2

Goals of Lab
Introduction to the fundamental principles of digital audio and perceptual audio encoding.
Learn the basics of the psychoacoustic models used in perceptual audio encoding.
Run 2 experiments exploring some of the fundamental principles behind the psychoacoustic models of perceptual audio encoding.

Slide 3

Digital Audio
(figure: quantization)

Slide 4

Quantization
Quantization noise is the difference between the analog signal and its digital representation, and arises as a result of the error in quantizing the analog signal.
N bits => 2^N levels
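The relation between bit depth and quantization noise can be checked numerically. The sketch below is not from the slides: it assumes a simple uniform quantizer on the range [-1, 1] and a full-scale sine as the test signal, and the names `quantize` and `snr_db` are made up for illustration. Under those assumptions the measured SNR should land close to the textbook rule of thumb of about 6.02*N + 1.76 dB.

```python
import math

def quantize(x, n_bits):
    """Map x in [-1, 1] to the midpoint of one of 2**n_bits equal-width cells."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    # Clamp so that x = +1.0 falls into the top cell instead of overflowing.
    index = min(int((x + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step

def snr_db(n_bits, n_samples=20000):
    """Signal-to-quantization-noise ratio (dB) for a full-scale test sine."""
    # Assumed test signal: a sine whose frequency is not a simple fraction
    # of the sampling rate, so the error is spread over the cells.
    signal = [math.sin(2 * math.pi * 0.01234 * n) for n in range(n_samples)]
    noise = [s - quantize(s, n_bits) for s in signal]
    p_signal = sum(s * s for s in signal) / n_samples
    p_noise = sum(e * e for e in noise) / n_samples
    return 10 * math.log10(p_signal / p_noise)

if __name__ == "__main__":
    for bits in (4, 8, 12, 16):
        print(bits, "bits ->", 2 ** bits, "levels, SNR (dB):", round(snr_db(bits), 1))
```

Each added bit doubles the number of levels and halves the step size, which is why the noise power drops by roughly 6 dB per bit.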

Slide 5

Quantization
(table: bits vs. levels)
With each increase in the number of bits, the digital representation of the analog signal increases in fidelity, and the quantization noise becomes smaller.

Slide 6

Digital Audio
CD Audio:
16-bit encoding
2 channels (stereo)
44.1 kHz sampling rate
2 * 44.1 kHz * 16 bits = 1.41 Mb/s
+ Overhead (synchronization, error correction, etc.)
CD Audio = 4.32 Mb/s
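The CD bit-rate arithmetic can be reproduced in a few lines. This is only a worked check of the numbers on the slide; the variable names are illustrative, and the 4.32 Mb/s total is taken from the slide rather than derived here.

```python
channels = 2               # stereo
sample_rate_hz = 44_100    # 44.1 kHz sampling rate
bits_per_sample = 16       # 16-bit encoding

# Raw audio payload, before any overhead.
raw_bps = channels * sample_rate_hz * bits_per_sample
raw_mbps = raw_bps / 1e6

# Total rate including synchronization and error correction, as quoted on the slide.
total_mbps = 4.32
overhead_factor = total_mbps / raw_mbps

print("raw audio rate (Mb/s):", round(raw_mbps, 2))
print("total / raw:", round(overhead_factor, 2))
```

The ratio shows that, by the slide's figures, the stored bit stream is roughly three times the size of the raw audio samples.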

Slide 7

Compression
High data rates, such as CD audio (4.32 Mb/s), are incompatible with internet & wireless applications.
Audio data must somehow be compressed to a smaller size (fewer bits), while not affecting signal quality (minimizing quantization noise).
Perceptual Audio Encoding is the encoding of audio signals, incorporating psychoacoustic knowledge of the auditory system, in order to reduce the number of bits necessary to faithfully reproduce the signal.
MPEG-1 Layer III (aka mp3)
MPEG-2 Advanced Audio Coding (AAC)

Slide 8

MPEG = Moving Picture Experts Group
MPEG is a family of encoding standards for digital multimedia information.
MPEG-1: a standard for storage and retrieval of moving pictures and audio on storage media (e.g., CD-ROM).
    Layer I
    Layer II
    Layer III (aka MP3)
MPEG-2: a standard for digital television, including high-definition television (HDTV), and for addressing multimedia applications.
    Advanced Audio Coding (AAC)
MPEG-4: a standard for multimedia applications, with very low bit-rate audio-visual compression for those channels with very limited bandwidths (e.g., wireless channels).
MPEG-7: a content representation standard for information search.

Slide 9

Overview of Perceptual Encoding
General Perceptual Audio Encoder (Painter & Spanias, 2000): Psychoacoustic analysis => masking thresholds
Basic principle of the Perceptual Audio Encoder: use the masking pattern of the stimulus to determine the least number of bits necessary for each frequency sub-band, so as to prevent the quantization noise from becoming audible.

Slide 10


Slide 11

Quantization Noise

Slide 12

Sub-band Coding

Slide 13

Sub-band Coding
(figure: adjacent sub-bands m-1, m, m+1)

Slide 14

Masking/Bit Allocation
The number of bits used to encode each frequency sub-band is equal to the least number of bits with a quantization noise that is below the minimum masking threshold for that sub-band.
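The allocation rule can be sketched as a small search. This is a minimal illustration, not the MPEG procedure: it assumes that each bit lowers the quantization noise by about 6.02 dB relative to the signal level in the band, and the function name `bits_needed` and the example levels are hypothetical.

```python
import math

# Assumption: each extra bit halves the quantizer step, i.e. about 6.02 dB less noise.
DB_PER_BIT = 20 * math.log10(2)

def bits_needed(signal_db, mask_db, max_bits=16):
    """Smallest bit count whose estimated noise level falls below the masking threshold.

    signal_db: signal level in the sub-band (dB)
    mask_db:   minimum masking threshold in the sub-band (dB)
    """
    for bits in range(max_bits + 1):
        noise_db = signal_db - DB_PER_BIT * bits
        if noise_db < mask_db:
            return bits
    return max_bits  # the threshold cannot be met within the bit budget

if __name__ == "__main__":
    # Hypothetical sub-bands: (signal level, masking threshold) in dB.
    for signal_db, mask_db in [(60, 40), (60, 57), (30, 35)]:
        print(signal_db, mask_db, "->", bits_needed(signal_db, mask_db), "bits")
```

A band whose signal already lies below the masking threshold gets zero bits under this rule, which is where most of the bit savings come from.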

Slide 15

Example: MPEG-1 Psychoacoustic Model I
1. Spectral Analysis and SPL Normalization

Slide 16

Example: MPEG-1 Psychoacoustic Model I
2. Identification of Tonal Maskers & calculation of individual masking thresholds

Slide 17

Example: MPEG-1 Psychoacoustic Model I
3. Identification of Noise Maskers & calculation of individual masking thresholds

Slide 18

Example: MPEG-1 Psychoacoustic Model I
4. Calculation of Global Masking Thresholds
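One common way to form a global threshold is to add the individual contributions as powers rather than as decibels. The sketch below assumes that formulation (threshold in quiet plus all tonal and noise masker thresholds at one frequency, summed in the power domain); the slides do not spell out the formula, and the function name and example values are illustrative.

```python
import math

def global_threshold_db(quiet_db, tonal_db, noise_db):
    """Power-domain sum of individual thresholds at one frequency (all values in dB).

    quiet_db: absolute threshold of hearing at this frequency
    tonal_db: individual masking thresholds due to tonal maskers
    noise_db: individual masking thresholds due to noise maskers
    """
    total_power = 10 ** (quiet_db / 10)
    total_power += sum(10 ** (t / 10) for t in tonal_db)
    total_power += sum(10 ** (n / 10) for n in noise_db)
    return 10 * math.log10(total_power)

if __name__ == "__main__":
    # Hypothetical values: one strong tonal masker dominates the sum.
    print(round(global_threshold_db(5.0, [40.0], [20.0, 15.0]), 2))
```

Because the sum is taken in the power domain, the largest individual threshold dominates and two equal contributions raise the result by about 3 dB.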

Slide 19

Example: MPEG-1 Psychoacoustic Model I
A - Some portions of the input spectrum require SNRs > 20 dB
B - Other portions require less than 3 dB SNR
C - Some high-frequency portions are masked by the signal itself
D - Very high-frequency portions fall below the absolute threshold of hearing.
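Point D depends on the absolute threshold of hearing. The slides do not give a formula for it; a widely used closed-form approximation (attributed to Terhardt and quoted in Painter & Spanias, 2000) is sketched below as an assumption, with an illustrative function name.

```python
import math

def threshold_in_quiet_db_spl(f_hz):
    """Approximate absolute threshold of hearing in dB SPL at frequency f_hz.

    Assumed closed-form fit; it is meant for the audible range, roughly 20 Hz to 20 kHz.
    """
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000, 16000):
        print(f, "Hz:", round(threshold_in_quiet_db_spl(f), 1), "dB SPL")
```

The curve dips to its minimum in the 3 to 4 kHz region and climbs steeply at very high frequencies, so weak high-frequency components can fall below it and need no bits at all.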

Slide 20

Example: MPEG-1 Psychoacoustic Model I
5. Sub-band Bit Allocation

Slide 21

Lab Experiments
Exp 1: Masking Pattern
    Measure absolute hearing thresholds in quiet
    Measure absolute hearing thresholds in the presence of a narrowband noise masker
Exp 2: Masking Threshold
    Measure the masking threshold of a 1 kHz tone in the presence of four different maskers:
    Tone
    Gaussian noise
    Multiplied noise
    Low-noise noise

Slide 22

Method of Adjustment
(photo: Georg von Békésy)
Method of Adjustment (aka Békésy tracking method): the target tone is swept through the frequency range, and the subject must adjust the intensity of the target tone so that it is barely audible.

Slide 23

Exp 1: Masking Pattern
(figure labels: masker, threshold in quiet, masked threshold, masked sounds)

Slide 24

Exp 2: Masking Thresholds
Calculation of tonal & noise masking thresholds: tonal & noise maskers have different masking effects…

Slide 25

Asymmetry of Simultaneous Masking
Tone masker: SNR ~ 24 dB
Noise masker: SNR ~ 4 dB

Slide 26

Asymmetry of Simultaneous Masking
Why do tones and noises have different masking effects?
Signal = A(t) e^(j[ω(t) + φ(t)])
For narrowband Gaussian noise, e^(jω(t)) is approximately the same as a tone centered at the same frequency.
The asymmetry effect is due either to the amplitude term A(t), to the phase term φ(t), or to a combination of both.

Slide 27

Asymmetry of Simultaneous Masking
Measure masking effects of "modified" noises:
Multiplied noise: generated by multiplying a sinusoid at 1 kHz with a low-pass Gaussian noise.
    Amplitude => Gaussian noise
    Phase => Pure tone
Low-noise noise: Gaussian noise with a temporal envelope that has been flattened.
    Amplitude => Pure tone
    Phase => Gaussian noise
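The construction of multiplied noise can be sketched directly from its description. The code below is an illustration under assumptions, not the lab's stimulus code: the sampling rate, the moving-average filter used as a crude low-pass, and the function name `multiplied_noise` are all chosen here for convenience.

```python
import math
import random

def multiplied_noise(n_samples, fs_hz=8000, tone_hz=1000, smooth_len=64, seed=0):
    """Return (product, envelope, carrier) for a tone multiplied by low-pass Gaussian noise."""
    rng = random.Random(seed)
    # White Gaussian noise, with extra samples so the moving average is always full.
    white = [rng.gauss(0.0, 1.0) for _ in range(n_samples + smooth_len - 1)]
    # Assumed low-pass: a plain moving average; a real stimulus would use a designed filter.
    envelope = [sum(white[i:i + smooth_len]) / smooth_len for i in range(n_samples)]
    # The carrier is the pure 1 kHz tone.
    carrier = [math.sin(2 * math.pi * tone_hz * n / fs_hz) for n in range(n_samples)]
    product = [a * c for a, c in zip(envelope, carrier)]
    return product, envelope, carrier

if __name__ == "__main__":
    product, envelope, carrier = multiplied_noise(16)
    for p, a, c in zip(product, envelope, carrier):
        print(round(a, 3), round(c, 3), round(p, 3))
```

The product inherits the zero crossings of the tone while its slowly varying amplitude follows the Gaussian noise, which is the sense in which its phase is tone-like and its amplitude is noise-like.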

Slide 28

Exp 2: Masking Thresholds
Measure the masking threshold for four different types of masker:

Target (quantization noise)    Masker (desired signal)
Gaussian noise                 Tone
Gaussian noise                 Gaussian noise
Gaussian noise                 Multiplied noise
Gaussian noise                 Low-noise noise

Comparing the modified noise thresholds with the tone & Gaussian noise thresholds should indicate which component of the Gaussian noise (amplitude and/or phase) contributes to the asymmetry effect.

Slide 29

Method: Adaptive Procedure
(figure: intensity vs. trial number, trials 1 to 12, intensity falling from 75 to 65, with Y/N responses marked)
Threshold = average of reversal points (usually 6 or 7)
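The adaptive procedure can be simulated with a simple staircase. This sketch assumes a one-up, one-down rule with a fixed 1 dB step and an idealized listener who always answers correctly; the slide fixes none of these details, and the function name `staircase` is illustrative.

```python
def staircase(hears, start=75.0, step=1.0, n_reversals=6, max_trials=200):
    """Run a one-up/one-down staircase and average the reversal points.

    hears: callable taking a level and returning True (Y) or False (N)
    Returns (threshold_estimate, list_of_reversal_levels).
    """
    level = start
    last_direction = 0  # -1 = level was decreasing, +1 = increasing, 0 = no trial yet
    reversals = []
    for _ in range(max_trials):
        # A "Y" response lowers the level on the next trial, an "N" raises it.
        direction = -1 if hears(level) else +1
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)  # the response flipped at this level
        last_direction = direction
        if len(reversals) == n_reversals:
            break
        level += direction * step
    if not reversals:
        raise ValueError("no reversals observed; widen the level range")
    return sum(reversals) / len(reversals), reversals

if __name__ == "__main__":
    # Idealized listener with a hypothetical true threshold of 70.5.
    estimate, points = staircase(lambda level: level >= 70.5)
    print("reversal points:", points)
    print("threshold estimate:", estimate)
```

With a real listener the responses near threshold are noisy, so the reversal points scatter around the threshold and averaging several of them gives a more stable estimate.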

Slide 30

Lab Write-up
Describe the methods for Experiment 1 and the results you obtained. Explain how the threshold results obtained relate to the masking thresholds used in perceptual audio encoding.
Describe the methods for Experiment 2 and the results you obtained, highlighting the amplitude and phase characteristics of the two "modified" noises used. Based on your data, indicate which component (amplitude and/or phase) contributes to the asymmetry of simultaneous masking observed.
LAB WRITE-UP DUE Monday, March 7, 2005
