Spec 4.6 Zag 2.1 Discussion

Zeerround · **Joined:** Sun Aug 30, 2009 5:01 am **Posts:** 723

This is the discussion thread for SPEC 4.6 ZAG 2.1

Download the package here: viewtopic.php?f=8&t=237

Andrei · **Joined:** Fri Apr 02, 2010 10:57 pm **Posts:** 3

Thank you very much for making the new version available! I wonder if someone in the know could answer several questions regarding the new version.

1. Mag/Freq Ratio, Freq. Mode. ArcTan now has a new set of controls:

The manual contains a snapshot of these controls, but I have not found any details on what specifically these controls do, in which cases they should be used, how they affect the sound conversion, and what setting is recommended as the starting point. Could you please provide some guidance?

2. Image width controls.
These have always been very confusing to me. I could not figure out how center width, front width, and rear width relate to each other and to the image width. The manual alludes to it so concisely that it was hard for me to figure out how the math and algorithm work. I think I finally figured it out after I started making sketches for posting a question in this forum. Let me explain how I understand it, and please tell me if I got it right.

We start with the stereo which is assumed to cover a 90 degree arc:

Next, we wrap it around the listener by stretching the soundfield from 90 to 360 degrees. At this stage, the stretching is homogeneous: each angular position in Fig. 1 is simply multiplied by 4.

Then, we move ArcTan sliders to assign 5 areas to the 360 degree sound field: Center, two equal Front, and two equal Rear. The math is

Center width + 2* FrontWidth + 2*RearWidth = 360 degrees.

Finally, we stretch the sound field as if it was an elastic rubber band: the center area (pink in the sketch) is squished into the center speaker, the two front areas (green in the sketch) are mapped onto the arcs between the center speaker and left and right front speakers, respectively, and finally the rear areas (blue in the sketch) are stretched over the remaining arcs in the rear. If image width is less than 360 degrees, there is a gap left in the rears, as shown in the sketch.

The width of the rear arcs is calculated from the Total Image Width and does not depend on the position of the rear speakers; the width of the front arcs is determined by the front speaker positions.
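To make my reading of the steps above concrete, here is a minimal Python sketch of the "rubber band" mapping as I understand it. It is only an illustration of my guess, not the actual ArcTan code: the function name `map_angle`, its parameter names, and the default 30 degree front speaker angle are my own assumptions.

```python
def map_angle(stereo_deg, center_w, front_w,
              front_speaker_deg=30.0, image_width=360.0):
    """Map a stereo position (-45..+45 deg) to an output angle in degrees.

    Hypothetical model: rear width follows from
    center_w + 2*front_w + 2*rear_w = 360.
    """
    rear_w = (360.0 - center_w - 2.0 * front_w) / 2.0
    if front_w <= 0 or rear_w <= 0:
        raise ValueError("front and rear widths must be positive")

    wrapped = 4.0 * stereo_deg          # homogeneous stretch, 90 -> 360 deg
    a = abs(wrapped)
    sign = 1.0 if wrapped >= 0 else -1.0
    half_c = center_w / 2.0

    if a <= half_c:
        # Center area is squished into the center speaker at 0 deg.
        out = 0.0
    elif a <= half_c + front_w:
        # Front area maps onto the arc between center and front speaker.
        out = (a - half_c) / front_w * front_speaker_deg
    else:
        # Rear area is stretched from the front speaker to half the
        # total image width; anything beyond that is the rear gap.
        t = (a - half_c - front_w) / rear_w
        out = front_speaker_deg + t * (image_width / 2.0 - front_speaker_deg)
    return sign * out


# Example: 40 deg center, 80 deg fronts, so 80 deg rears.
print(map_angle(15.0, 40.0, 80.0))                      # halfway to the front speaker
print(map_angle(-45.0, 40.0, 80.0, image_width=300.0))  # edge of a 300 deg image
```

With a total image width of 300 degrees, the hard-left stereo edge lands at -150 degrees, which leaves the 60 degree gap in the rears shown in the sketch.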

It would be great if you could advise if my thoughts are in line with the algorithm used in ArcTan.

3. Slice algorithm and controls
a) Can you please suggest a source where I can find a good description of the SLICE algorithm? All I've seen so far is so generic that I don't really understand how it works.

b) It is my understanding that SLICE is applied to the rear channels if used with ArcTan. Does SLICE use only the data from the rear channel arcs (blue in my sketch above), thus modifying the ArcTan output, or does it pull data from the whole stereo signal and overlap it with / replace the ArcTan output for the rears?

c) Does the SPEC checkbox "Wrap Rears" override the Total Image Width setting by changing it to 360 degrees? Does the position of the ArcTan Total Image Width slider make any difference at all if this box is checked?

d) How does the "Wrap Humidity" slider work mathematically? What parameter does it change?

Thank you very much!

Zeerround · **Joined:** Sun Aug 30, 2009 5:01 am **Posts:** 723

Andrei · **Joined:** Fri Apr 02, 2010 10:57 pm **Posts:** 3

Thank you very much for your very quick and detailed comments! I find that it is very helpful (when working in a system with multiple spam controls and hence with multiple degrees of freedom of adjustment) not only to go by the feel during live monitoring, but also to understand what exactly each slider does to the sound and why it does what it does.

I studied the reference to the center cut algorithm which you kindly provided, and also found some interesting history and background in the manual to SPEC v.4. The history of SPEC described in the v.4 manual indicates that you first implemented the center cut algorithm in Bidule, then expanded it to SLICE, and only later developed ArcTan. While ArcTan is in many ways superior to SLICE, you found that SLICE gives somewhat different output and included the option of blending into the ArcTan output for the rears either the original stereo source or SLICE rears, which based on the manual and your reply appears to work like this:

I am still a little fuzzy on how two-stage SLICE conversion works. The diagram below shows my best guess: the first stage likely uses left and right channels of the original stereo to separate center channel, while the second stage uses pairs "right & center" and "center & left" as inputs:

The next question becomes how SLICE output is different from ArcTan output...

I carefully read the description of the Center Cut algorithm. I am still confused why the author refers to the output of the Fast Hartley Transform as a 2D vector. Normally, FHT should transform a one-dimensional data array in the time domain to a one-dimensional array of data in the frequency domain. It appears that the author of Center Cut may have used cos and sin transforms in parallel to get information on both phase and magnitude, thus using essentially FFT but without complex numbers.
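For what it is worth, a real-valued Hartley output can be turned into a 2D (real, imaginary) vector per frequency bin by combining bins k and N-k. The sketch below checks this identity with slow textbook transforms. It is only an illustration under my own assumptions: the function names are hypothetical, and I do not know whether Center Cut recovers its vectors exactly this way.

```python
import cmath
import math


def dht(x):
    """Naive discrete Hartley transform (O(N^2)), cas = cos + sin kernel."""
    n = len(x)
    return [
        sum(x[t] * (math.cos(2 * math.pi * k * t / n)
                    + math.sin(2 * math.pi * k * t / n)) for t in range(n))
        for k in range(n)
    ]


def dht_to_complex(h):
    """Recover DFT bins (as complex numbers) from Hartley bins.

    With the DFT defined using exp(-2j*pi*k*t/n):
    H[k] = Re - Im and H[n-k] = Re + Im.
    """
    n = len(h)
    out = []
    for k in range(n):
        hk, hm = h[k], h[(n - k) % n]
        out.append(complex((hk + hm) / 2.0, (hm - hk) / 2.0))
    return out


def dft(x):
    """Naive discrete Fourier transform, used here only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


signal = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 0.25]
recovered = dht_to_complex(dht(signal))
reference = dft(signal)
print(max(abs(a - b) for a, b in zip(recovered, reference)))  # tiny rounding error
```

So a Hartley transform carries the same phase and magnitude information as an FFT, just packed into real numbers.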

His computation of the center channel uses scalar products of vectors. He seeks a solution of the equation

(L - alpha * C) [dot] (R - alpha * C) =0

Where [dot] is the scalar vector product operator (defined as the length of one vector times the projection of the second one along the first), L and R are phase/magnitude vectors of left and right channels in the frequency domain, C is the vector sum of the normalized vectors L and R (hence it determines the direction in the phase/magnitude space in which both L and R channels have significant components), and alpha is the fitting coefficient.

This equation may have three solutions:

1) L-alpha*C = 0
2) R-alpha*C = 0
3) alpha chosen in such a way that vector L-alpha*C is perpendicular to R-alpha*C and the scalar vector product is zero.

With 2D vector inputs for L and R, solutions (1) and (2) are very unlikely. They are possible only when vectors L and R point in the same direction, hence C is parallel to both L and R. In nearly all practical cases, L and R are not parallel and L - alpha*C can't become zero. Therefore, one has to solve a quadratic equation to find alpha for solution (3).
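Expanding the equation above gives the quadratic alpha^2 (C.C) - alpha C.(L+R) + L.R = 0. Here is a small Python sketch that solves it for one frequency bin with 2D inputs. The function names are hypothetical, and since I am not sure which of the two roots Center Cut keeps, the sketch simply returns both.

```python
import math


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def _center_direction(L, R):
    """C = L/|L| + R/|R|, or None if either input is zero."""
    nl, nr = math.hypot(L[0], L[1]), math.hypot(R[0], R[1])
    if nl == 0.0 or nr == 0.0:
        return None
    return (L[0] / nl + R[0] / nr, L[1] / nl + R[1] / nr)


def center_cut_alpha(L, R):
    """Return both roots (smaller, larger) of (L - a*C).(R - a*C) = 0."""
    C = _center_direction(L, R)
    if C is None or _dot(C, C) == 0.0:
        # Silent channel or exactly opposite vectors: no common
        # direction, so by convention report no center at all.
        return (0.0, 0.0)
    a = _dot(C, C)
    b = -_dot(C, (L[0] + R[0], L[1] + R[1]))
    c = _dot(L, R)
    # Clamp tiny negative values caused by rounding.
    root = math.sqrt(max(b * b - 4.0 * a * c, 0.0))
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))


def residual(L, R, alpha):
    """Value of (L - alpha*C).(R - alpha*C); should be ~0 at a root."""
    C = _center_direction(L, R)
    lv = (L[0] - alpha * C[0], L[1] - alpha * C[1])
    rv = (R[0] - alpha * C[0], R[1] - alpha * C[1])
    return _dot(lv, rv)


L, R = (2.0, 0.5), (1.0, 1.5)
for alpha in center_cut_alpha(L, R):
    print(alpha, residual(L, R, alpha))
```

For perpendicular inputs such as L = (1, 0) and R = (0, 1) the smaller root is 0, meaning nothing is moved to the center.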

To the best of my understanding, Plogue Bidule does not enable one to use full FFT with complex output; only amplitudes are available. I found a discussion on Plogue Bidule development forum which I believe was originated by you and which discusses possible solutions to this problem:

http://www.plogue.com/phpBB3/viewtopic. ... SDK#p22051

It seems that your approach was to solve the quadratic equation from the Center Cut algorithm with the difference that L and R are no longer two-dimensional vectors. In this case, C becomes just a scalar C=2, and the quadratic equation has a fairly simple solution outlined at the bottom of the thread on the Bidule forum (oddly, my calculations stubbornly give me "+4LR" under the square root at the end of the formula instead of "-4LR", but this is not the point of my question).

However, I do not quite understand why one would even attempt to solve this equation in the one-dimensional case. Unless I am confused about outputs of spectral bidules, L and R are no longer 2D vectors, and they CAN NOT have orthogonal components. They are one dimensional vectors which are always parallel. Vector product becomes just a product of two scalars. From the 3 possible solutions of the Center Cut equation listed above, only solutions 1 and 2 become meaningful, and only they are feasible:

1) L-alpha*C = 0
2) R-alpha*C = 0

What does this mean conceptually? We start with L and R channels in stereo. We do the conversion to the frequency domain. We compare the amplitudes of the left and right channels for each frequency component. If for a given frequency L>R, then the sound is located to the left of the "future center channel". Hence, we remove the signal from the right channel completely (thus satisfying the condition R-alpha*C=0) and move it all to the center channel. Since in our one-dimensional case C is a scalar and C=2, the solution for alpha is just alpha = R/2.
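As a sanity check of this one-dimensional reading, here is a minimal Python sketch for a single frequency bin. With C = 2 the equation (L - 2*alpha)(R - 2*alpha) = 0 expands to 4*alpha^2 - 2*(L+R)*alpha + L*R = 0, and in that form the expression under the square root is (L+R)^2 - 4*L*R, which equals (L-R)^2. The function name and the choice of the smaller root are my assumptions, not something taken from the SLICE layout.

```python
import math


def slice_1d(l_mag, r_mag):
    """Split one bin of magnitudes into (left, center, right).

    Assumes C = 2 and keeps the smaller root, alpha = min(L, R) / 2.
    """
    C = 2.0
    s = l_mag + r_mag
    disc = s * s - 4.0 * l_mag * r_mag       # equals (l_mag - r_mag) ** 2
    alpha = (s - math.sqrt(max(disc, 0.0))) / 4.0
    center = alpha * C                        # the part common to both channels
    return (l_mag - center, center, r_mag - center)


left, center, right = slice_1d(0.8, 0.2)
print(left, center, right)  # the weaker (right) channel is emptied into the center
```

In this model the center gets min(L, R) for each bin, and whichever channel was louder keeps the difference.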

With that, SLICE appears to boil down to essentially panning to 5 channels which ZPAN and ArcTan use, with the difference that speaker positions are not variable but fixed at 360 / 4 = 90 degrees (4 because 5 speakers are separated by 4 gaps between them). My thinking is that SPEC output can be reproduced with ArcTan if speakers are positioned in a non-standard (for a surround sound system) way. I am not sure, though, if SLICE can do such a careful job of panning with constant power equations as ArcTan does.

It appears that, due to the extreme setting of speakers, SLICE is bound to emphasize in its rear channels sounds located on the extreme left and extreme right in the stereo sound field, to a greater extent than ArcTan would do with its usual settings. Blending a SLICE component into the ArcTan output may make the rears better defined. In the extreme case, too much SLICE blended in (especially for stereos with strong stereo separation) may make the rears excessively dominant. This matches what I am hearing when experimenting with SLICE.

Does this make sense at all? Is there something that I am missing in my attempt to understand the background of how this wonderful algorithm works?

Thanks - and Happy New Year !!

Zeerround · **Joined:** Sun Aug 30, 2009 5:01 am **Posts:** 723

Well let's see.

Your diagrams are essentially correct.

Yes, the "fft" in plogue is really a "phase vocoder", where you have magnitude and frequency vs. real and imaginary or magnitude and phase.

Yes, Slice only uses the 1D magnitudes. Why? Because it seems to work, rather than from any rigorous mathematical reasoning. SPEC has been developed by a community effort, and often contributions are made that just sound good, vs. any revelation based on string theory or whatever (next "method" based on m-branes, anyone?) ;-)

I can think of lots of similar acoustic phenomena. A "perfectly" tuned scale sounds "wrong" compared to "well tempered". Music played with small timing errors, or "human" vs. machine quantization, sounds better. I have personally tortured myself and others with long bouts of noise at high volumes in listening environments, to get "flat" response, only to decide the result sounds bad and tune by ear while listening to a familiar piece of music.

I will admit to not being particularly great at algebra, and needed help from the community both on the slice quadratic and on the "rubber band stretching" parts of ArcTan. And we do find errors and make corrections as we go along. I'll take a look at your quadratic solution.

I'm always on the lookout for source code that could implement "real" FFTs, phase vocoders, and wavelets, in real time, and/or with less DSP load than what we have currently in Plogue. This includes exploring other modular audio programs.

So far I haven't found anything that can do 2 time to frequency transforms, and 5 inverse transforms, in real time with the needed resolution/quality.

Zeerround · **Joined:** Sun Aug 30, 2009 5:01 am **Posts:** 723

RE: panning.

ZPan and ArcTan use constant power panning. ZPan in the time domain, ArcTan in the spectral domain. I think there are references to the thesis I used for the surround constant power panning formulas in some of the older guides. If not, and you're interested, I can post it here.

ZPan and ArcTan (now tied to ZPan's) have speaker placement controls, so the panners know where the speakers are located. Regardless of speaker placement, any input signal can be panned around the entire 360 listening space.
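For readers who have not met the term, here is a minimal Python sketch of constant power panning between the two speakers adjacent to a source angle. It uses the textbook sin/cos law and is not the ZPan or ArcTan formula from the thesis mentioned above; the function name `pan_gains` and the example speaker angles (an ITU-style layout) are assumptions for illustration.

```python
import math


def pan_gains(source_deg, speakers_deg):
    """Return one gain per speaker for a source at source_deg.

    Only the two speakers on either side of the source get signal,
    with gains cos/sin so that the squared gains always sum to 1.
    """
    n = len(speakers_deg)
    if n < 2:
        raise ValueError("need at least two speakers")
    # Walk around the circle in order of increasing angle.
    order = sorted(range(n), key=lambda i: speakers_deg[i] % 360.0)
    src = source_deg % 360.0
    gains = [0.0] * n
    for j in range(n):
        a_i, b_i = order[j], order[(j + 1) % n]
        start = speakers_deg[a_i] % 360.0
        span = (speakers_deg[b_i] - speakers_deg[a_i]) % 360.0
        offset = (src - start) % 360.0
        if span > 0.0 and offset <= span:
            t = offset / span                     # 0 at speaker a, 1 at speaker b
            gains[a_i] = math.cos(t * math.pi / 2.0)
            gains[b_i] = math.sin(t * math.pi / 2.0)
            return gains
    raise ValueError("speaker angles must be distinct")


# Order: C, L, R, Ls, Rs (angles in degrees, positive to the right).
layout = [0.0, -30.0, 30.0, -110.0, 110.0]
g = pan_gains(15.0, layout)        # halfway between C and R
print(g, sum(x * x for x in g))
```

Because the speaker angles are just inputs, the same source angle can be placed anywhere in the 360 degree space whatever the layout is.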

ZPan is used to widen or narrow the output of SLICE, and also to "re-wire" the output of SPEC in the case where you want a vocal or outer sound that the stereo producer has placed on the extreme edges to be in the center channel.

I've also made a layout for converting Quad music to 5.1, which uses ZPan to reproduce the quad speaker placement on ITU positioned speaker systems.

Andrei · **Joined:** Fri Apr 02, 2010 10:57 pm **Posts:** 3

I agree: as long as there is no firmly proven and 100% accurate explanation of how humans hear directional sound, formulas and algebraic calculations can only provide directions for experiments and give opportunities for testing models. I think this is what you are doing. The goal is to make the result sound good - which SPEC certainly delivers!

The purpose of my post was not to discuss whether or not the math is perfectly correct but rather to check if my efforts to understand what SPEC does have brought me conceptually close to the algorithm that works behind the scenes. I am glad that you confirmed that this is the case.

Thanks again for taking time to answer my questions. And - thank you so much for spending countless hours on development of the software and for sharing it as a freeware with the community!

Zeerround · **Joined:** Sun Aug 30, 2009 5:01 am **Posts:** 723

Just FYI we quickly tried the "+" sign in the quadratic today. Didn't sound good (3 testers, not including me, so far).

I'm OK with SLICE being a happy accident. I'm much more invested in ArcTan, that being all my own idea (with implementation help from others).

I may try the 2D implementation of Center Cut on another modular audio platform that has "real" FFTs. There's also a commercial product, based on the center cut algorithm, here: http://www.penteosurround.com/

We think what we have sounds better though, but they have the requirement that you can recover the stereo via mixdown and we don't.

Marinaoqh · **Posted:** Sun Apr 28, 2024 11:53 am

Where is moderator??
It's important.
Thanks.

SurroundByUs.com
