
FFT Phase Vocoder crackling

Answers

  • Tr97
    Tr97 Member Posts: 16 Sine


    hi! thank you :D

    that's right, i modded the ezfft till it became an almost new thing (called it vfft and made a pitchshifter with it). the 3 things i wanted to change were: variable blocksize, variable overlap and reduced latency - it's all done on an event (instead of audio) basis. i'm glad not to really know how reaktor processes this stuff... :D what i assume is that it could destabilize it

    first i indeed used voices for the streams but then it all went into a single event-based machine

    the time-scaling stuff can be imagined as something working in parallel and independently on all fft bins - on a single sinewave-like oscillator per bin, so to say. for a single bin it first calculates a frequency by taking the phase difference of 2 blocks, then runs the new oscillator with this frequency - in this unaltered form it produces lots of timesmearing, but this can sound very interesting
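    The per-bin frequency calculation described above can be sketched outside Reaktor in a few lines of numpy (an illustration of the phase-difference idea only, not the vfft itself; the frame size, hop and sample rate below are arbitrary choices):

```python
import numpy as np

def bin_frequencies(frame_a, frame_b, hop, sr, n_fft):
    """Estimate the 'true' frequency in each FFT bin from the phase
    difference between two analysis frames `hop` samples apart."""
    A, B = np.fft.rfft(frame_a), np.fft.rfft(frame_b)
    k = np.arange(A.size)
    # phase advance a bin-centred sinusoid would accumulate over `hop`
    expected = 2 * np.pi * k * hop / n_fft
    dphi = np.angle(B) - np.angle(A) - expected
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi   # wrap deviation into [-pi, pi)
    return (k + dphi * n_fft / (2 * np.pi * hop)) * sr / n_fft

sr, n_fft, hop = 48000, 1024, 256
t = np.arange(n_fft + hop) / sr
x = np.sin(2 * np.pi * 440.0 * t)                 # a plain 440 Hz sine
w = np.hanning(n_fft)
freqs = bin_frequencies(w * x[:n_fft], w * x[hop:], hop, sr, n_fft)
peak = np.argmax(np.abs(np.fft.rfft(w * x[:n_fft])))
print(freqs[peak])                                # close to 440.0
```

    The unaltered resynthesis would then just run each bin's oscillator at its estimated frequency, which is where the timesmearing described above comes from.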

  • ANDREW221231
    ANDREW221231 Member Posts: 226 Saw

    i'd guess you'd have already come across this blog? http://blogs.zynaptiq.com/bernsee/pitch-shifting-using-the-ft/

    it seems four is the magic number to minimize phase wrapping errors, though in practice i've more often than not been able to get by on 2. what were you using to get 4 channels, the robin davies macros?

  • ANDREW221231
    ANDREW221231 Member Posts: 226 Saw

    i'm glad not to really know how reaktor processes this stuff...:D, what i assume is that it can destabilize it

    funny you say that, i've built stuff with your vfft architecture that fed into event tables (so presumably not threadsafe lol) and then across multiple instruments with different voice counts and then ultimately even fed *back* into the original core cell, and never really ran into problems.

    first i indeed used voices for the streams

    anyway, i ended up pretty much fully grokking what the event based fft was doing (at least in principle), but the multivoice thing still interests me. it seems without voice management in core you would have to export the streams to primary to deal with that?

  • Tr97
    Tr97 Member Posts: 16 Sine

    good to know it's stable - my computer is an old haswell i5 (from 2014), maybe that plays into it

    ---

    The event stuff more or less simulates the voices by copying data from a stream into an array via iterated events; it took very long to get this running.

    Found the per-voice thing - this was the step before going the event-based way (it's about a year older than the vfft). I think there were some issues with the frequency calculation or something like that, so there is no pitch shifter (where the voices 'interact'), but I'm not sure. It straightforwardly uses one voice for each overlapping stream.

  • Tr97
    Tr97 Member Posts: 16 Sine

    I was wrong, there is a timestretcher with (interacting) voices


  • colB
    colB Member Posts: 319 Saw

    That's a good link thanks.

    Yes, I was using Robin's fft macros. The polar/depolar come from ezfft, and I had to bug-fix the window macros.


  • Tr97
    Tr97 Member Posts: 16 Sine
    edited February 4

    the thing about window functions in fft with ifft applications is that, in contrast to analysis-only stuff, they get applied twice, so in total the square of the window gets applied even on a neutral (bypassed) fft - i found nothing to read about this

    the thing is, in this case at overlap 2 the Hann and Hamming windows don't sum up to 1 anymore like they do in their original non-squared form; the result is a tremolo / ringmod-like artifact. it needs a minimum overlap of i think 3, but for sure 4. with window functions with higher terms (Blackman-Harris) it gets even worse

    The ezfft uses some hacked window to sum up to 1, i think it's the square root of Hann: sqrt(0.5*(1-cos(2*pi*t))), t = 0..1
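    For anyone following along, the squared-window effect is easy to check numerically. A small numpy sketch (with an arbitrary block size) that overlap-adds shifted copies of Hann and of Hann squared:

```python
import numpy as np

def ola_sum(window, overlap, n_hops=32):
    """Overlap-add shifted copies of `window` and return the min and max
    of the steady-state region of the sum."""
    n = len(window)
    hop = n // overlap
    total = np.zeros(n_hops * hop + n)
    for i in range(n_hops):
        total[i * hop:i * hop + n] += window
    core = total[n:-n]            # skip the fade-in/out at the edges
    return core.min(), core.max()

n = 1024
t = np.arange(n) / n
hann = 0.5 * (1 - np.cos(2 * np.pi * t))    # periodic Hann

print(ola_sum(hann, 2))       # flat at ~1: Hann alone is fine at overlap 2
print(ola_sum(hann**2, 2))    # ~0.5 .. ~1.0: the squared window ripples
print(ola_sum(hann**2, 4))    # flat again (at ~1.5) at overlap 4
```

    The 0.5..1.0 sweep of the squared window at overlap 2 is exactly the tremolo / ringmod-like artifact described above; at overlap 4 the cosine terms of the squared window cancel and the sum is constant again.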

  • colB
    colB Member Posts: 319 Saw

    I tried the sqrt(hann); it seems to make a slight improvement, more noticeable on some sources. Mostly there is an obvious difference - it's just not always an obvious improvement.

    Definitely makes sense though!

  • colB
    colB Member Posts: 319 Saw

    Just to clarify what you were explaining:

    With two parallel channels, you need the sqrt(hann) to avoid the warbling modulation effect; with four channels, you can use straight hann.

    With two channels, using the square root window makes a significant difference. With four channels, once the levels are matched, the difference is not easily perceptible, although the two versions do not null.

  • Tr97
    Tr97 Member Posts: 16 Sine
    edited February 6

    Yes, this is exactly what I mean, cool graphics.


    One other thing: at 2x overlap the sqrt version of Hann itself does not sum to 1 (only its square does), which is a drawback; with 4x overlap Hann everything is fine in both cases. (I'm not sure if it really makes a difference, but it could if changes are applied to the signal.)

    https://www.desmos.com/calculator/vxnsaaizp7


    And another thing about the sqrt: the side bands are completely different
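    The asymmetry at 2x overlap is easy to confirm numerically (a numpy sketch with an arbitrary block size): the sqrt-Hann window itself sums with a ripple between 1 and sqrt(2), while its square, plain Hann, sums to a constant 1:

```python
import numpy as np

def ola_extrema(window, overlap, n_hops=32):
    """Min/max of the steady-state overlap-added sum of `window`."""
    n = len(window)
    hop = n // overlap
    total = np.zeros(n_hops * hop + n)
    for i in range(n_hops):
        total[i * hop:i * hop + n] += window
    return total[n:-n].min(), total[n:-n].max()

n = 1024
t = np.arange(n) / n
sqrt_hann = np.sin(np.pi * t)   # sqrt(0.5*(1-cos(2*pi*t))) == sin(pi*t)

print(ola_extrema(sqrt_hann, 2))      # ripples between ~1 and ~1.414
print(ola_extrema(sqrt_hann**2, 2))   # flat at ~1
```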

  • ANDREW221231
    ANDREW221231 Member Posts: 226 Saw

    haha great stuff guys. you know, it's interesting - for analysis/partial tracking stuff i usually use a gaussian window, but some stuff requires flattening it out until it's basically a square window. it seems to have a lowpassing effect, as in the partials will remain stable longer... sort of almost the opposite of what i'd expect

  • Tr97
    Tr97 Member Posts: 16 Sine

    good choice, gaussian is near perfect for analysis and could be for resynthesis too; getting it to sum up to 1 could be a bit tricky though

  • ANDREW221231
    ANDREW221231 Member Posts: 226 Saw

    what exactly is meant by summing up to 1? i could wager a guess but i feel it would probably be wrong... (maybe unity gain between the windowed section and the raw chunk of audio... ?🤔)


    also, what would be the disadvantage of not having this property?

  • colB
    colB Member Posts: 319 Saw

    when all the various channels of repeated windows are summed, you get a nice flat line at y=1

    That's why, if you are using two channels for overlap-add and two windows (one at the input and one at the output), you need the square root of Hann: when the two windows are multiplied you get Hann, which then sums to 1. Otherwise it sums to an offset sinusoid, and you get a warbling artefact.
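    A toy FFT->IFFT pass in numpy illustrates this (arbitrary sizes, and the spectrum is left untouched, so any gain modulation is purely the windows' doing):

```python
import numpy as np

def windowed_roundtrip(x, window, hop):
    """Window -> FFT -> IFFT -> window -> overlap-add, with nothing
    done to the spectrum in between."""
    n = len(window)
    y = np.zeros(len(x))
    for s in range(0, len(x) - n + 1, hop):
        spec = np.fft.rfft(x[s:s + n] * window)       # analysis window
        y[s:s + n] += np.fft.irfft(spec, n) * window  # synthesis window
    return y

n, hop = 1024, 512                       # overlap 2
t = np.arange(n) / n
hann = 0.5 * (1 - np.cos(2 * np.pi * t))

x = np.ones(hop * 20)                    # constant input exposes the envelope
mid = slice(n, len(x) - n)               # ignore the edge fades
env_hann = windowed_roundtrip(x, hann, hop)[mid]
env_sqrt = windowed_roundtrip(x, np.sqrt(hann), hop)[mid]
print(env_hann.min(), env_hann.max())    # ~0.5 .. ~1.0: the warble
print(env_sqrt.min(), env_sqrt.max())    # ~1.0 .. ~1.0: flat
```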

  • Tr97
    Tr97 Member Posts: 16 Sine
    edited February 14

    this is an attempt to visualize it: overlap 4 with a narrow and a wide Gaussian window; the blue line is the sum, which is what the overall signal gets multiplied by

    with overlap 2 the ripple gets stronger

    the sound is somewhere between tremolo and ringmodulation. i'll see if i can make a demo to simulate this

    but i have to say that the ripple is really low at overlap 4 with the wide window, so Gauss could work even in an fft->ifft combo
