FFT Phase Vocoder crackling

Answers

  • Tr97
    Tr97 Member Posts: 16 Member


    hi! thank you :D

    that's right, i modded the ezfft till it became an almost new thing (called it vfft and made a pitchshifter with it). the 3 things i wanted to change were: variable blocksize, variable overlap and reduced latency. it's all done on an event (instead of audio) basis. i'm glad not to really know how reaktor processes this stuff...:D, what i assume is that it can destabilize it

    first i indeed used voices for the streams but then it all went into a single event-based machine

    the time-scaling stuff can be imagined as something working in parallel and independently on all fft bins, each bin acting like a single sinewave oscillator so to say. for a single bin it first calculates a frequency by taking the phase difference between 2 consecutive blocks and then runs a new oscillator at this frequency. in this unaltered form it produces lots of time smearing, but this can sound very interesting
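
    A minimal sketch of that per-bin frequency estimate, in Python rather than Reaktor Core. The function names, the Hann window and the test tone are assumptions for illustration, not taken from the vfft itself. It measures the phase of one bin in two blocks one hop apart, subtracts the phase advance expected at the bin centre, wraps the rest to [-pi, pi) and converts it to Hz.

```python
import cmath
import math

def bin_phase(signal, start, size, k):
    """Phase of DFT bin k of the Hann-windowed block signal[start:start+size]."""
    acc = 0j
    for n in range(size):
        window = 0.5 * (1.0 - math.cos(2.0 * math.pi * n / size))  # assumed window
        acc += window * signal[start + n] * cmath.exp(-2j * math.pi * k * n / size)
    return cmath.phase(acc)

def estimate_frequency(signal, size, hop, k, sample_rate):
    """Frequency in bin k from the phase difference of two blocks one hop apart."""
    phase_a = bin_phase(signal, 0, size, k)
    phase_b = bin_phase(signal, hop, size, k)
    expected = 2.0 * math.pi * k * hop / size            # advance of the bin centre
    deviation = phase_b - phase_a - expected
    deviation = (deviation + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
    bin_offset = deviation * size / (2.0 * math.pi * hop)
    return (k + bin_offset) * sample_rate / size

# Hypothetical test tone: 1010 Hz at 8 kHz, which sits between bins 32 and 33.
sample_rate = 8000.0
size, hop = 256, 64
tone = [math.sin(2.0 * math.pi * 1010.0 * n / sample_rate) for n in range(size + hop)]
print(estimate_frequency(tone, size, hop, 32, sample_rate))
```

    A resynthesis oscillator per bin would then run at the estimated frequency instead of the bin centre frequency.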

  • ANDREW221231
    ANDREW221231 Member Posts: 299 Advisor

    i'd guess you'd have already come across this blog? http://blogs.zynaptiq.com/bernsee/pitch-shifting-using-the-ft/

    it seems four is the magic number to minimize phase wrapping errors, though in practice i've more often than not been able to get by on 2. what were you using to get 4 channels, the robin davies macros?
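
    One way to see why 4 tends to be the safe number, as a rough sketch under the usual phase vocoder assumptions (the function name and the constant are made up for illustration): the wrapped phase difference between two blocks can only represent a frequency offset of up to overlap/2 bins either side of the bin centre, while the main lobe of a Hann window reaches 2 bins either side.

```python
def unambiguous_bins(overlap):
    """Largest bin offset whose phase advance over one hop stays within +/-pi.

    Phase deviation per hop is 2*pi*offset/overlap, so |offset| <= overlap/2.
    """
    return overlap / 2.0

HANN_MAIN_LOBE_HALF_WIDTH = 2.0  # in bins

for overlap in (2, 4, 8):
    limit = unambiguous_bins(overlap)
    covered = limit >= HANN_MAIN_LOBE_HALF_WIDTH
    print(f"overlap {overlap}: +/-{limit} bins, covers Hann main lobe: {covered}")
```

    With overlap 2 only partials within one bin of the centre are measured without wrapping, which may be why it often still works in practice on material without much energy between bins.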

  • ANDREW221231
    ANDREW221231 Member Posts: 299 Advisor

    i'm glad not to really know how reaktor processes this stuff...:D, what i assume is that it can destabilize it

    funny you say that, i've built stuff with your vfft architecture that fed into event tables (so presumably not threadsafe lol) and then across multi instruments with different voice counts and then ultimately even fed back into the original core cell and never really ran into problems.

    first i indeed used voices for the streams

    anyway, i ended up pretty much fully grokking what the event based fft was doing (at least in principle), but the multivoice thing still interests me. it seems without voice management in core you would have to export the streams to primary to deal with that?

  • Tr97
    Tr97 Member Posts: 16 Member

    good to know it's stable, my computer is an old haswell i5 (from 2014), maybe this plays into it

    ---

    The event stuff is more or less simulating the voices by copying data from a stream into an array via iterated events; it took very long to get this running.

    Found the per-voice thing; this was the step before going the event-based way (it's about a year older than the vfft). I think there were some issues with the frequency calculation or something like that, so there is no pitch shifter (where the voices 'interact'), but I'm not sure. It simply uses one voice for each overlapping stream.
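
    A loose Python analogy for "one voice for each overlapping stream", not how Reaktor does it internally. The generator name and the round-robin voice assignment are assumptions for illustration: consecutive blocks start one hop apart and are handed to the voices in turn.

```python
def overlapping_blocks(stream, size, overlap):
    """Yield (voice, block) pairs; consecutive blocks start one hop apart."""
    hop = size // overlap
    last_start = len(stream) - size
    for index, start in enumerate(range(0, last_start + 1, hop)):
        # Block number `index` goes to voice `index % overlap`, so each voice
        # handles every overlap-th block and its blocks never overlap each other.
        yield index % overlap, stream[start:start + size]

stream = list(range(16))  # stand-in for audio samples
for voice, block in overlapping_blocks(stream, size=8, overlap=4):
    print(voice, block)
```

    The event-based version described above would do the same copying into one array instead of spreading it over voices.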

  • Tr97
    Tr97 Member Posts: 16 Member

    I was wrong, there is a timestretcher with (interacting) voices


  • colB
    colB Member Posts: 823 Guru

    That's a good link thanks.

    Yes, I was using Robin's fft macros. The polar/depolar come from ezfft, and I had to bug-fix the window macros.


  • Tr97
    Tr97 Member Posts: 16 Member
    edited February 2022

    the thing about window functions in fft-with-ifft applications is that, in contrast to analysis-only stuff, they get applied twice (once at the input and once at the output), so in total the squared window gets applied even on a neutral (bypassed) fft. i found nothing to read about this

    the thing is, in this case at overlap 2 the Hann and Hamming windows don't sum up to 1 anymore like they do in their original non-squared form. the result is a tremolo / ringmod-like artifact. it needs a minimum overlap of i think 3, but for sure 4. with window functions with higher-order terms (Blackman-Harris) it gets even worse

    the ezfft uses a hacked window to sum up to 1, i think it's the square root of Hann: sqrt(.5*(1-cos(2*pi*t))), t=0..1
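
    A small Python sketch to check this numerically. The helper names and the block size of 768 are assumptions, picked so that overlaps 2, 3 and 4 divide evenly. It sums the overlapping copies of the window as applied twice (window times window) and reports the peak-to-peak ripple of that sum relative to its mean, for plain Hann and for sqrt(Hann).

```python
import math

def hann(size):
    """Periodic Hann window: 0.5 * (1 - cos(2*pi*n/size))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / size)) for n in range(size)]

def overlap_add_gain(window, overlap):
    """Steady-state sum of `overlap` copies of `window`, each shifted by one hop."""
    hop = len(window) // overlap
    return [sum(window[n + m * hop] for m in range(overlap)) for n in range(hop)]

def ripple(gain):
    """Peak-to-peak variation of the summed gain relative to its mean."""
    mean = sum(gain) / len(gain)
    return (max(gain) - min(gain)) / mean

size = 768  # assumed block size, divisible by 2, 3 and 4
sqrt_hann = [math.sqrt(w) for w in hann(size)]
twice_hann = [w * w for w in hann(size)]   # Hann at the input and at the output
twice_sqrt = [w * w for w in sqrt_hann]    # sqrt(Hann) at both ends, i.e. Hann overall

for overlap in (2, 3, 4):
    print(overlap,
          round(ripple(overlap_add_gain(twice_hann, overlap)), 6),
          round(ripple(overlap_add_gain(twice_sqrt, overlap)), 6))
```

    In this sketch the doubled Hann has a ripple of roughly 0.67 at overlap 2 and is flat (up to rounding) at overlaps 3 and 4, while the doubled sqrt(Hann) is flat at all three. The flat sums are constant but not always exactly 1, so a fixed gain correction may still be needed.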

  • colB
    colB Member Posts: 823 Guru

    I tried the sqrt(hann), seems to make a slight improvement. More noticeable on some sources. Mostly there is an obvious difference. It's just not always an obvious improvement.

    Definitely makes sense though!

  • colB
    colB Member Posts: 823 Guru

    Just to clarify what you were explaining

    With two parallel channels, you need the sqrt(hann) to avoid the warbling modulation effect:

    With four channels, you can use straight hann:

    With two channels, using the square root window makes a significant difference. With four channels, once the levels are matched, the difference is not easily perceptible, although the two versions do not null.

  • Tr97
    Tr97 Member Posts: 16 Member
    edited February 2022

    Yes, this is exactly what I mean, cool graphics.


    Another thing is: as a drawback, at 2x overlap the sqrt version of Hann itself does not sum to 1, only its square does. With 4x overlap and plain Hann, everything is fine in both cases. (I'm not sure if it really makes a difference, but it could if changes are applied to the signal.)

    https://www.desmos.com/calculator/vxnsaaizp7
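
    Written out as a short derivation, with t measured in window lengths and the symbols w (sqrt of Hann) and h (Hann) introduced here only for illustration:

```latex
% sqrt(Hann) is half a sine period
w(t) = \sqrt{\tfrac12\,(1 - \cos 2\pi t)} = \sin \pi t, \qquad 0 \le t \le 1

% 2x overlap: two copies half a window apart do not sum to a constant
w(t) + w(t + \tfrac12) = \sin \pi t + \cos \pi t
                       = \sqrt{2}\,\sin\!\left(\pi t + \tfrac{\pi}{4}\right),
                       \qquad 0 \le t \le \tfrac12
% this swings between 1 (at t = 0) and \sqrt{2} (at t = 1/4)

% 4x overlap with plain Hann h(t) = \sin^2 \pi t: both sums are constant
\sum_{m=0}^{3} h\!\left(t + \tfrac{m}{4}\right) = 2, \qquad
\sum_{m=0}^{3} h^2\!\left(t + \tfrac{m}{4}\right) = \tfrac32, \qquad 0 \le t \le \tfrac14
```

    So in the 4x case the sums are flat but not 1, which only means a fixed gain factor.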


    And another thing on the sqrt is, the side bands are completely different

  • ANDREW221231
    ANDREW221231 Member Posts: 299 Advisor

    haha great stuff guys. you know, it's interesting, for analysis/partial tracking stuff i usually use a gaussian window but some stuff requires flattening it out until it's basically a square window. it seems to have a lowpassing effect, as in the partials will remain stable longer... sort of almost the opposite of what i'd expect

  • Tr97
    Tr97 Member Posts: 16 Member

    good choice, gaussian is near perfect for analysis and could be for resynth, summing up to 1 could be a bit tricky

  • ANDREW221231
    ANDREW221231 Member Posts: 299 Advisor

    what exactly is meant by summing up to 1? i could wager a guess but i feel it would probably be wrong... (maybe unity gain between the windowed section and the raw chunk of audio... ?🤔)


    also, what would be the disadvantage of not having this property?

  • colB
    colB Member Posts: 823 Guru

    when all the various channels of repeated windows are summed, you get a nice level straight line at y=1

    That's why, if you are using two channels for overlap-add, and two windows, one at the input and one at the output, you need the square root of Hann, so that when the two windows are multiplied, you get Hann, which then sums to 1. Otherwise, it sums to an offset sinusoid, and you get a warbling artefact.
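
    The same statement as a worked equation, for two channels half a window apart (t in window lengths, h standing for the Hann window; the symbol is only for illustration):

```latex
h(t) = \tfrac12\,(1 - \cos 2\pi t) = \sin^2 \pi t

% sqrt(Hann) at input and output: the product is Hann, and Hann sums to 1
\sqrt{h(t)} \cdot \sqrt{h(t)} = h(t), \qquad
h(t) + h(t + \tfrac12) = \sin^2 \pi t + \cos^2 \pi t = 1

% Hann at input and output: the product is Hann squared, an offset sinusoid
h^2(t) + h^2(t + \tfrac12) = \sin^4 \pi t + \cos^4 \pi t
                           = \tfrac34 + \tfrac14 \cos 4\pi t
```

    The second sum dips from 1 down to 0.5 once per hop, which is the warble.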

  • Tr97
    Tr97 Member Posts: 16 Member
    edited February 2022

    this is an attempt to visualize this: it's overlap 4 and a narrow and a wide Gaussian window, the blue line is the sum, this is what the overall signal gets multiplied with

    with overlap 2 the ripple gets stronger

    the sound is somewhere between tremolo and ringmodulation. i'll see if i can make a demo to simulate this

    but i have to say that the ripple is really low on overlap4/wide, so Gauss could work even on a fft->ifft combo
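
    A rough way to put numbers on that ripple, as a Python sketch. The window definition, the two sigma values (0.10 and 0.20 of the block length for "narrow" and "wide") and the block size are assumptions, not the settings used in the plot above.

```python
import math

def gaussian(size, sigma):
    """Gaussian window centred on the block; sigma is a fraction of the block length."""
    return [math.exp(-0.5 * ((n / size - 0.5) / sigma) ** 2) for n in range(size)]

def overlap_add_ripple(window, overlap):
    """Peak-to-peak ripple of the summed, hop-shifted windows relative to the mean."""
    hop = len(window) // overlap
    gain = [sum(window[n + m * hop] for m in range(overlap)) for n in range(hop)]
    mean = sum(gain) / hop
    return (max(gain) - min(gain)) / mean

size = 1024  # assumed block size
for label, sigma in (("narrow", 0.10), ("wide", 0.20)):
    for overlap in (2, 4):
        value = overlap_add_ripple(gaussian(size, sigma), overlap)
        print(f"{label:6s} overlap {overlap}: ripple {value:.4f}")
```

    This sums the window applied once. For an fft->ifft combo with the Gaussian at both ends, the product is again a Gaussian with sigma divided by sqrt(2), so the same function can be reused with the narrower sigma.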
