Looking for an explanation of why multicore is problematic for realtime music apps

MvKeinen
MvKeinen Member Posts: 41 Member
edited October 2024 in Reaktor

I remember an excellent post by some NI official about why multicore is problematic for realtime music apps, probably in the old forum. Does someone remember it? Or are there other resources on the net where it's explained?

I know DAWs can put different tracks on different cores. But if you have a big Reaktor ensemble, it can't be calculated with more than one core, AFAIK.

Is it the realtime aspect of the whole thing, where the synchronisation of different calculations on different cores would use too many resources?

Thanks

Comments

  • Isotoxin
    Isotoxin Member Posts: 213 Advisor

    The problem with multicore is more a general programming issue, not something specific to music production.


  • Big Gnome
    Big Gnome Member Posts: 38 Helper

    I do remember some of those discussions from a few years ago; they're lost to the old forum now, but I'll try to dig something up. For a start, there's a brief exchange where this is touched upon here: https://web.archive.org/web/20220826042518/https://www.native-instruments.com/forum/threads/considered-harmful.262873/#post-1429921

  • colB
    colB Member Posts: 1,019 Guru
    edited September 2023


    Is it the realtime aspect of the whole thing, where the synchronisation of different calculations on different cores would use too many resources?

    In a general, non-Reaktor context, I think there are multiple reasons. (Please note that I'm not an expert, and the limited understanding I do have is very out of date :))

    E.g., for starters, to use multiple cores you need to have multiple things that can be processed at the same time. For that to make sense, they cannot be dependent on each other, and ideally they won't be using or manipulating the same resource(s) as each other.

    In a synth, you might have a chain of processes: an oscillator, an envelope, a filter, a reverb effect. It would be great to take each one of those elements and stick them on different cores... but wait, the filter needs the output of the oscillator, so it would have to wait for that before it could do any processing... and the reverb needs the output of the filter... oops.
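
    To make the dependency concrete, here's a minimal C++ sketch (all function names and coefficients are hypothetical stand-ins, and a real engine would work on buffers rather than single samples):

    ```cpp
    #include <cmath>
    #include <cstdio>

    // Hypothetical one-sample stages, just to make the chain visible.
    float oscillator(float phase) { return std::sin(phase); }
    float filter(float x) { static float y = 0.0f; y += 0.2f * (x - y); return y; }
    float reverb(float x) { return 0.7f * x; } // placeholder "effect"

    int main() {
        float phase = 0.0f;
        for (int n = 0; n < 8; ++n) {
            // Each stage needs the previous stage's output for the SAME
            // sample, so the chain is inherently serial: a second core
            // running the filter would just sit waiting for the oscillator.
            float s = reverb(filter(oscillator(phase)));
            std::printf("%f\n", s);
            phase += 0.1f;
        }
    }
    ```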

    There are compromises that can be made, but they have to be made on a per-project basis. There is no one-size-fits-all here, so it's something of an art that cannot be automated successfully without severely limiting what can be done.

    E.g., you could incorporate delays into the structure, so that the filter is processing an older oscillator output and doesn't need to wait for the current one... that can work just fine... unless (hehe) you want some system where there are multiple chains with varying numbers of elements that are later combined; then there could be phasing issues and whatnot... and there are problems too with feedback, where you definitely want to minimise the delay, ideally to a unit delay (a single sample rate clock tick).

    (It's not as simple as this in real life though, because the processing isn't done on a per-sample basis; it is applied to buffer-sized chunks.)
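
    A sketch of that delay compromise, assuming a hypothetical two-stage chain processed in blocks: the filter always reads the oscillator's previous block, so the two stages could in principle run on different cores in the same time slot, at the price of one block of extra latency per pipeline stage:

    ```cpp
    #include <array>
    #include <cmath>

    constexpr int kBuf = 64;
    using Buffer = std::array<float, kBuf>;

    Buffer osc_out{};   // written by the oscillator this block
    Buffer osc_prev{};  // what the filter reads this block (one block old)

    void oscillator_block(float& phase) {
        for (float& s : osc_out) { s = std::sin(phase); phase += 0.05f; }
    }

    void filter_block(Buffer& out) {
        static float y = 0.0f;
        for (int i = 0; i < kBuf; ++i) { y += 0.2f * (osc_prev[i] - y); out[i] = y; }
    }

    int main() {
        float phase = 0.0f;
        Buffer filtered{};
        for (int block = 0; block < 4; ++block) {
            // These two calls touch no shared data, so in a real engine
            // they could run concurrently on two cores:
            filter_block(filtered);   // works on LAST block's oscillator output
            oscillator_block(phase);  // renders the current block
            osc_prev = osc_out;       // hand the fresh block to the filter stage
        }
    }
    ```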

    And then, as you suggested, there are synchronisation overheads. When the oscillator and filter and reverb have done their processing, there needs to be some mechanism by which they can share their inputs and outputs... They can't just bang on completely independently of each other, and they are not likely to take the same number of cpu cycles... CPU time on the cores is managed by the operating system, and context switching (between your oscillator, your sound card driver, the web browser running in the background, the various Windows services, etc.) is costly. So it's really important not to have trivially small processes running in this way, where the process itself uses less cpu than the cost of context switching...
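
    For illustration, a bare-bones sketch of one such handoff mechanism (a mutex plus condition variable; all names hypothetical). The lock/notify/wake round trip itself costs real time, which is exactly why it is only worth paying for work that is substantially bigger than the overhead:

    ```cpp
    #include <condition_variable>
    #include <mutex>
    #include <thread>

    std::mutex m;
    std::condition_variable cv;
    bool block_ready = false;

    void worker() {
        // ... render one block of audio here (hypothetical DSP) ...
        {
            std::lock_guard<std::mutex> lock(m);
            block_ready = true;
        }
        cv.notify_one(); // waking the consumer costs microseconds, not cycles
    }

    int main() {
        std::thread t(worker);
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return block_ready; }); // mixer blocks until handoff
        t.join();
    }
    ```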

    I dunno, it's just freakishly complex. I guess modern programming techniques and operating systems have a pretty good handle on things to make them as efficient as possible, and some audio apps definitely use these features, but only where it makes sense, like a DAW with many tracks that are basically independent of each other and whose remaining dependencies (inputs and outputs) are completely predictable.

    Some processes naturally lend themselves to concurrency: stuff like additive synthesis, maybe certain types of reverb algorithm, maybe heavy FFT-based processing... but for some of these, maybe parallelism via SIMD is more appropriate?
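
    As a sketch of why additive synthesis parallelises so naturally: each partial below writes only its own output slot, so the per-partial loop can be handed to a parallel algorithm (or vectorised via SIMD) with no cross-talk, and the only synchronisation point is the final sum. Hypothetical values throughout; C++17's parallel std::transform is used purely as an example mechanism:

    ```cpp
    #include <algorithm>
    #include <cmath>
    #include <execution>
    #include <numeric>
    #include <vector>

    int main() {
        const float f0 = 110.0f, t = 0.001f; // fundamental and a time point
        std::vector<int> partials(64);
        std::iota(partials.begin(), partials.end(), 1); // harmonics 1..64

        std::vector<float> out(64);
        std::transform(std::execution::par, partials.begin(), partials.end(),
                       out.begin(), [=](int k) {
                           // each partial touches only its own output slot
                           return std::sin(6.2831853f * f0 * k * t) / k;
                       });

        // the only synchronisation point: summing the partials
        float sample = std::reduce(out.begin(), out.end(), 0.0f);
        (void)sample;
    }
    ```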

    That's why multicore in Reaktor is so difficult. If and how multicore should be used is a per-project question, and requires significant engineering knowledge and experience. But Reaktor is a development environment for non-programmers. How do NI incorporate multicore mechanisms into Primary or Core that can be used successfully by non-programmers and are still effective in dramatically improving cpu efficiency? I don't think that is possible.

    Alternatively, it could be 'automatic', with no knowledge required, but then it could only really be a poly-voice-based thing, in which case it will be mostly useless for Blocks; or a per-instrument-chain type of thing, in which case it forces certain types of use case and is something of a waste, as that's already possible using a DAW and multiple Reaktor plugin instances.

    ...we also have to remember that this discussion exists in a context where we are still waiting on Core iteration and some form of code reuse mechanism. Both of these are achievable, and each would likely provide at least as big an efficiency boost as multicore support, or better.

  • MvKeinen
    MvKeinen Member Posts: 41 Member

    Thanks a lot! @Big Gnome and @colB

    Very helpful! I'll have to read this twice, I guess :)

  • Kubrak
    Kubrak Member Posts: 3,106 Expert

    There is yet another aspect.....

    Even if an application uses just one thread, in most cases the OS decides which core will be used, unless the application developer takes care of it. And not only at start: the OS may dynamically move a given thread from one core to another (to let the used one cool down... or so...).

    And some cores are able to do more work, others a bit less. And reallocation takes some time, introducing latency...

    And with the introduction of E-cores by Intel and Apple, the situation becomes wilder yet. One does not play for a moment, so why not move the task to an E-core, if it fits there? And later on: whoa, that suddenly needs a hell of a lot of CPU power... move it to a P-core! But that takes a considerable amount of time, at least on Intel CPUs (12th and 13th gen)...

  • MvKeinen
    MvKeinen Member Posts: 41 Member

    " In most cases, unless application developer cares for, OS decides which core will be used. "

    Interesting. Does that mean that application developers can "choose" whether the application runs on a P-core or an E-core? I assumed the OS does that automatically, but I don't know.

    @colB

    " ...we also have to remember that this discussion exists in a context where we are still waiting on core iteration and some form of code reuse mechanism. Both of these are achievable and would each likely provide at least as much or better efficiency boost than multicore support "

    In my experience, iteration can make a code reuse mechanism less important in some cases: you put the macro inside a loop and provide it with either algorithmically generated or memory-stored parameters. But I mostly do event-driven things so far. Iteration in Core would be huge; a library of all sorts of sorting, grouping and processing algorithms would be easy then.

  • Jean louis P
    Jean louis P Member Posts: 8 Member

    I have tried programming a synth in a programming language, something simple, for testing.

    The sound engine offered the possibility of one thread per voice, and I thought that would be cool... except that it was a mess for the sound, because the voices were not synchronized, and so I concluded that I had to use only one thread for the sound output.

    But I think you can work like this (well, not with Reaktor): one thread for each voice, because each voice is independent; and when you need common work on the voices, like effects, you synchronize the voices with one common thread and send the data from the voices to the sound output of the synth.
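
    A rough sketch of that scheme in C++ (hypothetical names; note that a real engine would reuse a persistent thread pool, because spawning threads per audio block is far too slow):

    ```cpp
    #include <array>
    #include <functional>
    #include <thread>
    #include <vector>

    constexpr int kVoices = 4, kBuf = 64;

    void render_voice(int v, std::array<float, kBuf>& out) {
        for (int i = 0; i < kBuf; ++i) out[i] = 0.1f * v; // placeholder DSP
    }

    int main() {
        std::array<std::array<float, kBuf>, kVoices> voiceBuf{};
        std::vector<std::thread> pool;
        for (int v = 0; v < kVoices; ++v)
            pool.emplace_back(render_voice, v, std::ref(voiceBuf[v]));
        for (auto& t : pool) t.join(); // synchronisation point: all voices done

        std::array<float, kBuf> mix{};
        for (auto& buf : voiceBuf)     // one common thread mixes and outputs
            for (int i = 0; i < kBuf; ++i) mix[i] += buf[i];
    }
    ```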


    There are some audio APIs that allow parallel programming, at least with C++, including use of the GPU, but you must know what you are doing to avoid the traps of parallel programming.

    A good programmer could build a nice additive synth with parallel processing for each oscillator and each voice, for example, with these kinds of tools.

    It is possible, with the right tools, but not for everyone.

    Oh, and one problem with multithreaded parallel processing is that it is harder to debug, because of possible interactions between the threads, synchronization problems, etc...

    Code that works fine single-threaded can become a mess multithreaded.

  • Kubrak
    Kubrak Member Posts: 3,106 Expert

    @MvKeinen

    Interesting. Does that mean that application developers can "choose" whether the application runs on a P-core or an E-core? I assumed the OS does that automatically, but I don't know.

    I am not sure about macOS, but on Windows one can set core affinity (it works per process and also per thread). And by doing so one can direct threads to a certain set of cores, or even to a single core.

    On Windows one can do it using Process Lasso if the application itself does not care. I do not have personal experience with it, as I do not need it, but people recommend it.
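
    For illustration, a minimal Win32 sketch of pinning the current thread to a single core with SetThreadAffinityMask; this is essentially what tools like Process Lasso do from the outside when the application itself doesn't bother:

    ```cpp
    #include <windows.h>

    int main() {
        // Bit 0 set = allow only logical core 0 for this thread.
        DWORD_PTR oldMask = SetThreadAffinityMask(GetCurrentThread(), 1);
        if (oldMask == 0) {
            // call failed; the thread keeps its previous affinity
            return 1;
        }
        // ... realtime work now stays on core 0, no migration ...
        return 0;
    }
    ```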

  • colB
    colB Member Posts: 1,019 Guru

    In my experience, iteration can make a code reuse mechanism less important in some cases: you put the macro inside a loop and provide it with either algorithmically generated or memory-stored parameters. But I mostly do event-driven things so far. Iteration in Core would be huge; a library of all sorts of sorting, grouping and processing algorithms would be easy then.

    Iteration is a special case of code reuse, but there are many other examples.

    E.g., let's say a hypothetical Blocks ensemble includes 30 Blocks. Each of them has multiple control knobs. Each of those control knobs has code for parameter smoothing in its core cell, so that smoothing code might be repeated 100 times or more, and that's just the smoothing. Imagine the potential gains from reducing the code size by a factor of 10, let alone 100.
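
    For a rough idea of what gets duplicated: a parameter smoother is typically just a one-pole lowpass toward the target value, a couple of instructions per sample. A minimal sketch (coefficient hypothetical); as a shared function it exists once in the compiled code however many knobs use it, whereas inlining it per knob repeats those instructions per instance:

    ```cpp
    #include <cstdio>

    // Roughly what a parameter smoother computes: a one-pole lowpass
    // that eases the current value toward the target.
    struct Smoother {
        float y = 0.0f;   // current smoothed value
        float a = 0.01f;  // smoothing coefficient (hypothetical)
        float step(float target) {
            y += a * (target - y);
            return y;
        }
    };

    int main() {
        Smoother cutoff, volume; // two knobs, one shared step() function
        for (int n = 0; n < 5; ++n)
            std::printf("%f %f\n", cutoff.step(1.0f), volume.step(0.5f));
    }
    ```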

    Smoothers are just one example; there are many other macros that get used many times over in a large ensemble. Stuff like cos, sin, process for ILO anti-aliasing, BLEP macros, filters (I use lots of filters, particularly 1-pole)... 'ramp'/accumulator-style code is used all over the place, mixers, selectors, wrappers; there are loads of different elements that get used a LOT. Some of these might still be more efficient inline, but many, like filters and smoothers, would definitely benefit massively from being reused functions.

    ...and how much more efficient would it be if it could all fit nicely into cpu cache without having to be swapped out as much? Remember that as you add more and more instances of the same Block to an ensemble, the cpu usage increases faster than linearly (dramatically so). That could be mitigated significantly.

  • MvKeinen
    MvKeinen Member Posts: 41 Member

    Yes, that's true of course. I guess I didn't understand properly. I thought code reuse would only be a convenience for the "coding", letting you edit multiple macros in one place via a master macro. I wasn't aware that it might also take effect in the compiled code.

  • colB
    colB Member Posts: 1,019 Guru
    edited September 2023

    It might provide a bigger potential cpu efficiency boost than any other possible update. It wouldn't immediately 'just work'; it would have no effect on old code other than (maybe) multiple instances of instruments, assuming that was even part of some implementation... who knows. I doubt we'll ever find out.

    [speculation] I suspect that it's not just the caching issue, but that some modern CPUs use fancy trickery based on hashing to streamline caching, and multiple copies of identical code can trip that up. That would make a lot of sense: software that includes a lot of repeated identical code just doesn't normally exist (it would be silly), so hashing can simplify and speed up caching; but if your software is weird and has lots of arbitrary sections that are identical to other sections, that could trip up the hashing algorithm and cause unnecessary cache misses. [/speculation]

    EDIT: I'm assuming that the existing core compiler doesn't include some system that automagically finds identical code sections in an ensemble and generates a single function that is referenced by all instances of that identical code... I don't *think* it can, because individual core cells seem to be compiled separately. ...given the time restrictions on the compiler, it would be some achievement!

  • PoorFellow
    PoorFellow Moderator Posts: 5,599 mod

    Thank you for the explanation. That made it dawn on me what kind of problem it is that e.g. Steinberg has with Cubase & Nuendo!

    Ref. : https://www.steinberg.net/system-requirements/

    Quote : Please Note : Processors with hybrid-architecture design, such as 12th Gen Intel® Core™ or newer, are currently not supported on Windows operating systems. Running Cubase 12 on systems with hybrid-architecture CPUs can lead to audio dropouts and reduced performance.

  • Milkman
    Milkman Member Posts: 291 Advisor
    edited September 2023

    The actual technical challenges aside, this has been going on since the original Core 2 Duo released in 2006, opening up parallel processing on the desktop. Steinberg is the biggest, most public example and cautionary tale of how NOT to support multithreading in multimedia applications, and has been fighting against multithreaded DSP ever since. For nearly two decades now, Steinberg has clung to some sort of outdated codebase that leads to less flexibility and more dependencies on OTHER environmental variables. And their primary solution for this problem? Blame it on users. Blame it on other parts of the ecosystem. Refuse to publish SUPPORTED HARDWARE LISTS, year after year, despite 1000s of requests to do so. I doubt there is another vendor who has done as poorly.

    Is Steinberg the only one of the major DAW/software makers that has had issues with this? No, of course not, but the difference is that every other brand has made great efforts to solve or work around the issue, NI included.

    Personally, *all it took me* was 3 entirely new PCs over the course of 15 years. That's all.

    I found a single magic combination of CPU, chipset, and chipset features (power management, SpeedStep, etc.) that has ZERO trouble with multithreading inside Cubase 12, and I am a 30+ year net/sysadmin and custom builder. On that note, after all this time and all the incredible frustration with Steinberg, I found a company that does multithreaded audio well: Bitwig. My 20+ years with Cubase have ended, or at least taken a back seat.

    I am utterly relieved to say... this is now all someone else's issue.

    On Bitwig 5, the issues I saw on Cubase with multithreaded DSP are not present. I have total control over how my plugins are processed (5 different multithreaded sandbox modes), and misbehaving 3rd-party plugins bother me about as much as a cloud in the sky. On Cubase, however, the sky falls every time a 3rd-party plugin bombs, and pops/clicks in the audio streams become as normal as if your DAW were a record player, lol.

    These challenges, themselves, have evolved over time, but some brands have not.

  • MvKeinen
    MvKeinen Member Posts: 41 Member

    I heard...

    [hearsay]... that this was exactly the main reason the Bitwig team split off from Ableton. Some of them found the code base too old and wanted to modernise it from the ground up in order to handle multicore better; others thought that was too much work and that it was better to just keep blowing up the existing code with more and more additions. [/hearsay]

    NI should team up with them in order to integrate Reaktor into the Grid.

  • Kubrak
    Kubrak Member Posts: 3,106 Expert

    It is always a hard decision whether to carry on or rewrite the code from scratch... especially if we are speaking about millions of lines of code. It is possible, but it will cost a fortune, and the code will be buggy for years...

This discussion has been closed.