Wikus van der Walt
DAWs and Machine Learning
“Let machines do the dumb work, so the humans can be free to pursue higher goals.” A rough Karlheinz Stockhausen quotation.
Let’s face it, machine learning will change the face of modern society in the years to come. Yet very few music technology companies have built deep learning or machine learning into their products. The one product that has stepped into the limelight is iZotope’s Neutron 2, with its Track Assistant feature, whereby it creates a smart preset depending on the given material. Another worthy mention is the online mastering service provided by LANDR. Although these are essentially first-generation products, how will this pan out for DAW developers, who face increasing pressure from the community at large to integrate machine learning?
The leaders in the field, IBM, Google, Microsoft and Amazon, are massive multinational corporations, and even some of their current offerings are still rather basic in functionality when compared to Watson.
How would small development companies that work in a niche software market be able to create useful tools that can assist during the music production process?
One main problem at the time of writing is that each developer spends years on its own creating yet another reverb, or this or that. Each focuses internally, and sharing ideas and writing code together is unlikely, as these companies are fierce competitors. Instead, users are at the mercy of what other platforms bring out, and can only hope that their chosen brand will soon add the same feature. The name of the game is copycat.
Would it not make more sense to form a coalition of these developers, focused on creating machine learning tools that are of actual use and save the user time and effort? One such example is Batch Commander by Slate Media Technology, with which mundane tasks such as renaming tracks can be applied to many tracks at a time. Although it does not include machine learning, it does point the way towards faster workflows. The question is what types of task would be useful to a user. Many an engineer may object to the recommendations below, as they may feel that mixing is a creative art. This is indeed true, but it is also a technical craft, and if many of the non-creative tasks can be taken out of the hands of the mixer, how much better work could the engineer bring to the table? As many engineers will also confess, budgets have shrunk, yet the expected quality has risen. A simple calculation problem is at hand: how can fewer mixing hours translate into better quality work?
What if a DAW could analyse an entire project and name each track according to the instrument on it? For example, when tracking, a short recording of, say, 20 seconds is made and the DAW tries to determine what the instruments are. This leaves the engineer only to create the required number of audio tracks beforehand, and then to quickly scan the result to see whether the analysis is accurate enough. Along with this, a preference menu can let the user specify how the tracks should be named. The author, for example, is very picky about naming practices and never resorts to acronyms or shortened names. Instrument recognition is just the first phase of what an analysis can provide. Grouping tasks, such as routing all drums to a single bus, along with predefined colour coding and user-selected pictures, if the user so desires, can all be set up automatically.
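To make the naming and grouping step a little more concrete, here is a minimal Python sketch. It assumes the instrument recognition has already happened and has produced one plain-text label per track; the recognition itself is not shown. The label set `DRUM_LABELS`, the function names and the `name_style` preference are all made up for illustration and do not come from any existing DAW.

```python
from collections import Counter, defaultdict

# Hypothetical set of labels that an analysis stage might assign to drum tracks.
DRUM_LABELS = {"kick", "snare", "hi-hat", "tom", "overhead"}


def name_tracks(labels, name_style=str.title):
    """Turn detected labels into track names, numbering any repeated instrument.

    name_style stands in for the user's naming preference
    (for example full words in title case, never acronyms).
    """
    totals = Counter(labels)
    seen = Counter()
    names = []
    for label in labels:
        seen[label] += 1
        base = name_style(label)
        names.append(f"{base} {seen[label]}" if totals[label] > 1 else base)
    return names


def group_into_buses(labels):
    """Map a bus name to the indices of the tracks that should feed it."""
    buses = defaultdict(list)
    for index, label in enumerate(labels):
        bus = "Drums" if label in DRUM_LABELS else "Other"
        buses[bus].append(index)
    return dict(buses)


detected = ["kick", "snare", "tom", "tom", "lead vocal"]
print(name_tracks(detected))
print(group_into_buses(detected))
```

Colour coding and pictures could hang off the same bus mapping, and the engineer would still scan the result and correct any wrong guesses.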
Imagine a list of all the mixing techniques that have been used by engineers the world over, available within a DAW environment. For example, the user tells the program that it is now in mixing mode. An analysis is made of all the material and, once it is complete, a dialogue box appears where the user selects a genre, as automatic genre recognition will probably still take quite some time to achieve. Typical mixing techniques then appear in a drop-down menu. Let’s examine such an example with lead vocals. After the analysis of the material and the genre selection, the user may be given 10 common techniques. Let’s take 2 of them: the Haas effect and parallel saturation. The user simply clicks on Haas and all the required tasks are carried out: creating a duplicate track, panning one track hard left and the other hard right, delaying one track by 20 to 30 milliseconds, and so forth.
This example is not difficult to achieve, yet it does take time.
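The Haas steps described above can be sketched in a few lines of Python. This is only an illustration on a plain list of samples, not how any DAW does it internally; the function name `haas_widen` and its parameters are made up for this post.

```python
def haas_widen(mono, sample_rate, delay_ms=25.0):
    """Duplicate a mono track, pan the copies hard left and hard right,
    and delay the right-hand copy by delay_ms milliseconds.

    Returns (left, right) as two equal-length lists of samples.
    """
    if not 20.0 <= delay_ms <= 30.0:
        raise ValueError("the text suggests a delay between 20 and 30 ms")
    delay_samples = round(sample_rate * delay_ms / 1000.0)
    silence = [0.0] * delay_samples
    left = list(mono) + silence    # dry copy, padded at the end
    right = silence + list(mono)   # delayed copy, padded at the start
    return left, right


# A tiny made-up "vocal" at a 1 kHz sample rate, just to show the shapes.
left, right = haas_widen([1.0, 0.5, 0.25], sample_rate=1000, delay_ms=25.0)
print(len(left), len(right))
```

The point is not the code itself, which is trivial, but that one click could replace the duplicate, pan and delay steps that are otherwise done by hand.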
Where this becomes more important is when a technique does not work. Normally everything then needs to be undone and the engineer has to search for a new solution. With this setup, the floating window is still available and the user can simply select parallel distortion instead. If that were all there was to it, it would almost certainly not work; instead, a category-based system can be used, where the user selects, for example, which plugin to use and how much distortion to apply: light, medium or heavy. Additional parameters can include how bright the sound should be. It can be more complicated than this. Rather than having a straight-up parallel distortion channel, the user can specify that it should be delayed, with short feedback and wide stereo width. In this case, a high-pass filter with a corner frequency of 500 Hz, a distortion plugin with x saturation and a tempo-synced stereo delay will be created on a stereo channel, routed correctly and ready for the engineer to audition.
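As a rough illustration of the light, medium and heavy categories, the sketch below blends a saturated copy of a signal back in with the dry signal. It covers only the parallel distortion branch; the 500 Hz high-pass filter and the tempo-synced delay are left out. The drive values in `DRIVE`, the tanh soft clipper and the function name are my own assumptions, not settings taken from any real plugin.

```python
import math

# Hypothetical drive amounts behind the three categories in the dialogue box.
DRIVE = {"light": 2.0, "medium": 5.0, "heavy": 12.0}


def parallel_saturate(dry, amount="medium", blend=0.3):
    """Mix a tanh-saturated copy of the signal in parallel with the dry signal.

    amount picks the drive category; blend is the level of the distorted branch.
    """
    drive = DRIVE[amount]
    wet = [math.tanh(drive * sample) for sample in dry]
    return [d + blend * w for d, w in zip(dry, wet)]


vocal = [0.0, 0.1, -0.1, 0.8]
print(parallel_saturate(vocal, amount="light"))
print(parallel_saturate(vocal, amount="heavy"))
```

Switching category is then a one-word change, which is exactly the kind of quick audition the dialogue box is meant to allow.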
As the two examples above show, these are typically processes that an engineer will go through anyway in the search for an improved lead vocal sound, yet all that work may only suit a verse. The process then needs to be repeated for the chorus. The engineer still has full control, and plugins and settings can be changed at any time by simply opening the plugin GUI and changing the desired parameter, as would normally happen. If, for example, the delay did not work, the user can be more specific in the dialogue box and select a delay that is ducked by the unprocessed track, or in fact by any other track in the project. Another possibility is to select one or more parameters and randomize them. The point of all of this is the speed at which an engineer can work, as the whole process described above might take less than a minute for both actions, including listening.
As has been shown in many other posts before, one can get a good idea of what a plugin excels at. For example, x compressor excels at taming drums but is typically not an ideal choice for a piano. Machine learning can be used to scan all of the user’s plugins and build a centralized database of what users use each plugin for, and how. The user can then rate how well the program executed the plugin setup, and through millions of interactions this will lead to improved results. Because the entire mix has already been analysed, the DAW will have a good idea of how to achieve a basic balance. Why not, then, have a function where the user can, again with a number of criteria, ask the program to create a basic mix? Let’s face it, a basic balance is not what separates the amateur from the pro. This will give experienced practitioners a fantastic time saving and will give beginners improved end results on projects that will most likely never be mixed by a high-calibre mixer anyway.
The other problem that many engineers face is that the plethora of choices may hamper the process, and may even lead to a suboptimal outcome. Even worse is the amount of good music that stays locked away on hard drives all over the world due to artists’ dissatisfaction with the sonic qualities of their projects. I am sure that everyone can confess that some of their premium plugins rarely get an outing, because it is easier to go with what you are familiar with. What if a system like this could shed new light on the plugins available to you, and so return some value on their high cost? I am also sure that even a number of seasoned engineers would confess that they are not sure exactly what all the functions within particular plugins do. This uncertainty certainly increases with the number of parameters available; take Waves’ H-Reverb and Flux’s Solera, for example.
How many people would really take the trouble to learn plugins like these? Also, when someone uses creative settings, they can label them as such, which will teach the algorithm that this is something more than a matter of balance.
I think the point about complex plugins is well made above: typically, a user who has them will underuse their processing capabilities more often than not. How much more true is this when it comes to automation? Typically, one would automate between 1 and 5 parameters, but who often automates 20 parameters on a single sound source? Where this can become very interesting is when two different settings are used in different parts of a song. A lead vocal will again be used as an example. Typically, a verse will not be as energetic as a chorus, from both a performance and an arrangement point of view. This restricts the engineer to following suit and making mix decisions that simply reinforce this basic design. A feature that has been part of Reaktor (6), by Native Instruments, for many years will illustrate the next idea. In Reaktor it is possible to morph gradually from one preset to another, though presets are called snapshots within the program. This may lead to very interesting results that are not possible with either preset alone.
There are several other plugins that sport a similar feature.
Listen to the following examples to hear 2 presets and then morph gradually from the one to the other:
Source A has been changed in this example, yet B has remained unaltered. Listen to how different the results are in 002:
What I suggest is that a user can select a range of audio on a track, a verse for example, and choose which process or plugin should be used. The user then highlights a second range, the chorus for example, and selects another preset or chain of plugins. This may of course lead to 2 completely different-sounding lead vocal parts, but the user can then use a morph slider to get a bit of each. This will lead to interesting results, as many parameters can be controlled via a single macro controller.
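A morph slider of this kind can be sketched as plain linear interpolation between two sets of parameter values. I do not know how Reaktor or any other product implements its morphing, so treat this purely as an assumption; the parameter names and values below are invented.

```python
def morph(preset_a, preset_b, position):
    """Blend every parameter that both presets share.

    position 0.0 returns preset A's values, 1.0 moves fully to preset B,
    and anything in between gives a bit of each.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 and 1.0")
    shared = sorted(preset_a.keys() & preset_b.keys())
    return {
        name: preset_a[name] + (preset_b[name] - preset_a[name]) * position
        for name in shared
    }


# Invented verse and chorus settings for a lead vocal chain.
verse = {"reverb_mix": 0.25, "delay_feedback": 0.0, "saturation": 0.0}
chorus = {"reverb_mix": 0.75, "delay_feedback": 0.5, "saturation": 1.0}
print(morph(verse, chorus, 0.5))
```

One slider moves every shared parameter at once, which is what makes automating 20 parameters on a single source practical. Real parameters are not always linear (frequency and gain often are not), so a serious version would need a scaling curve per parameter.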
Another use would be vocal level automation. The number of man-hours spent on this each year must be fairly high. Instead, a user can set a maximum and minimum range and choose how aggressively the algorithm should apply the automation. Some of you may say, well, Waves’ Vocal Rider and Bass Rider can already do this. This may be true, but these plugins are not context-aware. Here, all the audio has already been analysed and placed in a hierarchy of elements, with vocals being more important than a pad sound, for instance. Not only can this then be applied to a single vocal, but it can be applied to all the vocals in a project with far greater accuracy, as the algorithm stays aware of all the other elements within the mix.
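Here is a minimal sketch of what context-aware riding could look like, assuming the analysis already provides a level in dB for the vocal and for the rest of the mix in each block of time. The function name, the `margin_db` target and the default values are assumptions made for illustration; this is not how Vocal Rider or any other product works.

```python
def ride_gains(vocal_db, backing_db, margin_db=3.0,
               min_gain_db=-6.0, max_gain_db=6.0, aggressiveness=0.5):
    """Return one gain offset in dB per block for the vocal.

    The aim is to keep the vocal margin_db above the backing level.
    aggressiveness (0.0 to 1.0) scales how much of the correction is applied,
    and the result is clamped to the user's minimum and maximum range.
    """
    gains = []
    for vocal, backing in zip(vocal_db, backing_db):
        wanted = (backing + margin_db) - vocal   # full correction for this block
        scaled = wanted * aggressiveness         # gentler if the user asks for it
        gains.append(max(min_gain_db, min(max_gain_db, scaled)))
    return gains


vocal_levels = [-20.0, -10.0, -30.0]
backing_levels = [-18.0, -18.0, -18.0]
print(ride_gains(vocal_levels, backing_levels, aggressiveness=1.0))
```

The context awareness lies in `backing_db`: the gain follows what the rest of the mix is doing, not only the vocal’s own level.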
I want to take this concept a bit further. How much time is spent on volume automation to improve the perception of a performance? Why not have a style automator? For example, in classical music it is very common to arch a phrase dynamically, starting soft, building with a crescendo up to the maximum point and then gradually receding to a softer volume. In fact, this is so common that a lead vocal may sound lifeless and have very little emotional connection without it. Backing vocals, however, typically function rather differently, more like a type of voice pad, especially with oohs and aahs. These are of course extreme examples, with many styles and genres opting for an in-between approach with some volume fluctuation. Thus, an engineer would want the automation applied to the lead vocal to be more fluid and that applied to the backing vocals less so, aiming for a more static balance.
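The arch can be sketched as a simple gain curve over one phrase. The half-sine shape, the function name and the depth values are my own assumptions; a real style automator would presumably learn its shapes from actual performances.

```python
import math


def phrase_arch_db(num_points, depth_db):
    """Gain curve in dB over one phrase: soft start, peak in the middle, soft end.

    depth_db is how far below the peak the phrase starts and ends.
    A large depth suits an expressive lead vocal, a small one a static pad.
    """
    if num_points < 2:
        raise ValueError("need at least two automation points")
    curve = []
    for i in range(num_points):
        t = i / (num_points - 1)   # position in the phrase, 0.0 to 1.0
        curve.append(-depth_db * (1.0 - math.sin(math.pi * t)))
    return curve


lead_vocal = phrase_arch_db(9, depth_db=6.0)      # more fluid
backing_vocals = phrase_arch_db(9, depth_db=1.0)  # closer to a static balance
print([round(gain, 2) for gain in lead_vocal])
```

The same curve with a smaller depth gives the more static backing vocal treatment, so one style setting could drive both.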
Sample Matching and Phase Coherence
I am sure that many an engineer would sigh with relief if they did not have to inspect, and mostly change, the relative starting positions of instruments recorded with multiple microphones. How many times does one need to make sure that the kick in and kick out microphones are aligned to avoid a flappy sound? Would it not be fantastic if an algorithm could sample-match a task like this once it has completed its analysis? Another culprit is augmenting drums with virtual instruments, where many a time there will again be phasing and timing errors. The second part of the equation is phase coherence. For example, a string section is spread over a relatively large area, thus phasing will occur naturally. The reason for mentioning this particular example is that the introduction of close microphones to a Decca tree setup can lead to all sorts of phasing and time-smearing problems. This is before even including other instruments such as brass, woodwinds and percussion in the mix. How much time could be saved if all the close microphones could be phase-matched to the Decca tree without the engineer breaking a sweat?
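One common way to find the offset between two microphones is cross-correlation: slide one signal against the other and keep the shift where they line up best. The sketch below does this by brute force on short lists. It handles time alignment only and ignores polarity and frequency-dependent phase; the function names and the toy kick signals are invented for illustration.

```python
def best_lag(reference, other, max_lag):
    """Return the number of samples by which `other` trails `reference`.

    Every lag from -max_lag to +max_lag is tried and the one with the
    highest correlation score wins.
    """
    def score(lag):
        total = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                total += ref_sample * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


def align(other, lag):
    """Shift `other` earlier by `lag` samples (later if negative), keeping its length."""
    length = len(other)
    if lag >= 0:
        return other[lag:] + [0.0] * min(lag, length)
    return ([0.0] * (-lag) + other)[:length]


# Toy example: the outside kick microphone hears the hit 3 samples late.
kick_in = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
kick_out = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0]
lag = best_lag(kick_in, kick_out, max_lag=5)
print(lag, align(kick_out, lag))
```

On real audio this would be done with an FFT-based correlation for speed, and matching close microphones to a Decca tree would need a lag per microphone, but the principle stays the same.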
Another concept that seems to become more feasible by the day is processing audio via a remote server, with the aid of some local processing. Its first application would certainly be mixing and mastering, as latency is not as jarring there as when monitoring an audio feed or when programming via MIDI. Even in this realm it may become more feasible, as technologies such as Nvidia’s GeForce Now make it possible to stream games with some native processing. If this type of application can be fast enough for gaming, then it should be adequate for music recording, with processing of both effects and virtual instruments happening in real time. The main problem for now is the numbers. It would certainly be possible to send a few channels back and forth, but when multi-terabyte libraries come into play, the required computational power rises sharply. Given Moore’s law, this may be possible in the not too distant future.
One product of note that does not process audio on the user’s machine is ADX Trax Pro by Audionamix. It removes vocals from any stereo track with reasonably high success. Although this is not the only software that does this, it is the only commercially available one, to the best of the author’s knowledge. Do keep in mind that it is a non-real-time process. This specific task is not something that all engineers require; in fact, it is a rather niche requirement. However, it does point towards an interesting future for audio in the cloud.
How could machine learning influence processing? Generally speaking, most users are rather sloppy when it comes to managing processing resources. The word commit is regularly stressed by top engineers in their interviews. Yet, because consumer computers have become much faster, many users simply leave plugins running in real time. Machine learning can tackle this problem by automatically freezing or temporarily bouncing in place any plugin, or chain of plugins, whose parameters have not been altered for a while. If done correctly, in the background, the user should not even notice and should be able to continue altering any parameter without a glitch. The algorithm deployed for this task can work by priority; in other words, the plugins that use the most resources are frozen or temp-bounced in place first, thus achieving the greatest optimization. Users can also be given the option to state how much time must pass before the algorithm starts optimizing. It should probably also have a lunch mode, for when a user goes to lunch but does not want all of the tracks to be rendered, only to have to wait for everything to be converted back into real-time mode. This type of technology means that artists with even modest setups can achieve high track counts along with high plugin counts. One DAW of note that does pre-processing very quickly and efficiently is Digital Performer by MOTU, with DP9 running more than 150 instances of Kontakt 5 on a single Mac Pro, according to the official release document.
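The priority idea can be sketched without any machine learning at all: pick the plugins that have sat untouched for longer than the user’s threshold and freeze the hungriest ones first. Everything here (the `PluginState` record, the field names, the five-minute default) is hypothetical and only meant to show the ordering logic.

```python
from dataclasses import dataclass


@dataclass
class PluginState:
    name: str
    cpu_percent: float    # share of processing the plugin currently uses
    idle_seconds: float   # time since one of its parameters was last changed


def freeze_candidates(plugins, idle_threshold_seconds=300.0, lunch_mode=False):
    """Return the plugins to freeze, most expensive first.

    Only plugins idle for at least idle_threshold_seconds qualify.
    In lunch mode nothing is frozen, so the user does not come back
    to a fully rendered session.
    """
    if lunch_mode:
        return []
    idle = [p for p in plugins if p.idle_seconds >= idle_threshold_seconds]
    return sorted(idle, key=lambda p: p.cpu_percent, reverse=True)


session = [
    PluginState("Vocal reverb", cpu_percent=12.0, idle_seconds=900.0),
    PluginState("Drum bus compressor", cpu_percent=3.0, idle_seconds=1200.0),
    PluginState("Lead synth", cpu_percent=20.0, idle_seconds=30.0),
]
for plugin in freeze_candidates(session):
    print(plugin.name)
```

Where learning could come in is in predicting which plugins the user is likely to touch again soon, so that the threshold adapts per plugin and per user.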
This post only scratches the surface of what is possible. Ideas and issues with regard to composition, sound selection and arrangement have not even been touched upon. There are a few tools that can assist musical creators, the first mainstream option probably being Band-in-a-Box by PG Music.
Another interesting platform is Synfire by Cognitone, which is certainly leaning in the right direction.
Another interesting alternative is Liquid Notes by Recompose, which integrates with Ableton’s Live 9 and higher.
Also check out Magenta, a project by Google that uses the TensorFlow platform. It will certainly be interesting to see how the bigger companies take this up over the coming years. Steinberg’s Cubase already offers its users a Chord Assistant, which can make suggestions based on a few chords within the project.
Just as people now might say, “You had to edit drums on tape like that, by measuring, splicing and joining?”, so people will say in 5 or 10 years’ time, “You had to phase-align your kick by hand, create parallel processing chains and sit for hours clicking around before coming up with anything? And you kept all your mixing secrets to yourself, making sure that no one would get your sound?” There is already a plethora of educational services available, Mix with the Masters, Pure Mix, Mix Academy and plenty more, featuring engineers and producers ranging from seasoned pros all the way to living legends, showing the willing payer how they crafted their mixes. In 5 to 10 years, even local engineers will know all the tricks in the book. Yet who gets the gig? The one who can do the job better. This advance will surely always have its naysayers. But mark my words, you will still fork out a lot of money in the future for workflow improvements created by machine learning!
Please feel free to continue this discussion below, and let’s see what types of ideas readers can come up with, and where DAW technology can improve.
Note: this was authored in January 2017.