C3 Forums

Spleeter - software to generate "stems" from final mix



This came across my timeline today, and I immediately thought it would be useful for this community.

 

https://github.com/deezer/spleeter

 

This is a tool that claims to be able to split a final mix into "stems", similar to the technique behind some of Harmonix's post-RB4 DLC output, by isolating the frequencies of each part. The difference is that it attempts to do so automatically using machine learning, with pretrained models already provided.

 

This may be useful in producing better-sounding customs. Hopefully it will help some of you still out there authoring!


Hmmm... Linux-based?! Hunting down the bugs is going to be a real challenge. Thanks a lot for this :smug: :smug: :smug: :rock: :rock: :rock:


I tried this and I'm just blown away by how easily it produces great results! Drums and vocals have turned out really well, and bass too. For the most part you basically get free karaoke versions of songs with this tool. The bass extraction should also help with charting, since bass is usually the hardest part to hear, and it comes out fairly well too.


I know it sounds complicated to set up, but I just followed the instructions exactly as provided: download the right release for whatever version of Python you have installed, download Conda, and use git to get everything. The most common error I've seen is an incorrect input path in the arguments when running it. You can drag and drop a file into the console window to paste its path there, if that's faster.
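
For anyone who prefers to sidestep path typos entirely, here is a minimal sketch using spleeter's Python API instead of the command line (the file names are hypothetical, and I'm assuming the Separator interface shown in the project README):

```python
from pathlib import Path
from spleeter.separator import Separator

# Hypothetical input; resolve the path and check it exists before running,
# since a bad input path is the most common failure mentioned above.
song = Path(r"C:\music\song.mp3").resolve()
assert song.exists(), f"input not found: {song}"

separator = Separator("spleeter:2stems")           # vocals + accompaniment model
separator.separate_to_file(str(song), "output")    # stems should land in output/<song name>/
```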


I know it sounds complicated to set up, but I just followed the instructions exactly as provided.

 

They don't really have good instructions. I know this is open source, which is great, but that often means the documentation isn't written to a professional standard.

 

However... eventually I DID get some results, and they are surprisingly good!

 

Their example file worked, so I suspected there might be some format requirement that I had missed in the documentation. Comparing the example file with the few mp3s I was using to test the program, I noticed that all of mine had a variable bitrate, while the example mp3 had a fixed bitrate.
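
If anyone wants to try the same workaround, here is a rough sketch of re-encoding to a fixed bitrate with pydub (my own choice of tool, not something the spleeter documentation calls for; ffmpeg or Audacity would work just as well):

```python
from pydub import AudioSegment  # requires ffmpeg installed and on the PATH

# Hypothetical file names: re-encode a variable-bitrate mp3 at a constant 192 kbps.
audio = AudioSegment.from_mp3("song_vbr.mp3")
audio.export("song_cbr.mp3", format="mp3", bitrate="192k")
```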

 

So I tried a fixed-bitrate mp3 and it worked! Well, partially... the first time it crashed in the middle of processing (but at least it started), perhaps from lack of RAM. I know I have an old PC with only 4 GB of RAM, and I know some intensive processing must be required for this task... but really, how is it possible that processing a ~4 MEGAbyte file uses 4 GIGAbytes of memory? It actually does: on my second try, after closing every other program to free up RAM, I monitored spleeter and it went above 3.5 GB of usage at some point.

 

This time it didn't crash, but it got stuck at some point. However, it did manage to produce some of the expected audio files: vocals, drums and piano (I was trying the 5-stem version). The piano file is empty, but it may simply have gotten stuck after creating the file and before filling it with data. Vocals and drums are VERY good. I don't know whether the models are trained for a specific music genre, but I tried it with a synth-pop song ("Adamski - Killer") with no idea whether that's a good or bad choice for spleeter. If you're familiar with the song, the drums are synthesized rather than real acoustic drums, but they came out pretty well. The vocals are also very well isolated; not perfectly, but essentially you only hear a few "beeps" around them. Those are not processing artifacts but small sounds in the original audio which apparently confused the program a bit, and they are not particularly annoying.

 

The processing is slow (more than half an hour, though it's hard to tell exactly when it got stuck) because I am using the CPU option and, as I said, I have an old PC. There is a GPU option that is claimed to take less time than the duration of the song itself, but it specifically requires an Nvidia graphics card. I have a GeForce 550 Ti, but it's unbranded, so I'm not sure it will work.


More tests and more failures... with variable-bitrate mp3s it just never works for me, and with fixed-bitrate mp3s it always gets stuck partway through generating the stem files. Sometimes no stem files are generated at all; other times I get 3 or 4 (vocals, drums, other, and sometimes piano), but it never generates the bass stem. There is a slim chance it's still processing, but I let it run for over 2 hours and it showed no sign of life...

 

One interesting thing is that it generates the "other" stem (basically the backing track) before the other stems. I would have expected the algorithm to identify and extract the stems of known instruments first, and then save whatever remains into other.wav.

 

I focused on testing the 5-stems option, but I should probably use the 4-stems option instead. I now understand that the "piano" stem in 5-stems mode really looks for piano sounds in the audio, possibly including electric piano but not keyboards in general, so if the song does not specifically include piano, the 4-stems mode should be used instead.

 

And yes, unfortunately there is no standard way to extract a guitar stem, apparently because guitar sounds vary too much across music. But the whole software is based on machine learning, and with a proper dataset (meaning plenty of guitar stems from other songs, which I don't have) it should be possible to train the algorithm to recognize guitar as well. Perhaps this would need to be done multiple times for different guitar sounds (i.e. various combinations of the usual guitar effects such as distortion, reverb, chorus, flanger, etc.), but I would bet that if Spleeter manages to attract enough interest, people will start to train and share new models for extracting other instruments.

 

For RB3 custom authors, there is great potential here... I have my own issues running the algorithm, but no reason to believe others will hit the same problems. Setting those aside, extracting the stems literally takes a single command line, and you don't need to mess with any options other than picking the model you want. To recap, the current models are 2stems (vocals + rest), 4stems (vocals + drums + bass + rest) and 5stems (vocals + drums + bass + piano + rest).
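
For reference, the same model choice is exposed through the Python API; a minimal sketch (file names hypothetical, and assuming the default output layout of one folder per song):

```python
from spleeter.separator import Separator

# Pick one of the three published model configurations:
#   "spleeter:2stems" -> vocals + rest
#   "spleeter:4stems" -> vocals + drums + bass + rest
#   "spleeter:5stems" -> vocals + drums + bass + piano + rest
separator = Separator("spleeter:4stems")
separator.separate_to_file("song.mp3", "stems")   # e.g. stems/song/vocals.wav, stems/song/bass.wav, ...
```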


This will be time consuming, but it's a great tool for splitting up and maybe isolating tracks. The job will get done.

 

I can't get the GPU mode working, which supposedly speeds up the whole process. The CPU mode still crashes often, generally takes about half an hour to produce results, and in most cases doesn't output all the stems, so I have to repeat the stem generation a few times per song. Overall I expect a couple of hours to process one song, but it is really "time consuming" only for my PC, not for me, because all I have to do is type a command line and wait for the stems... so despite all the flaws I can accept the situation :)

 

Now the real question is whether the stems produced are usable for RB3 customs. Listened to individually, the stems sound pretty bad, so I wonder whether the audio will also sound bad in-game when players miss notes for a few seconds. But if you listen to all the stems simultaneously, they definitely add back up to the original (good) audio.


I haven't tried that program, but I recently installed TensorFlow GPU for another project and I had some trouble getting the GPU to work too; I just didn't take the required versions as seriously as I needed to. When they say CUDA 10.0, for example, I assumed 10.1 would work just fine, but I was wrong: 10.0 was explicitly required. When they say Python 3 works, I thought the new Python 3.8 would be fine, but it wasn't; I had to use Python 3.7. I don't know whether I'm the one who can't read or the TensorFlow website wasn't precise enough, but yeah, I hope this is your issue. That's for TF 2.0, by the way, so if that software uses TF 1.x, it will most likely require other versions? idk, but triple-check that.

  • 2 weeks later...


I know it's my own fault for rushing into testing spleeter without paying much attention to requirements and setup (at this time of year my hobby time is near zero, for various reasons), but at the same time the spleeter documentation could really be more helpful than it is.

 

In my case, when reading the wiki, I checked my NVIDIA control panel and it looked like the CUDA drivers were already installed. The CUDA website didn't seem to offer drivers to download anyway, just a "toolkit" for developers. The spleeter wiki didn't make me think I would need to actually write any code, just to have the drivers so that the spleeter processing would be redirected to the GPU instead of the CPU and everything would be much faster. But whenever I tried to activate GPU mode in Conda, spleeter always crashed.

 

Yesterday I tried installing the CUDA toolkit to see if it made any difference (mainly because I gathered that it also updates the drivers themselves). I made sure to download 10.0 and not a newer version. That's a 1.5 GIGAbyte download and install... and I'm not sure it made a difference. GPU mode no longer crashes*, but it still takes half an hour to complete.

 

*actually, in one case my PC froze completely and I had to power it off the hard way, and then it took half an hour just to recover

 

Then I noticed from https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html#windows that the CUDA installation instructions also tell me to install Visual Studio and build this "nbuild" project with it. But I don't get the point: is that just for testing that CUDA works, or is it actually required when using CUDA from another program (i.e. spleeter)? I would not expect to have to do any coding to use spleeter...


You shouldn't need to build anything for CUDA to work with TensorFlow. I'm pretty sure you can start Python, import tensorflow, and check whether it picks up your GPU as expected. If it does, then the issue may be with spleeter? idk at this point, to be honest. Assuming you followed the TensorFlow installation instructions for the version you are trying to run (which I believe is 1.5), it should work. The only thing I'd watch is your Python version: it's not mentioned anywhere on the TensorFlow website, but Python 3.8 doesn't work with it yet, so you have to install Python 3.7 (unless that was fixed in the last week, which is totally possible).
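
Something like this quick check should tell you whether TensorFlow sees the GPU at all (a rough sketch; the exact call depends on your TensorFlow version, so both styles are guarded):

```python
import sys
import tensorflow as tf

print("Python:", sys.version.split()[0])   # 3.7.x expected; 3.8 reportedly not supported yet
print("TensorFlow:", tf.__version__)

# TF 1.x / early 2.x style check:
if hasattr(tf, "test") and hasattr(tf.test, "is_gpu_available"):
    print("GPU available:", tf.test.is_gpu_available())

# TF 2.1+ style check:
if hasattr(tf.config, "list_physical_devices"):
    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```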


The only thing I'd watch is your Python version: it's not mentioned anywhere on the TensorFlow website, but Python 3.8 doesn't work with it yet, so you have to install Python 3.7 (unless that was fixed in the last week, which is totally possible).

 

Thanks! I definitely need to double-check that... I've had Python installed for a while and don't remember which version it might be.

  • 3 weeks later...

I am ready to use spleeter-generated tracks to create some customs :)

 

At the moment, spleeter generates 5 separate tracks:

 

- vocals (including harmonies)

- drums

- bass

- piano

- other (everything else, so that the stems sum up to the original audio)

 

If the song doesn't have real piano, it makes sense to run spleeter in 4-stems mode, but it doesn't hurt to use 5-stems mode anyway and then merge piano + other in Reaper (don't throw the piano track away even if it seems almost blank, otherwise the sum won't match the original audio perfectly).
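
If you'd rather do that merge outside of Reaper, here is a rough sketch with pydub (my own choice of tool, not part of the spleeter workflow described here; the file names match spleeter's default stem names):

```python
from pydub import AudioSegment  # requires ffmpeg installed and on the PATH

piano = AudioSegment.from_wav("piano.wav")
other = AudioSegment.from_wav("other.wav")

# overlay() mixes the two stems sample by sample; since both came from the same
# original mix, the sum should stay close to the original level.
merged = other.overlay(piano)
merged.export("other_plus_piano.wav", format="wav")
```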

 

Now the question is what to do with the "other" track in RB3. Depending on the song, "other" may contain guitar(s), keyboard(s) and additional instruments such as harmonica, winds and strings. I think there are two main options:

 

1) The obvious option is to map "other" to the background track in Magma, and use an empty file for both guitar and keys in-game. This means the usual "cancellation on a miss/error" feature of RB3 will work for drums and bass, but not for guitar and keys.

 

2) The alternative (especially useful when "other" contains only guitar + keyboard) is to drop the volume of the "other" track to half in Reaper, then use it twice in Magma, once for guitar and once for keyboard (see the quick check below). This means that missing notes or making mistakes on guitar/keys will lower their volume but not completely cancel them. However, I am not sure whether this might sound horrible when using the whammy bar on guitar or the bending strip on keys?
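
The quick check behind option 2, just to make the arithmetic explicit (a trivial sketch, not anything spleeter- or Magma-specific):

```python
import math

# Each copy of "other" at half amplitude is roughly -6 dB on the Reaper fader...
gain = 0.5
print(round(20 * math.log10(gain), 2))   # -6.02 (dB per copy)

# ...and the guitar copy plus the keys copy add back up to the original level,
# as long as neither part is being missed:
print(gain + gain)                       # 1.0
```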

 

Also, generally speaking, the individual tracks sound fairly good but not perfect. This means that when ONE player makes mistakes, the remaining tracks still sound quite good together; but the MORE players make mistakes, the worse the remaining tracks will sound.

 

I am not sure which of the two options I will choose, but because of these limitations I am currently thinking I'd rather also provide a single-track version.

  • 2 weeks later...


Whoa! I've run into that... that's what I was thinking: Harmonix could do this with the music. Hmmm... I get the idea...

 

Thanks!

