I know it sounds complicated to set up, but I just followed the instructions exactly as they were provided.
They don't really have good instructions. I know this is open source, which is great, but it often means the documentation isn't very professional.
However... eventually I DID get some results, and they are surprisingly good!
Their example file worked, so I thought there might be some format requirements that I hadn't noticed in the documentation. Comparing the example file with the few MP3s I was using to test the program, I noticed that all of my MP3s had a variable bit rate, while the example MP3 had a fixed bit rate.
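In case anyone wants to try the same workaround, here is a rough sketch of how a variable-bit-rate MP3 could be re-encoded at a fixed bit rate before feeding it to spleeter. It assumes ffmpeg (built with the libmp3lame encoder) is installed and on the PATH. The file names and the 192k bit rate are just placeholders I made up, and I can't say for sure that the variable bit rate was the real cause of the failure.

```python
import subprocess

def build_cbr_command(src, dst, bitrate="192k"):
    """Return an ffmpeg command that re-encodes src as a fixed-bit-rate MP3."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # input file (may be variable bit rate)
        "-codec:a", "libmp3lame",  # LAME MP3 encoder
        "-b:a", bitrate,           # fixed audio bit rate instead of a quality target
        dst,
    ]

def convert_to_cbr(src, dst, bitrate="192k"):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_cbr_command(src, dst, bitrate), check=True)

# Example call (placeholder file names):
# convert_to_cbr("song_vbr.mp3", "song_cbr.mp3")
```

Re-encoding an MP3 loses a little quality each time, so if you still have the original CD or a lossless file it's better to encode from that.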
So I tried a fixed-rate MP3 and it worked! Well, partially... the first time it crashed in the middle of processing (but at least it started), perhaps because it ran out of RAM. I know I have an old PC with only 4 GB of RAM, and I know that this task must require some intensive processing... but really, how is it possible that the program uses 4 GIGAbytes of memory to process a ~4 MEGAbyte file? It actually does: on my second try, after closing every other program I could in order to free up RAM, I monitored spleeter and at some point it went above 3.5 GB of usage.
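For what it's worth, a back-of-the-envelope calculation makes the gap a bit less surprising. This is only my own rough estimate with assumed numbers (a 128 kbit/s MP3, decoded to 44.1 kHz stereo with 32-bit float samples). I don't know what spleeter actually keeps in memory internally.

```python
def mp3_duration_seconds(file_bytes, bitrate_kbps):
    """Approximate duration of a fixed-bit-rate MP3, derived from its size."""
    return file_bytes * 8 / (bitrate_kbps * 1000)

def decoded_size_bytes(duration_s, sample_rate=44100, channels=2, bytes_per_sample=4):
    """Size of the uncompressed waveform when held as 32-bit float samples."""
    return duration_s * sample_rate * channels * bytes_per_sample

duration = mp3_duration_seconds(4_000_000, 128)  # assumed: 4 MB file at 128 kbit/s
raw = decoded_size_bytes(duration)

print(f"duration: {duration:.0f} s")             # duration: 250 s
print(f"decoded waveform: {raw / 1e6:.1f} MB")   # decoded waveform: 88.2 MB
```

So the decoded audio alone is already around 20 times bigger than the MP3, and that's before any spectrograms, the models themselves, and a separate output for each of the 5 stems. It doesn't explain 3.5 GB on its own, but it does show that the size of the file on disk is a poor guide to the memory needed.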
This time it didn't crash, but it got stuck at some point. It did, however, get as far as producing some of the expected audio files: vocals, drums and piano (I was trying the 5-stem version). The third one is empty, but it could be that it got stuck after creating the file but before filling it with data. Vocals and drums are VERY good. I don't know if the models used by the program are for a specific music genre; I tried it with a synth-pop song ("Adamski - Killer") and I have no idea whether that is supposed to be a good or bad choice for spleeter. If you're familiar with the song, the drums are synthesized, not real acoustic drums, but they came out pretty well. The voice is also very well isolated. Not perfectly, but essentially you only hear a few "beeps" around the vocals: these are not processing artifacts but small sounds from the original audio which apparently confused the program a bit, and they are not particularly annoying.
The processing is slow (more than half an hour, though it's hard to tell exactly when it got stuck) because I am using the CPU option and, as I said, I have an old PC. There is an option to use the GPU, which is claimed to take less time than the duration of the song itself, but it requires an Nvidia graphics card specifically. I have a GeForce 550 Ti, but it's unbranded so I am not sure it will work.
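As far as I understand, whether the GPU option works depends less on who made the card than on its CUDA "compute capability", since spleeter runs on TensorFlow. The little check below is only a sketch: both numbers in it are assumptions from memory (I'm taking my card to be the GeForce GTX 550 Ti with capability 2.1, and the prebuilt TensorFlow GPU packages to need 3.5 or higher), so they should be checked against Nvidia's CUDA GPU list and the TensorFlow install notes.

```python
# Assumed values, from memory; verify against Nvidia's and TensorFlow's own tables.
MIN_CAPABILITY_FOR_TF_GPU = (3, 5)   # assumed minimum for prebuilt TensorFlow GPU builds
KNOWN_CARDS = {
    "GeForce GTX 550 Ti": (2, 1),    # assumed compute capability (Fermi generation)
}

def gpu_probably_supported(card_name):
    """Return True or False for a card in the table, or None if it is unknown."""
    capability = KNOWN_CARDS.get(card_name)
    if capability is None:
        return None
    # Tuples compare element by element, so (2, 1) >= (3, 5) is False.
    return capability >= MIN_CAPABILITY_FOR_TF_GPU

print(gpu_probably_supported("GeForce GTX 550 Ti"))  # False
```

If those numbers are right, the 550 Ti is simply too old for the GPU option, branded or not, and I'm stuck with the CPU.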