Handle non-MP3 sections of streams more gracefully.

_(I came here from processing/processing-sound#32 - I hope I'm in the right fork...)_

Most real-world MP3 files containing actual music will have ID3 tags and/or other garbage before the actual MP3 frames.

The current implementation in https://github.com/kevinstadler/JavaMP3/blob/master/src/main/java/fr/delthas/javamp3/Decoder.java#L441 assumes that the first occurrence of the sync word (i.e. 12 set bits, beginning on a byte boundary) marks the first MP3 header of the file. In practice this appears to be wrong more often than not.

The consequence is that a `0b111111111111` pattern somewhere in the ID3 tags, followed by bits that do not form a valid MP3 header, gets decoded as a header anyway. You might read `samplingFrequency` as `0b11` and then try to access `SAMPLING_FREQUENCY` out of bounds; the same applies to the other table indices. From the tens of files I tried, I estimate that 80% of my music library fails to be decoded by JavaMP3 for this reason.
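To make the failure mode concrete, here is a minimal sketch, not the actual `Decoder.java` code. The class name, the helper methods and the three-entry table are assumptions for illustration; the bit positions follow the MPEG-1 header layout with a 12-bit sync word.

```java
final class FalseSyncDemo {
    // Illustrative MPEG-1 table: index 3 is reserved, so only three entries exist.
    static final int[] SAMPLING_FREQUENCY = {44100, 48000, 32000};

    /** True if the top 12 bits of the 32-bit word are all set. */
    static boolean hasSyncWord(int header) {
        return (header >>> 20) == 0xFFF;
    }

    /** Extracts the 2-bit sampling frequency index (bits 11..10 of the header). */
    static int samplingFrequencyIndex(int header) {
        return (header >>> 10) & 0b11;
    }

    public static void main(String[] args) {
        int bogus = 0xFFFFFFFF; // e.g. four 0xFF bytes somewhere in non-audio data
        System.out.println("sync word found: " + hasSyncWord(bogus));
        int index = samplingFrequencyIndex(bogus);
        try {
            System.out.println(SAMPLING_FREQUENCY[index]);
        } catch (ArrayIndexOutOfBoundsException e) {
            // A decoder that trusts the sync word alone would crash here.
            System.out.println("index " + index + " is reserved, so this is not a real header");
        }
    }
}
```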

In [my experience](https://github.com/MLanghof/MP3ROR/blob/7166261c594148a2a920b38dff3f073d0b3077c6/MP3ROR/Header.pde) so far, this is relatively simple to mitigate (at least conceptually) by staying suspicious about whether we _actually_ have an MP3 frame:

1. When encountering an invalid index into any of the tables while decoding a supposed header, it's not a valid header - keep looking. This requires at most 4 bytes of "lookahead".

2. While searching for the _first_ header, if you find a 4-byte sequence that looks like a valid header (i.e. passes 1.), additionally require that the frame described by it is followed immediately by another valid header. For all subsequent headers you should be able to skip this check, as there should be no garbage between the frames.
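The two checks above could be sketched roughly as follows. This is an illustration over an in-memory `byte[]`, not a patch for `Decoder.java`: the class and method names are made up, and for brevity it only accepts MPEG-1 Layer III headers and rejects free-format bitrates, which a real decoder may need to handle.

```java
final class FrameScanner {
    // MPEG-1 Layer III bitrates in kbit/s; index 0 is "free format", index 15 is invalid.
    static final int[] BITRATE_KBPS =
        {0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320};
    // MPEG-1 sampling frequencies in Hz; index 3 is reserved.
    static final int[] SAMPLING_FREQUENCY = {44100, 48000, 32000};

    /** Reads 4 bytes starting at pos as a big-endian int. */
    static int readHeader(byte[] data, int pos) {
        return ((data[pos] & 0xFF) << 24) | ((data[pos + 1] & 0xFF) << 16)
             | ((data[pos + 2] & 0xFF) << 8) | (data[pos + 3] & 0xFF);
    }

    /** Check 1: frame length in bytes, or -1 if any field is implausible. */
    static int frameLength(int header) {
        if ((header >>> 20) != 0xFFF) return -1;          // 12-bit sync word
        if (((header >>> 19) & 1) != 1) return -1;        // ID bit: 1 = MPEG-1
        if (((header >>> 17) & 0b11) != 0b01) return -1;  // layer bits 01 = Layer III
        int bitrateIndex = (header >>> 12) & 0xF;
        int frequencyIndex = (header >>> 10) & 0b11;
        // Reject free format (0) for simplicity, and the invalid index 15.
        if (bitrateIndex == 0 || bitrateIndex >= BITRATE_KBPS.length) return -1;
        if (frequencyIndex >= SAMPLING_FREQUENCY.length) return -1;
        int padding = (header >>> 9) & 1;
        return 144 * BITRATE_KBPS[bitrateIndex] * 1000 / SAMPLING_FREQUENCY[frequencyIndex] + padding;
    }

    /** Check 2: first offset whose header is valid AND is directly followed by another valid header. */
    static int findFirstFrame(byte[] data) {
        for (int pos = 0; pos + 4 <= data.length; pos++) {
            int length = frameLength(readHeader(data, pos));
            if (length < 0) continue;                     // fails check 1, keep looking
            int next = pos + length;
            if (next + 4 <= data.length && frameLength(readHeader(data, next)) >= 0) {
                return pos;
            }
        }
        return -1;
    }

    /** Tag-like garbage containing a plausible but false header, then two real (empty) frames. */
    static byte[] sampleData() {
        byte[] data = new byte[10 + 2 * 417];             // 128 kbit/s at 44.1 kHz: 417-byte frames
        byte[] header = {(byte) 0xFF, (byte) 0xFB, (byte) 0x90, 0x00};
        data[0] = 'I'; data[1] = 'D'; data[2] = '3';
        System.arraycopy(header, 0, data, 3, 4);          // false positive inside the garbage
        System.arraycopy(header, 0, data, 10, 4);         // first real frame
        System.arraycopy(header, 0, data, 427, 4);        // second real frame
        return data;
    }

    public static void main(String[] args) {
        System.out.println(findFirstFrame(sampleData())); // prints 10
    }
}
```

The false header at offset 3 passes check 1 on its own, but the bytes at offset 3 + 417 are not a header, so only the frame at offset 10 is accepted. One limitation of this sketch: a stream containing a single frame would never be accepted, because there is no following header to confirm it.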

With specially crafted ID3 tags this might still be breakable, but I have not found any problems with this approach on normal files.

I understand that this is less trivial to implement when you are reading from a `Buffer` and can't easily look forward/backward. But even if you have to work around that, at least the performance/latency impact should be negligible because 2. is only relevant for finding the first header and 1. has an overhead of a single `int`.
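One possible way to get the lookahead on a plain `InputStream` is `mark`/`reset`. The sketch below assumes the input can be wrapped in a `BufferedInputStream`; `Lookahead` and `peekInt` are hypothetical names, and I have not checked how this would fit into the buffer abstraction JavaMP3 actually uses.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class Lookahead {
    /**
     * Returns the 4 bytes located `skip` bytes ahead as an unsigned big-endian value,
     * without consuming any input. Returns -1 if the stream ends first.
     */
    static long peekInt(InputStream in, int skip) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(skip + 4);                 // remember the current position
        try {
            long value = 0;
            for (int i = 0; i < skip + 4; i++) {
                int b = in.read();
                if (b < 0) return -1;      // end of stream before the 4 bytes were complete
                if (i >= skip) value = (value << 8) | b;
            }
            return value;
        } finally {
            in.reset();                    // rewind so the caller sees unchanged input
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {1, 2, (byte) 0xFF, (byte) 0xFB, (byte) 0x90, 0};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(bytes));
        System.out.println(Long.toHexString(peekInt(in, 2))); // prints fffb9000
        System.out.println(in.read());                        // prints 1, nothing was consumed
    }
}
```

For check 1 the call would be `peekInt(in, 0)`; for check 2, `skip` would be the length of the candidate frame, so the mark limit stays within one frame plus 4 bytes.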
