Commit

readme + max_tick recalculation during preprocess
Natooz committed Sep 24, 2021
1 parent 5826734 commit fa74c7c
Showing 3 changed files with 11 additions and 1 deletion.
7 changes: 6 additions & 1 deletion README.md
@@ -167,7 +167,9 @@ NOTES:

### Create your own

You can easily create your own encoding strategy and benefit from the MidiTok framework. Just create a class inheriting from the [MIDITokenizer](miditok/midi_tokenizer_base.py#L34) base class, and override the ```track_to_tokens```, ```tokens_to_track``` and ```_create_vocabulary``` methods with your tokenization strategy.
You can easily create your own encoding strategy and benefit from the MidiTok framework. Just create a class inheriting from the [MIDITokenizer](miditok/midi_tokenizer_base.py) base class, and override the ```track_to_tokens```, ```tokens_to_track``` and ```_create_vocabulary``` methods with your tokenization strategy.

We encourage you to read the docstring of the [Vocabulary class](miditok/vocabulary.py) to learn how to use it for your strategy.
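
As a rough illustration of the override pattern described above, here is a minimal, self-contained sketch. It deliberately does not import MidiTok: `TokenizerBase` is a hypothetical stand-in for `MIDITokenizer`, notes are plain `(pitch, start, end)` tuples, and the vocabulary is a plain dict rather than the `Vocabulary` class. The real method signatures may differ, so check the base class and its docstrings before adapting this.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

# Hypothetical note representation for this sketch: (pitch, start_tick, end_tick)
Note = Tuple[int, int, int]


class TokenizerBase(ABC):
    """Stand-in for the MIDITokenizer base class (NOT the real MidiTok API)."""

    def __init__(self) -> None:
        self.vocab = self._create_vocabulary()

    @abstractmethod
    def track_to_tokens(self, notes: List[Note]) -> List[int]:
        ...

    @abstractmethod
    def tokens_to_track(self, tokens: List[int]) -> List[Note]:
        ...

    @abstractmethod
    def _create_vocabulary(self) -> Dict[str, int]:
        ...


class PitchOnlyTokenizer(TokenizerBase):
    """Toy strategy: one token per note, encoding only its pitch."""

    NOTE_LENGTH = 480  # assumed fixed duration (in ticks) used when decoding

    def _create_vocabulary(self) -> Dict[str, int]:
        # One "Pitch_x" event for each of the 88 piano keys (MIDI pitches 21 to 108)
        return {f"Pitch_{pitch}": index for index, pitch in enumerate(range(21, 109))}

    def track_to_tokens(self, notes: List[Note]) -> List[int]:
        # Sort by start time, then map each pitch to its token id
        ordered = sorted(notes, key=lambda note: note[1])
        return [self.vocab[f"Pitch_{pitch}"] for pitch, _, _ in ordered]

    def tokens_to_track(self, tokens: List[int]) -> List[Note]:
        # Timing was not encoded, so notes are laid out back to back
        id_to_pitch = {index: int(name.split("_")[1]) for name, index in self.vocab.items()}
        return [
            (id_to_pitch[token], i * self.NOTE_LENGTH, (i + 1) * self.NOTE_LENGTH)
            for i, token in enumerate(tokens)
        ]


tokenizer = PitchOnlyTokenizer()
tokens = tokenizer.track_to_tokens([(64, 480, 960), (60, 0, 480)])
decoded = tokenizer.tokens_to_track(tokens)
```

The toy strategy loses timing and velocity on purpose; a real strategy would add the corresponding events to the vocabulary and emit them in `track_to_tokens`.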

## Features

@@ -191,6 +193,9 @@ These tokens bring additional information about the structure and content of MID
* **Rests:** include "Rest" events whenever a segment of time is silent, i.e. no note is played within it. This token type is decoded as a "Time-Shift" event, meaning the time will be shifted according to its value. You can choose the minimum and maximum rest values to represent (default is 1/2 beat to 8 beats). Note that rests shorter than one beat are only divisible by the first beat resolution, e.g. a rest of 5/8th of a beat will be a succession of ```Rest_0.4``` and ```Rest_0.1```, where the first number indicates the rest duration in beats and the second in samples / positions.
* **Tempos:** specify the current tempo. This allows you to train a model to predict tempo changes alongside the notes, unless specified otherwise in the chart below. Tempo values are quantized to your specified range and number (default is 32 tempos from 40 to 250).
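
The rest decomposition described in the list above can be sketched as a greedy split. This is only an illustration, not MidiTok's implementation: the function name `decompose_rest`, the 8 samples per beat, and the allowed sub-beat values `(4, 2, 1)` are assumptions chosen so that the 5/8 beat example comes out as in the text.

```python
from typing import List, Sequence


def decompose_rest(
    rest_samples: int,
    samples_per_beat: int = 8,               # assumed first beat resolution
    max_beats: int = 8,                      # assumed largest rest token, in beats
    sub_beat_values: Sequence[int] = (4, 2, 1),  # assumed sub-beat rests, in samples
) -> List[str]:
    """Greedily split a rest (in samples) into 'Rest_<beats>.<samples>' tokens."""
    tokens: List[str] = []
    beats, remainder = divmod(rest_samples, samples_per_beat)

    # Whole beats first, capped at the largest representable rest
    while beats > 0:
        chunk = min(beats, max_beats)
        tokens.append(f"Rest_{chunk}.0")
        beats -= chunk

    # Then the sub-beat remainder, largest allowed value first.
    # Any remainder smaller than the smallest allowed value is dropped.
    for value in sorted(sub_beat_values, reverse=True):
        while remainder >= value:
            tokens.append(f"Rest_0.{value}")
            remainder -= value

    return tokens


five_eighths = decompose_rest(5)    # 5/8 of a beat at 8 samples per beat
two_and_half = decompose_rest(20)   # 2.5 beats
```

Here `five_eighths` is `["Rest_0.4", "Rest_0.1"]`, matching the example above, and `two_and_half` is `["Rest_2.0", "Rest_0.4"]`.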

Additionally, MidiTok can include *Program* tokens in the vocabulary of MIDI-Like, REMI and CP Word.
We do not count them as additional tokens, though, since MidiTok does not use them anywhere itself: they are intended for you to insert at the beginning of each sequence as *Start Of Sequence* tokens.

| | MIDI-Like | REMI | Compound Word | Structured | Octuple | MuMIDI |
|-------|:-------------:|:--------------:|:--------------:|:----------:|:--------:|:-------------:|
| Chord |||||||
1 change: 1 addition & 0 deletions miditok/__init__.py
@@ -7,5 +7,6 @@
from .octuple_mono import OctupleMonoEncoding
from .midi_tokenizer_base import MIDITokenizer, get_midi_programs, detect_chords, merge_tracks, \
merge_same_program_tracks
from .vocabulary import Vocabulary, Event
from .constants import MIDI_INSTRUMENTS, INSTRUMENT_CLASSES, INSTRUMENT_CLASSES_RANGES, CHORD_MAPS, DRUM_SETS,\
CONTROL_CHANGES
4 changes: 4 additions & 0 deletions miditok/midi_tokenizer_base.py
@@ -126,6 +126,10 @@ def preprocess_midi(self, midi: MidiFile):
continue
t += 1

# Recalculate max_tick, as this could have changed after note quantization
if len(midi.instruments) > 0:
midi.max_tick = max([max([note.end for note in track.notes]) for track in midi.instruments])

if self.additional_tokens['Tempo']:
self.quantize_tempos(midi.tempo_changes, midi.ticks_per_beat)
# quantize_time_signatures(midi.time_signature_changes, midi.ticks_per_beat)
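
A note on the `max_tick` recalculation in this hunk: the nested `max` calls would raise `ValueError` if a track with no notes ever reached that line, since `max` of an empty list is undefined. The surrounding loop may already remove such tracks, so this may never happen in practice. The sketch below shows a more defensive variant under that assumption; `Note`, `Track` and `recompute_max_tick` are hypothetical stand-ins used to keep the example self-contained, not the miditoolkit or MidiTok objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    """Hypothetical stand-in for a note; only the end tick matters here."""
    end: int


@dataclass
class Track:
    """Hypothetical stand-in for an instrument track."""
    notes: List[Note] = field(default_factory=list)


def recompute_max_tick(tracks: List[Track], fallback: int = 0) -> int:
    """Return the latest note end across all tracks, or `fallback` if there are no notes."""
    # A single flat generator avoids calling max() on an empty per-track list
    return max((note.end for track in tracks for note in track.notes), default=fallback)


tracks = [Track([Note(480), Note(1920)]), Track(), Track([Note(960)])]
latest = recompute_max_tick(tracks)
```

With the empty middle track, `latest` is still `1920`, and an empty track list returns the fallback instead of raising.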
