Create a GitHub issues tagger

@lewtun How do you think the raw text of the issue body should be preprocessed before it is fed into the model (for both training and inference)? I mean the steps that come before the basic pipeline of tokenization and encoding. I am asking because the body may contain Markdown syntax as well as long sequences such as exception tracebacks.
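
To make the question concrete, here is a rough sketch of the kind of cleanup I have in mind. It is only an assumption about what might work, not something taken from the course or from any library: the names `clean_issue_body` and `shorten_block` and the `max_code_lines` parameter are hypothetical, and the regexes cover only a few common Markdown constructs rather than the full syntax.

```python
import re

# Fenced code blocks: optional info string on the opening line, then the body.
FENCE_RE = re.compile(r"```[^\n]*\n(.*?)```", re.DOTALL)
# Inline code spans such as `foo`.
INLINE_CODE_RE = re.compile(r"`([^`\n]+)`")
# Links and images: [text](url) or ![alt](url).
LINK_RE = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")
# ATX headings: up to three spaces, one to six '#', then whitespace.
HEADING_RE = re.compile(r"^[ ]{0,3}#{1,6}[ \t]+", re.MULTILINE)
# Bold markers: **text**.
BOLD_RE = re.compile(r"\*\*(.+?)\*\*")


def shorten_block(match, max_lines=4):
    """Keep short code blocks whole; reduce long ones to first and last line.

    Assumption: for a traceback, the last line (the exception message)
    carries most of the signal, so it is kept.
    """
    lines = match.group(1).strip().splitlines()
    if len(lines) > max_lines:
        lines = [lines[0], "...", lines[-1]]
    return "\n".join(lines)


def clean_issue_body(text, max_code_lines=4):
    """Strip common Markdown syntax and shorten long code blocks."""
    text = FENCE_RE.sub(lambda m: shorten_block(m, max_code_lines), text)
    text = INLINE_CODE_RE.sub(r"\1", text)   # drop backticks, keep the code
    text = LINK_RE.sub(r"\1", text)          # keep link text, drop the URL
    text = HEADING_RE.sub("", text)          # drop leading '#' markers
    text = BOLD_RE.sub(r"\1", text)          # drop '**' markers
    text = re.sub(r"\n{3,}", "\n\n", text)   # collapse runs of blank lines
    return text.strip()
```

The cleaned string would then go through the usual tokenizer, with truncation handling whatever is still too long. Whether dropping the middle of a traceback helps or hurts the tagger is exactly what I am unsure about.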