I'm fairly new to NLP and still getting used to HF and NLP in general. My project is image captioning in Spanish, which leads to the following questions:
It looks like the WIT dataset isn't in the HF library, so I downloaded it onto the TPU. The files are TSVs, though, and the HF parsers I found support CSV, Python dicts, JSON, etc. What would be the fastest way to parse them correctly? Is there any HF library for TSVs?
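To make the question concrete, here is a minimal sketch of the plain-Python fallback: read the tab-separated rows with the standard csv module and keep only the Spanish ones. The column names (language, image_url, caption_reference_description) and the function name are assumptions for illustration, so check them against the header of the real files. As far as I can tell, the csv loader in datasets also accepts a delimiter argument, which might make this unnecessary.

```python
import csv
from typing import Dict, Iterable, Iterator


def filter_rows(lines: Iterable[str], language: str = "es") -> Iterator[Dict[str, str]]:
    """Yield rows of tab-separated text whose `language` column matches.

    `lines` can be an open file or any iterable of strings.
    The `language` column name is an assumption about the TSV header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        if row.get("language") == language:
            yield row


# Tiny in-memory sample standing in for a real file (hypothetical values).
# For a real file, pass open(path, newline="", encoding="utf-8") instead.
sample = [
    "language\timage_url\tcaption_reference_description\n",
    "es\thttps://example.org/a.jpg\tUn gato\n",
    "en\thttps://example.org/b.jpg\tA dog\n",
]

rows = list(filter_rows(sample))
for row in rows:
    print(row["image_url"], row["caption_reference_description"])
```

Because the function is a generator, a large file is read one row at a time instead of being loaded into memory.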
Valhalla, my project lead, suggested using the run_summarization_flax.py script, but I am a bit confused. How would I go about specifying ViT as the encoder, and how would I add my cross-attention layer to BERT and pass that to the script? I see only one tokenizer_name argument, and I am not sure what to set it to for my use case.
I just want to start with pre-trained networks and, once that works, move on to fine-tuning, etc. Thanks!