Word-based text generators make use of tokenization: they scan a text and build a word-vector table.
I wonder how this is done. E.g. does it learn each verb form (walk, walking, walked, walks, etc.) as a different word? Or is there an indexer that stores a base verb, with only a lookup table for the irregular verbs?
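For context, here is a minimal sketch (in Python, all names made up for illustration) of what a plain word-level tokenizer does when it has no morphological smarts: every surface form gets its own ID and its own embedding row.

```python
# Minimal sketch of a plain word-level tokenizer: every surface form
# ("walk", "walking", "walked", "walks") gets its own ID and its own
# embedding row. Purely illustrative, not any library's actual code.
import numpy as np

text = "I walk . She walks . We walked . I've been walking ."
vocab = {}                      # word -> integer ID
for word in text.split():
    if word not in vocab:
        vocab[word] = len(vocab)

embedding_dim = 8
# One vector per surface form: "walk" and "walks" share nothing here.
word_vectors = np.random.randn(len(vocab), embedding_dim)

print(vocab)
# {'I': 0, 'walk': 1, '.': 2, 'She': 3, 'walks': 4, ...}
# -> 4 separate rows for the 4 forms of "to walk"
```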
In essence, a conjugation table ("To Walk Conjugation - All English Verb Forms") already contains the short base sentences that are part of everyday chat, e.g. "I've been walking."
(Maybe each form is just stored as a base word plus some number code, so that could become "walk 24" or "Sandra 27 walk".)
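Something like this hypothetical lemma + code scheme, where only irregular forms need a lookup table (the code numbers and the IRREGULAR table below are invented for illustration, and real English would also need e-drop and consonant-doubling rules):

```python
# Hedged sketch of the lemma + code idea: store only the base verb,
# plus a small numeric inflection code, and keep a lookup table
# only for irregular forms. Codes and table are made up.

IRREGULAR = {                      # (lemma, code) -> surface form
    ("go", 2): "went",
    ("be", 2): "was",
}

def inflect(lemma: str, code: int) -> str:
    """Turn (lemma, code) back into a surface form."""
    if (lemma, code) in IRREGULAR:
        return IRREGULAR[(lemma, code)]
    if code == 0:                  # base form
        return lemma
    if code == 1:                  # 3rd person singular
        return lemma + "s"
    if code == 2:                  # simple past (regular rule)
        return lemma + "ed"
    if code == 3:                  # present participle
        return lemma + "ing"
    raise ValueError(f"unknown code {code}")

# "Sandra 27 walk"-style storage: tokens become (lemma, code) pairs.
encoded = [("Sandra", 0), ("walk", 2)]        # -> "Sandra walked"
print(" ".join(inflect(l, c) for l, c in encoded))
```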
I was thinking: could such a scheme optimize (reduce the footprint of) the total number of word vectors, by making the tokenizer / de-tokenizer smart (supplying it with most of the language rules: capitalize the first word, add a "." at the end, etc.)? See the sketch below.
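Roughly what I mean by a smart de-tokenizer (again just an illustrative sketch, not any real library's API):

```python
# Sketch of a "smart de-tokenizer" that re-applies surface rules the
# model then doesn't have to learn: glue punctuation and contractions
# back on, capitalize the first word, add a final period.

def detokenize(tokens):
    out = []
    for tok in tokens:
        # Attach punctuation and clitics ('ve, 's, ...) to the previous word.
        if out and (tok in {",", ".", "!", "?"} or tok.startswith("'")):
            out[-1] += tok
        else:
            out.append(tok)
    sentence = " ".join(out)
    sentence = sentence[0].upper() + sentence[1:]   # capitalize first word
    if sentence[-1] not in ".!?":
        sentence += "."                             # add "." at the end
    return sentence

print(detokenize(["i", "'ve", "been", "walking"]))
# -> "I've been walking."
```

With rules like these handled outside the model, the vocabulary would only need lowercase base forms, which is exactly the footprint reduction I'm asking about.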