Is there a way to use any of the models to split words into the word parts that users might search for downstream? For example, consider a custom domain where words like windfall and firewall exist, but a user may search for “wind fall” or “fire wall” downstream. The most basic approach I thought of was to split each word at every possible position and “accept” a split whose parts make sense. For example, windfall = w + indfall, wi + ndfall, win + dfall, and so on. Then apply an existing language model to check whether the sub-parts exist in its vocabulary.
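To make the idea concrete, here is a minimal sketch of that brute-force approach. It assumes a plain Python set stands in for the model's vocabulary; `candidate_splits`, `min_len`, and the toy `vocab` are hypothetical names made up for illustration, not part of any library.

```python
def candidate_splits(word, vocab, min_len=2):
    """Return every two-part split of `word` whose parts are both in `vocab`."""
    word = word.lower()
    splits = []
    # Try each cut position, keeping both parts at least `min_len` characters long.
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in vocab and right in vocab:
            splits.append((left, right))
    return splits


# Hypothetical toy vocabulary (an assumption); in practice this would come
# from a model's vocabulary or a domain word list.
vocab = {"wind", "fall", "fire", "wall", "win", "all"}

print(candidate_splits("windfall", vocab))
print(candidate_splits("firewall", vocab))
```

The `min_len` cutoff is only there to avoid accepting splits like w + indfall on the strength of a one-letter "word". This handles two-part splits only; words with three or more parts would need a recursive or dynamic-programming variant.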
I'd appreciate any pointers.