Not a language model - removing word embedding weights from the model for lighter CUDA memory use

I'm working on a model that uses the BERT architecture, but it's not a language model.

I'm feeding the model precomputed embeddings via the `inputs_embeds` argument and reading the pooled output, so the word embedding layer is never used.
Is there a clean way to remove the word embedding weights (and any other layers I don't use)? It might make the model lighter and reduce the CUDA out-of-memory errors I keep running into.
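One approach I've tried as a sketch (assuming the Hugging Face `transformers` `BertModel`): when every forward pass supplies `inputs_embeds`, the `word_embeddings` lookup inside `BertEmbeddings` is never executed, so the submodule can be set to `None`. PyTorch allows assigning `None` over an existing submodule, which drops its parameters. The tiny `BertConfig` below is just for illustration; substitute your own config or checkpoint.

```python
import torch
from transformers import BertConfig, BertModel

# Small illustrative config so the example runs without downloading weights.
config = BertConfig(hidden_size=64, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=128)
model = BertModel(config)

params_before = sum(p.numel() for p in model.parameters())

# We always pass `inputs_embeds`, so the word-embedding lookup is never
# called; dropping the submodule removes vocab_size * hidden_size weights.
model.embeddings.word_embeddings = None

params_after = sum(p.numel() for p in model.parameters())
print(f"removed {params_before - params_after} parameters")

# The forward pass still works with precomputed embeddings,
# and the pooled output is still available.
embeds = torch.randn(1, 8, config.hidden_size)
out = model(inputs_embeds=embeds)
pooled = out.pooler_output  # shape: (batch, hidden_size)
```

For `bert-base`-sized models the word embeddings are roughly 30k vocab x 768 hidden, about 23M parameters (~90 MB in fp32), so this helps, but note that CUDA OOM during training is often dominated by activations, so reducing batch or sequence length may matter more. Also note that position and token-type embeddings are still applied on top of `inputs_embeds`, which is usually what you want with this setup.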