Question about the causality of RoBERTa tokens

I want to use the RoBERTa model in the following way:
Given a list of N tokens, I want the model to compute a hidden state for each of the N tokens causally: the first token's hidden state is computed from the first token only, the second from the first two tokens, the third from the first three tokens, and so on.
Additionally, I want a CLS token whose hidden state is computed from all the input tokens.
There doesn't seem to be a flag or input that enables this. Or is there?
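
To make the pattern concrete, here is a minimal sketch in plain Python of the attention mask I have in mind. It assumes the CLS token sits at position 0 and ignores the closing `</s>` token; `build_mask` is just a hypothetical helper name, not something from the library. I also assume the regular tokens must not attend to CLS, since CLS sees every token and would otherwise leak later tokens back into earlier positions once there is more than one layer.

```python
def build_mask(n_tokens):
    """Return a square 0/1 matrix where mask[i][j] == 1 means
    position i is allowed to attend to position j."""
    size = n_tokens + 1  # assumption: position 0 is the CLS token
    mask = [[0] * size for _ in range(size)]

    # CLS row: attends to itself and to every input token.
    for j in range(size):
        mask[0][j] = 1

    # Token rows: causal over the tokens only, never over CLS.
    for i in range(1, size):
        for j in range(1, i + 1):
            mask[i][j] = 1

    return mask


for row in build_mask(3):
    print(row)
```

For N = 3 the rows are `[1, 1, 1, 1]` for CLS, then `[0, 1, 0, 0]`, `[0, 1, 1, 0]` and `[0, 1, 1, 1]` for the three tokens. Whether the model can take a per-position mask like this as input is exactly the part I am unsure about.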