What is the difference between VAD and Speaker Segmentation?

I’m not sure I understand the difference between:

  • VAD (Voice Activity Detection) and
  • Speaker Segmentation

My understanding is:

  • VAD: splits audio into speech and non-speech segments
  • Speaker Segmentation: splits audio into non-speech segments and speech segments labeled by speaker

For example:

VAD                  = [not speech, speech, not speech,       speech,       not speech]
Speaker Segmentation = [not speech, speech, not speech, speech A, speech B, not speech]
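
To check that my two sequences are consistent with each other, here is a minimal Python sketch (my own illustration, not taken from any VAD or diarization library). It assumes each list entry is one contiguous segment of the same audio in time order, treats every label other than "not speech" as speech, and merges adjacent segments that end up with the same label. The helper name `to_vad` is hypothetical.

```python
from itertools import groupby

# Label sequences copied from the example; each entry is one contiguous
# segment of the same audio, in time order.
vad = ["not speech", "speech", "not speech", "speech", "not speech"]
speaker_segmentation = ["not speech", "speech", "not speech",
                        "speech A", "speech B", "not speech"]


def to_vad(segments):
    """Collapse speaker labels into a plain speech / not-speech sequence.

    Every label other than "not speech" becomes "speech", then adjacent
    segments with the same label are merged, so "speech A" directly
    followed by "speech B" becomes a single "speech" segment.
    """
    binary = ["not speech" if s == "not speech" else "speech" for s in segments]
    return [label for label, _ in groupby(binary)]


print(to_vad(speaker_segmentation))
print(to_vad(speaker_segmentation) == vad)
```

Running it prints `['not speech', 'speech', 'not speech', 'speech', 'not speech']` followed by `True`, i.e. under my understanding the VAD sequence is what remains of the speaker segmentation once the speaker identities are dropped.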

Am I right?

Is my example correct?