Could you do something about the licenses of datasets and models, at least for the popular ones?
E.g. many popular datasets and models based on LLaMA or Alpaca declare the wrong license (too permissive): the original datasets are non-commercial, so works derived from them should be non-commercial as well.
- Dataset: tatsu-lab/alpaca
- Dataset: yahma/alpaca-cleaned
- Model: declare-lab/flan-alpaca-large