Does higher work with huggingface (hugging face, HF) models? e.g. ViT?

Current error with ViT:

Traceback (most recent call last):
  File "/lfs/ampere3/0/brando9/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_maml_torchmeta.py", line 509, in <module>
    main()
  File "/lfs/ampere3/0/brando9/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_maml_torchmeta.py", line 443, in main
    train(rank=-1, args=args)
  File "/lfs/ampere3/0/brando9/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_maml_torchmeta.py", line 485, in train
    meta_train_fixed_iterations(args, args.agent, args.dataloaders, args.opt, args.scheduler)
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/training/meta_training.py", line 104, in meta_train_fixed_iterations
    log_zeroth_step(args, meta_learner)
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/logging_uu/wandb_logging/supervised_learning.py", line 170, in log_zeroth_step
    train_loss, train_acc = model(batch, training=training)
  File "/lfs/ampere3/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/meta_learners/maml_meta_learner.py", line 66, in forward
    meta_loss, meta_loss_ci, meta_acc, meta_acc_ci = meta_learner_forward_adapt_batch_of_tasks(self, spt_x, spt_y,
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/meta_learners/maml_differentiable_optimizer.py", line 473, in meta_learner_forward_adapt_batch_of_tasks
    meta_losses, meta_accs = get_lists_losses_accs_meta_learner_forward_adapt_batch_of_tasks(meta_learner,
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/meta_learners/maml_differentiable_optimizer.py", line 511, in get_lists_losses_accs_meta_learner_forward_adapt_batch_of_tasks
    fmodel: FuncModel = get_maml_adapted_model_with_higher_one_task(meta_learner.base_model,
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/meta_learners/maml_differentiable_optimizer.py", line 195, in get_maml_adapted_model_with_higher_one_task
    diffopt.step(inner_loss, grad_callback=lambda grads: [g.detach() for g in grads])
  File "/lfs/ampere3/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/higher/optim.py", line 237, in step
    all_grads = grad_callback(all_grads)
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/meta_learners/maml_differentiable_optimizer.py", line 195, in <lambda>
    diffopt.step(inner_loss, grad_callback=lambda grads: [g.detach() for g in grads])
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/meta_learners/maml_differentiable_optimizer.py", line 195, in <listcomp>
    diffopt.step(inner_loss, grad_callback=lambda grads: [g.detach() for g in grads])
AttributeError: 'NoneType' object has no attribute 'detach'

ref: