Apple has just announced the TensorFlow-Metal package for GPU acceleration on Mac devices. I am therefore wondering whether it is feasible to solve NLP tasks with HuggingFace Transformers through TensorFlow-macOS and TensorFlow-Metal.
To figure it out, I installed TensorFlow-macOS, TensorFlow-Metal, and HuggingFace Transformers on my local device. Then I ran some test code to check that everything was installed correctly, and here is what I got.
Everything seemed to work fine. However, I got the following error when I attempted to fine-tune a BERT model.
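For reference, the setup was along these lines (a sketch of the commands, assuming a compatible Python environment; exact package versions may differ):

```shell
# Install Apple's TensorFlow build and the Metal plugin, plus HuggingFace Transformers
pip install tensorflow-macos tensorflow-metal
pip install transformers

# Quick sanity check that TF imports and sees a GPU device
python -c "import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices('GPU'))"
```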
InvalidArgumentError: Cannot assign a device for operation tf_bert_for_sequence_classification/bert/embeddings/Gather: Could not satisfy explicit device specification '' because the node {{colocation_node tf_bert_for_sequence_classification/bert/embeddings/Gather}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
RealDiv: GPU CPU
Sqrt: GPU CPU
UnsortedSegmentSum: CPU
AssignVariableOp: GPU CPU
AssignSubVariableOp: GPU CPU
ReadVariableOp: GPU CPU
StridedSlice: GPU CPU
NoOp: GPU CPU
Mul: GPU CPU
Shape: GPU CPU
_Arg: GPU CPU
ResourceScatterAdd: GPU CPU
Unique: CPU
AddV2: GPU CPU
ResourceGather: GPU CPU
Const: GPU CPU
So, I checked whether TensorFlow detected the GPU correctly, and here is what I got.
tf.test.is_gpu_available()
WARNING:tensorflow:From <ipython-input-2-17bb7203622b>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-06-29 01:56:25.862829: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-06-29 01:56:25.862893: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Out[2]: True
It looks like HuggingFace is unable to detect the proper device. Is there any way to solve this issue, or will it be solved in the near future?
Thank you in advance for your kind assistance.
It doesn’t look like whatever device TF recognizes is actually usable; perhaps that is why HF can’t leverage it? If you can’t allocate tensors to the GPU, there is little scope for executing ops there.
Unfortunately, it’s not really a conclusive test. I admit I haven’t used an M1 device, but as long as the framework can use a device, I don’t see why HuggingFace wouldn’t be able to.
Could you try doing a speed comparison for training models to ensure it’s not running on CPU?
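To make that comparison concrete, here is a minimal, framework-agnostic timing helper (the `train_step` callable and the `tf.device` contexts shown in the usage sketch are assumptions; any training-step function can be plugged in):

```python
import time

def median_time(fn, repeats=5, warmup=1):
    """Run fn several times and return the median wall-clock duration.

    A warmup run is discarded so one-off setup cost (e.g. graph
    tracing) does not skew the measurement.
    """
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    durations.sort()
    return durations[len(durations) // 2]

# Usage sketch (assumes TensorFlow is installed and a model is built):
#   with tf.device('/GPU:0'):
#       gpu_t = median_time(lambda: model.train_on_batch(x, y))
#   with tf.device('/CPU:0'):
#       cpu_t = median_time(lambda: model.train_on_batch(x, y))
# If gpu_t is not clearly smaller than cpu_t, training is likely
# falling back to the CPU.
```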
Another thing: in your image
Metal device set to: AMD Radeon Pro 5500M
Am I correct in assuming that you want to use an AMD GPU, not via the M1 processor?
Ahh, so you are using tensorflow-metal to accelerate training on an AMD GPU.
Well, I am sure it’s not very different from the standard TensorFlow we use; unfortunately, it doesn’t seem to be open source.
I am not from HuggingFace, so I can’t say for certain, but a fork that is not open source is quite difficult to work with; hence the lack of HF + tensorflow-metal integration. It doesn’t make sense for them to invest in a framework used by so few. You would have to do the customizations yourself and figure out how, which I think is a pretty daunting task.
Someone else might be able to explain better, I suppose; I don’t know much about the Mac ecosystem, unfortunately.
I have built another BERT model with TensorFlow Hub only, and I got the same error as before.
InvalidArgumentError: Cannot assign a device for operation AdamWeightDecay/AdamWeightDecay/update/Unique: Could not satisfy explicit device specification '/job:localhost/replica:0/task:0/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
RealDiv: GPU CPU
ResourceGather: GPU CPU
AddV2: GPU CPU
Sqrt: GPU CPU
Unique: CPU
ResourceScatterAdd: GPU CPU
UnsortedSegmentSum: CPU
AssignVariableOp: GPU CPU
AssignSubVariableOp: GPU CPU
ReadVariableOp: GPU CPU
NoOp: GPU CPU
Mul: GPU CPU
Shape: GPU CPU
Identity: GPU CPU
StridedSlice: GPU CPU
_Arg: GPU CPU
Const: GPU CPU
So, this issue seems to come from TensorFlow (or tensorflow-metal) itself, not HuggingFace. I will report it to the Apple Developer Forums. Anyway, thank you all, good fellows.
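For anyone hitting the same error in the meantime, one possible workaround (untested on tensorflow-metal; `set_soft_device_placement` is standard TensorFlow API) is to let TensorFlow fall back to the CPU for ops such as `Unique` and `UnsortedSegmentSum` that have no GPU kernel, instead of failing the colocation check:

```python
import tensorflow as tf

# Let TF place an op on the CPU when the requested device (GPU) has
# no kernel for it, rather than raising InvalidArgumentError.
tf.config.set_soft_device_placement(True)
```

This should be called before building the model and optimizer, since placement decisions are made when the ops are created.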