HF Dataset + TensorFlow + Ragged Tensors (Object Detection)

lunde · October 16, 2024, 2:05pm

Hi, I’m trying to use Datasets to load my data into a tf.data.Dataset (TF-DS) to use with Keras 3. This seems impossible because of the ragged tensors for the BBox / ClassLabels? I’m getting the following error:

RuntimeError: Unrecognized array dtype object. 
Nested types and image/audio types are not supported yet.

This is my collate_fn:

    import tensorflow as tf

    def collate_fn(examples):
        images, boxes, classes = [], [], []
        for example in examples:
            # All images must share one shape, otherwise tf.stack below fails.
            images.append(tf.convert_to_tensor(example["image"], dtype=tf.float32))
            # One 4-element box per object; the number of objects varies per image.
            boxes.append(tf.reshape(tf.convert_to_tensor(example["boxes"], dtype=tf.float32), (-1, 4)))
            # One label per object, as a flat 1-D tensor.
            classes.append(tf.reshape(tf.convert_to_tensor(example["classes"], dtype=tf.float32), (-1,)))

        # tf.ragged.stack allows the stacked tensors to differ in size (different object counts).
        return {"image": tf.stack(images), "boxes": tf.ragged.stack(boxes), "classes": tf.ragged.stack(classes)}

Thanks in advance!
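A possible workaround, sketched under assumptions: the RuntimeError above suggests that to_tf_dataset only accepts collate_fn outputs that NumPy can hold with a plain numeric (or string) dtype, so a ragged batch ends up as an object array. The hypothetical collate_fn_padded below (not part of Datasets or Keras) therefore returns dense NumPy arrays, padding every example's boxes and classes to a common length. The pad value of -1 and the max_boxes parameter are assumptions; check what your model's loss treats as padding.

```python
import numpy as np


def collate_fn_padded(examples, max_boxes=None, pad_value=-1.0):
    """Hypothetical collate_fn returning dense NumPy arrays instead of RaggedTensors.

    Assumes each example has "image" (same H x W x C for every example),
    "boxes" (a list of 4-element boxes) and "classes" (one label per box).
    """
    counts = [len(ex["classes"]) for ex in examples]
    # Pad to the largest box count in this batch unless a fixed size is requested.
    n = max(counts) if max_boxes is None else max_boxes
    images = np.stack([np.asarray(ex["image"], dtype=np.float32) for ex in examples])
    boxes = np.full((len(examples), n, 4), pad_value, dtype=np.float32)
    classes = np.full((len(examples), n), pad_value, dtype=np.float32)
    for i, ex in enumerate(examples):
        k = min(counts[i], n)  # truncate if an example has more than max_boxes boxes
        if k > 0:
            boxes[i, :k] = np.asarray(ex["boxes"], dtype=np.float32).reshape(-1, 4)[:k]
            classes[i, :k] = np.asarray(ex["classes"], dtype=np.float32).reshape(-1)[:k]
    return {"image": images, "boxes": boxes, "classes": classes}


# Usage: two images with 1 and 2 objects respectively.
batch = [
    {"image": np.zeros((4, 4, 3)), "boxes": [[0, 0, 1, 1]], "classes": [3]},
    {"image": np.zeros((4, 4, 3)), "boxes": [[0, 0, 2, 2], [1, 1, 3, 3]], "classes": [1, 2]},
]
out = collate_fn_padded(batch)
print(out["boxes"].shape)       # (2, 2, 4)
print(out["classes"].tolist())  # [[3.0, -1.0], [1.0, 2.0]]
```

With to_tf_dataset this would presumably be passed as collate_fn=collate_fn_padded, optionally with collate_fn_args={"max_boxes": ...} so every batch has the same static shape. If the model really needs ragged inputs, tf.RaggedTensor.from_tensor(..., padding=...) can strip the padded rows again inside the tf.data pipeline.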

John6666 · October 16, 2024, 2:13pm

A library version mismatch seems to be the most common cause; apparently neither too new nor too old a version works.

github.com/huggingface/transformers

Failed to import transformers.pipelines because of the following error (look up to see its traceback): cannot import name 'PartialState' from 'accelerate'

opened 04:30PM - 12 May 23 UTC

closed 08:07AM - 25 Oct 23 UTC

Abhranta

System Info
I am trying to import the Segment Anything Model (SAM) using the transformers pipeline, but this gives the following error: "RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback): cannot import name 'PartialState' from 'accelerate' (/opt/conda/lib/python3.10/site-packages/accelerate/__init__.py)"

What I am trying to do:

    from transformers import pipeline
    generator = pipeline("mask-generation", model="facebook/sam-vit-huge", device=0)

Who can help?
No response

Information
- [ ] The official example scripts
- [ ] My own modified scripts

Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

Reproduction
Import this line:

    from transformers import pipeline
    generator = pipeline("mask-generation", model="facebook/sam-vit-huge", device=0)

Expected behavior
The model should import as per this notebook in the official tutorials: https://github.com/huggingface/notebooks/blob/main/examples/automatic_mask_generation.ipynb

lunde · October 17, 2024, 7:31am

Thanks! I’ll try this when I’m back at work.

But I’d like to note I’m using Keras 3 with TensorFlow, not Transformers.

John6666 · October 17, 2024, 11:58am

With tf.keras, it looks like there have been some changes.

github.com/tensorflow/tensorflow

tf.keras.Model with nested dictionary inputs fails to serialize/deserialize

opened 02:50PM - 21 Oct 23 UTC

closed 01:48AM - 18 Nov 23 UTC

burnpanck

stat:awaiting response type:bug stale comp:keras TF2.14

Info:
- Issue type: Bug
- Have you reproduced the bug with TensorFlow Nightly? No
- Source: binary
- TensorFlow version: 2.14.0
- Custom code: No
- OS platform and distribution: macOS 13.6
- Python version: 3.11
- CUDA/cuDNN version: none

Current behavior?
When trying to serialize/deserialize a tf.keras.Model nested input shapes cause an error. Note that this has been observed many years back in #37061, which was closed because their MVP included a tf.keras.Sequential which was deemed as not supported. However, the issue has nothing to do with tf.keras.Sequential at all, and instead lies purely in the deserialisation code of keras.

Standalone code to reproduce the issue

    import tensorflow as tf

    @tf.keras.saving.register_keras_serializable(package="MyPackage")
    class DummyModel(tf.keras.Model):
        def __init__(self, name=None):
            super().__init__(name=name)
            self.sublayer = tf.keras.layers.Dense(16)

        def call(self, x, **kw):
            a = x["a"]
            nested = x["nested"]
            b = nested["b"]
            c = nested["c"]
            return self.sublayer(tf.concat([a,b,c], axis=-1))

    model = DummyModel()
    out = model(dict(
        a = tf.keras.Input(3,dtype=tf.float32),
        nested = dict(
            b = tf.keras.Input(4,dtype=tf.float32),
            c = tf.keras.Input(5,dtype=tf.float32),
        ),
    ))
    model.summary()
    model.save("temp.keras")
    tf.keras.saving.load_model("temp.keras")

Relevant log output

    TypeError                                 Traceback (most recent call last)
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tensorflow/python/framework/tensor_shape.py:851, in TensorShape.__init__(self, dims)
    --> 851 self._dims.append(as_dimension(d).value)
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tensorflow/python/framework/tensor_shape.py:741, in as_dimension(value)
    --> 741 return Dimension(value)
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tensorflow/python/framework/tensor_shape.py:217, in Dimension.__init__(self, value)
    --> 217 raise TypeError(
    TypeError: Dimension value must be integer or None or have an __index__ method, got value ''b'' with type '<class 'str'>'

    The above exception was the direct cause of the following exception:

    TypeError                                 Traceback (most recent call last)
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tensorflow/python/eager/execute.py:204, in make_shape(v, arg_name)
    --> 204 shape = tensor_shape.as_shape(v)
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tensorflow/python/framework/tensor_shape.py:1526, in as_shape(shape)
    -> 1526 return TensorShape(shape)
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tensorflow/python/framework/tensor_shape.py:853, in TensorShape.__init__(self, dims)
    --> 853 raise TypeError(
    TypeError: Failed to convert '{'b': [None, 4], 'c': [None, 5]}' to a shape: ''b''could not be converted to a dimension. A shape should either be single dimension (e.g. 10), or an iterable of dimensions (e.g. [1, 10, None]).

    During handling of the above exception, another exception occurred:

    TypeError                                 Traceback (most recent call last)
    Cell In[6], line 1
    ----> 1 tf.keras.saving.load_model("temp.keras")
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/saving/saving_api.py:254, in load_model(filepath, custom_objects, compile, safe_mode, **kwargs)
    --> 254 return saving_lib.load_model(
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/saving/saving_lib.py:281, in load_model(filepath, custom_objects, compile, safe_mode)
    --> 281 raise e
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/saving/saving_lib.py:246, in load_model(filepath, custom_objects, compile, safe_mode)
    --> 246 model = deserialize_keras_object(
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/saving/serialization_lib.py:731, in deserialize_keras_object(config, custom_objects, safe_mode, **kwargs)
    --> 731 instance.build_from_config(build_config)
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/engine/base_layer.py:2331, in Layer.build_from_config(self, config)
    -> 2331 self.build(input_shape)
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/engine/training.py:494, in Model.build(self, input_shape)
    --> 494 x = {
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/engine/training.py:495, in <dictcomp>(.0)
    --> 495 k: base_layer_utils.generate_placeholders_from_shape(
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/engine/base_layer_utils.py:189, in generate_placeholders_from_shape(shape)
    --> 189 return tf1.placeholder(shape=shape, dtype=backend.floatx())
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tensorflow/python/ops/array_ops.py:3283, in placeholder(dtype, shape, name)
    -> 3283 return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tensorflow/python/ops/gen_array_ops.py:7071, in placeholder(dtype, shape, name)
    -> 7071 shape = _execute.make_shape(shape, "shape")
    File ~/.pyenv/versions/3.11.3/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tensorflow/python/eager/execute.py:206, in make_shape(v, arg_name)
    --> 206 raise TypeError("Error converting %s to a TensorShape: %s." % (arg_name, e))
    TypeError: Error converting shape to a TensorShape: Failed to convert '{'b': [None, 4], 'c': [None, 5]}' to a shape: ''b''could not be converted to a dimension. A shape should either be single dimension (e.g. 10), or an iterable of dimensions (e.g. [1, 10, None])..

lhoestq · October 17, 2024, 3:24pm

Note that you can also do ds = ds.with_format("tf") to get TF tensors / ragged tensors automatically.

lunde · October 18, 2024, 5:53am

@lhoestq I know about that one, but my understanding is that it puts the tensors in memory (documentation), which means my computer would blow up: it’s a big dataset.

@John6666 I’m not using tf.keras, I’m using Keras 3, the “new” multi-backend Keras where you can use TensorFlow, PyTorch or JAX. Currently the YOLO model only supports TensorFlow, which is a shame; I feel tricked.

John6666 · October 18, 2024, 6:37am

Oh sorry.

John6666 · October 18, 2024, 6:42am

I suggest opening a Discussion somewhere on this, because with this many users, someone will know how to solve it. When a Discussion is opened in one of the community repos, the members will be notified, so they will generally be aware of it.

If you want to solve it yourself: this error message is quite rare and should only come from this line of code, so that might give you a clue.

github.com

huggingface/datasets/blob/main/src/datasets/arrow_dataset.py#L311


      
              np_dtype = np.int64
          elif np.issubdtype(np_arrays[0].dtype, np.number):
              tf_dtype = tf.float32
              np_dtype = np.float32
          elif np_arrays[0].dtype.kind == "U":  # Unicode strings
              np_dtype = np.unicode_
              tf_dtype = tf.string
          else:
              raise RuntimeError(
                  f"Unrecognized array dtype {np_arrays[0].dtype}. \n"
                  "Nested types and image/audio types are not supported yet."
              )
          shapes = [array.shape for array in np_arrays]
          static_shape = []
          for dim in range(len(shapes[0])):
              sizes = {shape[dim] for shape in shapes}
              if dim == 0:
                  static_shape.append(batch_size)
                  continue
              if len(sizes) == 1:  # This dimension looks constant
                  static_shape.append(sizes.pop())

The call path is to_tf_dataset => dataset._get_output_signature => this error.
It looks like the error is only reached through this route, but where is to_tf_dataset being called?
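To see why a ragged batch trips this check, here is a simplified, hypothetical mirror of the dtype dispatch in the excerpt above (sketch_dtype_dispatch is not a Datasets function). Only the np.int64 body of the first branch is visible in the excerpt, so treating it as an integer check is an assumption. A batch whose boxes differ in length per image can only be held by NumPy as an object array, and object dtype matches none of the branches.

```python
import numpy as np


def sketch_dtype_dispatch(np_arrays):
    """Hypothetical, simplified mirror of the dtype check in the excerpt above."""
    dtype = np_arrays[0].dtype
    if np.issubdtype(dtype, np.integer):  # assumed condition; only its np.int64 body is visible
        return "int64"
    elif np.issubdtype(dtype, np.number):
        return "float32"
    elif dtype.kind == "U":  # Unicode strings
        return "string"
    else:
        raise RuntimeError(
            f"Unrecognized array dtype {dtype}. \n"
            "Nested types and image/audio types are not supported yet."
        )


# A padded (dense) batch of boxes has an ordinary float dtype...
dense_boxes = np.zeros((2, 5, 4), dtype=np.float32)
print(sketch_dtype_dispatch([dense_boxes]))  # float32

# ...but boxes of different lengths per image can only live in an object array.
ragged_boxes = np.empty(2, dtype=object)
ragged_boxes[0] = np.zeros((2, 4), dtype=np.float32)
ragged_boxes[1] = np.zeros((3, 4), dtype=np.float32)
try:
    sketch_dtype_dispatch([ragged_boxes])
except RuntimeError as err:
    print(err)  # Unrecognized array dtype object. ...
```

This would explain why returning tf.ragged.stack(...) outputs from collate_fn fails, while fixed-rank numeric arrays (for example, padded ones) pass the check.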

lhoestq · October 18, 2024, 2:18pm

It doesn’t load the dataset in memory. Rather, it sets the output format of the dataset to TF tensors, so when you do my_dataset[0] the output is automatically formatted as a TF tensor (efficiently, straight from the underlying Arrow data).

lunde · October 21, 2024, 8:35am

I’m calling to_tf_dataset myself; I followed these docs: Using Datasets with TensorFlow

Also, I run into other problems when using with_format("tensorflow").

@lhoestq My dataset is not fully loaded into RAM but read as we grab batches (get_item). Are you sure about this? I’ll do some testing. So, according to you, this documentation is wrong? Using Datasets with TensorFlow

lhoestq · October 21, 2024, 10:28am

I think this doc is just a bit confusing; in particular it mixes “formatting as TF” and “converting to TF”, which are not the same thing:

- Formatting as TF in datasets: calling with_format("tf") doesn’t load anything into RAM; it only sets the output type of the Dataset to TF tensors (the data still lives on disk and is memory-mapped).
- Converting to TF in tf.data: this loads the full data into memory, e.g. with tf.data.Dataset.from_tensor_slices().

It would be great to rephrase it a bit to make it clearer, though; the docs can be modified here: datasets/docs/source/use_with_tensorflow.mdx at main · huggingface/datasets · GitHub

lunde · November 1, 2024, 11:01am

Thanks @lhoestq, that clarifies a lot. I’ll close this topic now.

system · November 1, 2024, 11:02pm

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
