HF Dataset + TensorFlow + Ragged Tensors (Object Detection)

I think this doc is just a bit confusing. In particular, it mixes “formatting as TF” and “converting to TF”, which are not the same thing:

  • format as TF in datasets: calling with_format("tf") doesn’t load the data into RAM; it only sets the output type of the Dataset to TF tensors (the data still lives on disk and is memory-mapped)
  • convert to TF in tf.data: loading the full data into memory, e.g. with tf.data.Dataset.from_tensor_slices()

It would be great to rephrase the doc a bit to make this clearer, though. It can be modified here: datasets/docs/source/use_with_tensorflow.mdx at main · huggingface/datasets · GitHub
