Can VHD Files Be Used as a Vector DB?

I’ve been exploring the idea of using VHD files as a vector database. VHD files are known for their extensibility: a dynamically expanding VHD can be created with a maximum size of, say, 200 GB, and it only consumes host disk space as data is written. The virtual disk can also be resized later, as long as the host drive has room and the format's ceiling allows it (roughly 2 TB for VHD, 64 TB for VHDX). This feature is well known, but I wonder whether it provides a useful structure for storing and processing vectors.

Some questions that come to mind:

  • Performance and Access:
    Will using a VHD file, with its extendable nature, affect the speed of querying high-dimensional vector data? Can the file system hierarchy manage the performance demands typical of vector databases?
  • Scalability and Extensibility:
    Since VHD files can grow based on the underlying HDD’s capacity, does this offer a practical and scalable alternative for handling large volumes of vectors? How would the dynamic expansion of a VHD translate into efficient vector data storage and retrieval?
  • Structure and Organization:
    In this approach, if the VHD’s volumes, folders, and files act as the branches and leaves of a data tree, will that mapping prove effective for representing vector data? Can an /index folder containing a SQL database that stores contextual information (IDs, file paths, metadata) add value to this structure?
  • Integration and Portability:
    VHD files provide a portable and user-friendly setup that anyone could inspect without the need for extra layers like a Linux container. Does this simplicity benefit vector data applications in a way that traditional vector databases might not?
  • Why Might This Not Be Implemented?
    Given these potential strengths, why hasn’t this design been adopted yet? Are there hidden performance limitations or synchronization issues that make the approach less practical, or is it simply an unexplored idea waiting for further investigation?
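
To make the layout from the Structure and Organization bullet concrete, here is a minimal Python sketch. It assumes the VHD is already mounted as an ordinary folder (a temporary directory stands in for the mount point here). The folder names `vectors/` and `index/`, the file name `meta.sqlite`, the `items` table, and all function names are hypothetical choices for illustration, not an existing tool or format.

```python
import math
import sqlite3
import tempfile
from array import array
from pathlib import Path


def open_store(root: Path) -> sqlite3.Connection:
    """Create the assumed layout: root/vectors/*.f32 plus root/index/meta.sqlite."""
    (root / "vectors").mkdir(parents=True, exist_ok=True)
    (root / "index").mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(root / "index" / "meta.sqlite")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id TEXT PRIMARY KEY, path TEXT NOT NULL, "
        "dim INTEGER NOT NULL, context TEXT)"
    )
    return conn


def add_vector(root: Path, conn: sqlite3.Connection,
               item_id: str, vector: list, context: str) -> None:
    """Write one vector as raw float32 bytes and record it in the SQL index."""
    rel_path = f"vectors/{item_id}.f32"
    (root / rel_path).write_bytes(array("f", vector).tobytes())
    conn.execute(
        "INSERT OR REPLACE INTO items VALUES (?, ?, ?, ?)",
        (item_id, rel_path, len(vector), context),
    )
    conn.commit()


def load_vector(root: Path, rel_path: str) -> list:
    """Read a float32 vector file back into a list of Python floats."""
    data = array("f")
    data.frombytes((root / rel_path).read_bytes())
    return list(data)


def cosine(a: list, b: list) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(root: Path, conn: sqlite3.Connection, query: list, k: int = 3) -> list:
    """Brute-force scan: open every stored vector file and rank by similarity."""
    rows = conn.execute(
        "SELECT id, path, context FROM items WHERE dim = ?", (len(query),)
    ).fetchall()
    scored = [
        (cosine(query, load_vector(root, rel_path)), item_id, context)
        for item_id, rel_path, context in rows
    ]
    scored.sort(reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # A temporary directory stands in for the mounted VHD volume.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        conn = open_store(root)
        add_vector(root, conn, "doc1", [1.0, 0.0, 0.0], "first document")
        add_vector(root, conn, "doc2", [0.0, 1.0, 0.0], "second document")
        for score, item_id, context in search(root, conn, [0.9, 0.1, 0.0], k=2):
            print(item_id, round(score, 3), context)
        conn.close()
```

Two things stand out from the sketch. Nothing in it depends on the VHD itself, so the same code runs on any folder, and the container mainly contributes portability. The search is also a linear scan that opens one file per vector, whereas dedicated vector databases typically rely on approximate nearest-neighbor indexes to avoid touching every vector, which is likely where the performance question above would be decided.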

I’m curious to hear thoughts and experiences on whether using VHD files in this way could offer a reliable, extensible, and portable structure for vector databases.

Looking forward to your insights and feedback on this concept!

This is for Linux, but it might be a similar concept?