I’m working with an LLM that generates files in unified diff format. In some cases, however, the output is invalid because the model inserts a space between the leading ‘-’ or ‘+’ marker and the content of the line.
For example:
--- EventsLiteTestKit.scala 2024-05-31 07:00:00
+++ EventsLiteTestKit.scala 2024-05-31 07:00:01
@@ -39,10 +39,10 @@
import java.util.UUID
import scala.collection.concurrent.TrieMap
- import scala.concurrent.Future
+ import scala.concurrent.{Future, _}
import scala.concurrent.Future.{failed, successful}
class EventsLiteTestKit {
This is not a valid patch file because of the space between the ‘-’ character and the word ‘import’.
Any ideas on how to force the model to emit output with no space after the ‘-’ and ‘+’ markers?
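As a fallback, I could post-process the model output instead of fixing it at generation time. Below is a minimal sketch in Python. The function name `strip_marker_space` is my own, and it assumes the model adds exactly one spurious space and that the affected lines are otherwise unindented, as in the example above. Lines where the marker is followed by two or more spaces are left alone, since that may be real indentation.

```python
import re

# A '+' or '-' marker, exactly one space, then a non-space character.
_MARKER_SPACE = re.compile(r"^([+-]) (?=\S)")


def strip_marker_space(diff_text: str) -> str:
    """Drop a single space after '+'/'-' on hunk body lines (heuristic)."""
    fixed = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            # Hunk header: the lines that follow are hunk body lines.
            in_hunk = True
        elif line.startswith(("--- ", "+++ ")):
            # File headers legitimately have a space after the marker.
            # Caveat: a removed line whose content starts with "-- " looks
            # the same, so this check is a simplification.
            in_hunk = False
        elif in_hunk:
            line = _MARKER_SPACE.sub(r"\1", line)
        fixed.append(line)
    return "\n".join(fixed)
```

This only repairs the symptom. It cannot tell a spurious space apart from a line that really starts with one space, so I would still prefer a way to get correct output from the model directly.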