Some existing open-source projects overlap with parts of this idea, but none appears to combine all of its pieces in a single system.
This is a strong idea. The best version is not “use a language model to generate legal words.” It is a hybrid puzzle system where a symbolic engine enforces the rules, an embedding model ranks legal moves by semantic quality, and a smaller instruct model turns those moves into hints, clues, and interaction. That pattern matches both the current tooling and the research: Hugging Face’s constrained generation features are built for lexical constraints, Sentence Transformers recommends a retrieve-then-rerank pipeline for harder semantic selection, and work like NeuroLogic and PICARD exists precisely because plain generation does not reliably obey fine-grained constraints on its own. (Hugging Face)
Why this idea is better than a normal word-game clone
Letter Boxed is interesting because it combines formal legality with human strategy. A move is not just “a valid word.” It is a word that preserves future options, covers missing letters, avoids dead ends, and ideally feels satisfying. That is exactly where NLP can help: not as the legality checker, but as the layer that decides which legal moves are interesting, hint-worthy, or theme-consistent. Existing public solvers for Letter Boxed-style problems mostly rely on dictionary filtering and search, while semantic games like Contexto and Semantle rely on embeddings and vector search. Your concept is promising because it can combine both instead of choosing one. (GitHub)
Has anyone built similar things?
Yes, but usually only one slice of the full idea.
1. Rule-first word puzzle solvers
There are public Letter Boxed solvers that build legal word sets and then search chains over them. They prove that the symbolic side is already tractable and fast. That means you do not need a transformer to decide whether a move is legal. (GitHub)
2. Embedding-first semantic games
Contexto and Semantle-style solvers use embeddings and cosine similarity to guide guesses, explore neighborhoods, and exploit semantic structure. Those projects are close to your “semantic variation” idea, especially if you replace “guess the hidden word” with “pick the next move that stays in-theme.” (GitHub)
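As a toy illustration of the mechanic those games rely on, here is plain cosine similarity over dense vectors. The three-dimensional "embeddings" below are invented for the example; a real system would get vectors from a sentence-embedding model:

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d "embeddings", for illustration only.
vectors = {
    "ocean": [0.9, 0.1, 0.0],
    "sea": [0.85, 0.2, 0.05],
    "carburetor": [0.0, 0.1, 0.95],
}

target = vectors["ocean"]
ranked = sorted(vectors, key=lambda w: cosine(vectors[w], target), reverse=True)
# "sea" ranks above "carburetor" relative to "ocean".
```

The same scoring drives both guess feedback ("warmer / colder") and neighborhood exploration in Semantle-style solvers.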
3. Puzzle generation research
There is older but still useful work on automated word puzzle generation using topic dictionaries and semantic similarity. It is relevant because it treats puzzle creation as a generation-and-filtering problem with controllable difficulty, which is closer to your goal than a simple daily solver. (arXiv)
The architecture I would use
1. Keep legality fully symbolic
This layer should know:
- which letters or syllables are available,
- which transitions are legal,
- which words are in the lexicon,
- which constraints remain unsatisfied,
- and whether a partial chain can still reach a good ending.
Use a trie or DAWG plus a graph search. For a Letter Boxed-style game, that is the cleanest and safest way to guarantee correctness. Hugging Face does support lexical constraints in generation, but even the docs and surrounding ecosystem make it clear that constrained generation is a specialized mechanism, not a substitute for a direct rules engine when you already know the legal state space. (Hugging Face)
A minimal legality pass can look like this:
def legal_next_words(last_char, board_state, lexicon_by_start):
    for w in lexicon_by_start[last_char]:
        if obeys_board_rules(w, board_state):
            yield w
That should be your first move generator.
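On top of that generator, the chain search mentioned above can be a plain breadth-first search over (chain, covered-letters) states. This is a minimal sketch with a toy lexicon; the index keyed by first letter and the letter set stand in for whatever your board encoding provides:

```python
from collections import deque

def find_chain(start_words, lexicon_by_start, all_letters, max_words=3):
    # BFS over (chain, covered letters); a chain wins once it covers every letter.
    queue = deque(([w], set(w)) for w in start_words)
    while queue:
        chain, covered = queue.popleft()
        if covered >= all_letters:
            return chain
        if len(chain) == max_words:
            continue
        # Next word must start with the last letter of the previous word.
        for nxt in lexicon_by_start.get(chain[-1][-1], []):
            queue.append((chain + [nxt], covered | set(nxt)))
    return None

# Toy index keyed by first letter; a real one comes from the legality layer.
lexicon = {"c": ["cab"], "b": ["bed"]}
solution = find_chain(["cab"], lexicon, set("abcde"))
# → ["cab", "bed"]
```

Because BFS explores shorter chains first, the first solution found is also a shortest one, which matches how Letter Boxed scores play.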
2. Use embeddings to rank legal moves, not invent them
Once you have a legal candidate set, embeddings become valuable. Sentence Transformers explicitly recommends a retrieve-and-rerank pattern: a fast bi-encoder retrieves candidates, then a CrossEncoder reranks a much smaller set more accurately. In your game, the “documents” are just legal candidate words or clue texts. (Sbert)
So the pipeline becomes:
- symbolic engine returns legal candidates
- embedding model scores semantic closeness or thematic fit
- reranker chooses the best few
- hint model decides how much to reveal
That is much stronger than “cosine similarity above X” by itself, because raw cosine thresholds are brittle and often produce candidates that are technically related but not fun. The reranker is where you can inject game taste. (Sbert)
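The two-stage flow itself is simple to express. In this sketch the two scorers are stubs: `fast_score` stands in for a bi-encoder similarity and `slow_score` for a CrossEncoder; swap in real model calls where the comments indicate:

```python
def retrieve_then_rerank(legal_words, fast_score, slow_score, k=10, top=3):
    # Stage 1: cheap bi-encoder-style score over every legal candidate.
    shortlist = sorted(legal_words, key=fast_score, reverse=True)[:k]
    # Stage 2: expensive cross-encoder-style score over the shortlist only.
    return sorted(shortlist, key=slow_score, reverse=True)[:top]

# Stub scorers for illustration; replace with embedding/reranker calls.
theme = set("ocean")
fast = lambda w: len(set(w) & theme)      # crude lexical-overlap proxy
slow = lambda w: fast(w) - 0.1 * len(w)   # pretend the reranker prefers short words

best = retrieve_then_rerank(["canoe", "engine", "anchor", "xyz"], fast, slow)
# → ["canoe", "anchor", "engine"]
```

The point of the split is cost: the expensive scorer only ever sees `k` candidates, so you can afford a much better model there.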
A simple scoring sketch:
def rank_move(candidate, prev_word, theme, uncovered_letters, branch_count,
              sim_score, rerank_score):
    coverage = len(set(candidate) & uncovered_letters)
    dead_end_penalty = 5 if branch_count == 0 else 0
    return (
        1.8 * coverage +
        1.2 * sim_score +
        1.5 * rerank_score +
        0.4 * branch_count -
        dead_end_penalty
    )
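The `branch_count` input can come straight from the symbolic layer: count how many legal words could follow the candidate. A sketch, assuming the same first-letter index used earlier (a full version would also apply `obeys_board_rules` to each continuation):

```python
def branch_count(candidate, lexicon_by_start):
    # Continuations available if we play this candidate:
    # words starting with its last letter.
    return len(lexicon_by_start.get(candidate[-1], []))

index = {"e": ["echo", "ember"], "t": []}
branch_count("note", index)  # → 2, keeps options open
branch_count("quit", index)  # → 0, a dead end given this index
```

Precomputing this per ending letter makes the dead-end penalty essentially free to evaluate.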
3. Use an instruct model for interaction, not for validity
The instruct model should do things like:
- phrase a subtle hint,
- explain why a move is strong,
- give a clue without revealing the answer,
- adjust tone and difficulty,
- or turn internal scores into readable feedback.
That is the right place for a small chat model. It should not be the source of truth for what is legal. Research like NeuroLogic and PICARD is useful here because it shows that when formal constraints matter, inference-time control is often necessary even for strong models. (arXiv)
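In practice that means the instruct model only ever sees already-validated information. Here is a minimal prompt builder along those lines; the field names (`word`, `coverage`) and reveal levels are my own conventions, not from any library:

```python
def build_hint_prompt(move, reveal_level, theme=None):
    # `move` comes from the symbolic + ranking layers;
    # the model decides phrasing, never legality.
    lines = [
        "You are the hint writer for a letter-chain word game.",
        f"Reveal level: {reveal_level} (0 = vague nudge, 2 = near-answer).",
        f"The best next word starts with '{move['word'][0]}' "
        f"and covers {move['coverage']} missing letters.",
    ]
    if theme:
        lines.append(f"Keep the hint consistent with the theme: {theme}.")
    if reveal_level >= 2:
        lines.append(f"You may reveal the word length: {len(move['word'])}.")
    lines.append("Write one short hint. Never state the word itself.")
    return "\n".join(lines)

prompt = build_hint_prompt({"word": "anchor", "coverage": 3},
                           reveal_level=1, theme="the sea")
```

Because the answer word never appears in the prompt at low reveal levels, even a small model cannot accidentally leak it verbatim.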
Concrete model choices
If you want current Hugging Face options for a first serious prototype, I would split them by role.
Semantic retrieval and ranking
A sensible modern pairing is Qwen/Qwen3-Embedding-0.6B plus Qwen/Qwen3-Reranker-0.6B. Their model cards describe them as a matched text embedding and reranking family, with multilingual support and 32k context. That is a good fit for semantic variants and clue retrieval. (Hugging Face)
Hint-writing model
Qwen/Qwen3-4B-Instruct-2507 is a reasonable text-only assistant layer. Its card describes it as a 4B instruct model with native 262,144-token context, which is more than enough for game state, logs, and hint policies. (Hugging Face)
If you want lighter local experiments
google/embeddinggemma-300m is attractive for the semantic layer because its card positions it as a compact open embedding model for deployment on smaller devices while still supporting many languages.