Interest in preprocessing utilities for multifile model uploads

Pitch
Huggingface follows a ‘single file’ paradigm that can catch developers off guard. All relevant classes must be located in a single file, called huggingface.py. Some support exists for recursively transferring relatively imported files, but it is incomplete, and it has bugs that show up when loading locally versus remotely. Huggingface has developed around that single-file assumption, and it cannot easily be bypassed.

This significantly degrades code quality in huggingface projects: everything has to live in one file, which makes development messy. In theory, staging by inlining relative dependencies into one file can solve this, but the existing built-in solutions do not handle recursive directory traversal or comment preservation well. I have built a solution that inlines everything into a single huggingface.py file instead. Is the community interested?

Details

In theory, relative imports can be inlined. As long as a project is structured so that its dependency graph is a Directed Acyclic Graph, each relative import statement can be replaced with the code of the module being imported, as a linking step. Unless a namespace collision occurs, the code then runs as before.
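The linking step can be illustrated with a minimal sketch. This is not the author's inliner: it works on a hypothetical in-memory dict of module name to source, assumes every relative import is already in the single-line form `from .name import X` (optionally `as Y`) at one package level, and skips the comment/docstring protection entirely. The function name `inline` is hypothetical.

```python
import re

# Canonical single-level relative import: "from .name import X" or "... as Y".
RELATIVE = re.compile(r"^from \.(\w+) import (\w+)(?: as (\w+))?\s*$")


def inline(entry: str, files: dict[str, str]) -> str:
    """Replace relative imports with the imported module's code, depth-first."""
    out: list[str] = []
    done: set[str] = set()           # modules already emitted once
    path: list[str] = []             # current DFS path, used to detect cycles
    seen_external: set[str] = set()  # external import lines already emitted

    def visit(module: str) -> None:
        if module in path:
            raise ValueError("cyclic relative import: " + " -> ".join(path + [module]))
        if module in done:
            return
        path.append(module)
        for line in files[module].splitlines():
            match = RELATIVE.match(line)
            if match:
                target, name, alias = match.groups()
                visit(target)  # dependency code is spliced in where the import was
                if alias:
                    out.append(f"{alias} = {name}")  # keep the aliased name bound
            elif line.startswith("from ."):
                raise ValueError(f"unsupported relative import in {module}: {line!r}")
            elif line.startswith(("import ", "from ")):
                # External import: keep the first occurrence, comment out repeats.
                out.append(line if line not in seen_external else "# " + line)
                seen_external.add(line)
            else:
                out.append(line)
        path.pop()
        done.add(module)

    visit(entry)
    return "\n".join(out) + "\n"
```

Because each dependency is spliced in at its first use and then marked done, a diamond-shaped graph (two files importing the same helper) emits the helper only once, while a cycle is rejected as soon as the DFS path revisits a module.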

How the Inliner Works

Naive line-by-line import detection is defeated by multiline parenthesized
imports, inline comments, docstrings containing the word “import”, and
TYPE_CHECKING blocks. The inliner therefore operates through a sentinel
pipeline:

  1. TYPE_CHECKING blocks are converted to comments so their imports are inert.
  2. Docstrings and comments are extracted and replaced with COMMENT_N
    sentinels, preventing their content from being misread as live imports.
  3. Top-level import blocks — including parenthesized multiline forms — are
    extracted and replaced with IMPORT_N sentinels. Inline comments are
    promoted above the sentinel.
  4. Each import block is standardized into canonical single-line forms. The
    six supported forms are:
    import module
    import module as alias
    from module import name
    from module import name as alias
    from .relative import name
    from .relative import name as alias
    Multi-name, parenthesized, and semicolon-separated imports are all expanded
    into these forms. Star imports raise ValueError.
  5. Each IMPORT_N sentinel is resolved: relative imports are replaced with
    the recursively inlined content of the target file; external imports are
    emitted on first encounter and commented out on recurrence.
  6. COMMENT_N sentinels are restored, returning docstrings and comments to
    the merged source verbatim.
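Step 4 could be approximated with the standard library's `ast` module instead of hand-written parsing. This is a swapped-in technique, not the sentinel pipeline above: `ast` discards comments, so on its own it cannot meet the comment-preservation requirement that the sentinel steps address; it only normalizes the import statements. Imports nested under `if TYPE_CHECKING:` are skipped here simply because only module-level statements are visited. `canonical_imports` is a hypothetical name.

```python
import ast


def canonical_imports(source: str) -> list[str]:
    """Expand top-level imports into canonical single-line forms."""
    forms: list[str] = []
    for node in ast.parse(source).body:  # module-level statements only
        if isinstance(node, ast.Import):
            for alias in node.names:
                suffix = f" as {alias.asname}" if alias.asname else ""
                forms.append(f"import {alias.name}{suffix}")
        elif isinstance(node, ast.ImportFrom):
            # node.level counts leading dots; node.module is None for "from . import x".
            module = "." * node.level + (node.module or "")
            for alias in node.names:
                if alias.name == "*":
                    raise ValueError(f"star import from {module!r} is not supported")
                suffix = f" as {alias.asname}" if alias.asname else ""
                forms.append(f"from {module} import {alias.name}{suffix}")
    return forms
```

Multi-name, parenthesized, and semicolon-separated imports all arrive as ordinary `Import`/`ImportFrom` nodes, so they need no special handling.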

Right now, it throws on ‘inline’ imports that are indented, since those are not globally scoped. It also does not handle importing whole modules (as opposed to names from them); I know how to add support for that, but it is not worth it unless commonly requested. One feature I particularly needed is that it preserves comments and docstrings while inlining, which was a hard constraint I was operating under. It also expects every file to have a module docstring, so I would have to make it more robust before a wider release. I want opinions before going through the extra hassle.
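The two limitations above (no indented imports, mandatory module docstrings) could be surfaced as an explicit pre-flight report instead of a failure mid-inlining. A minimal sketch with `ast`, assuming each file is checked on its own; `preflight` is a hypothetical name, and unlike the pipeline described above it does not first neutralize `TYPE_CHECKING` blocks, so imports inside them are reported too.

```python
import ast


def preflight(source: str, filename: str) -> list[str]:
    """Report patterns the inliner currently rejects, instead of crashing."""
    problems: list[str] = []
    tree = ast.parse(source, filename=filename)
    if ast.get_docstring(tree) is None:
        problems.append(f"{filename}: missing module docstring")
    top_level = {id(node) for node in tree.body}
    for node in ast.walk(tree):
        # Any import that is not a direct child of the module is not globally scoped.
        if isinstance(node, (ast.Import, ast.ImportFrom)) and id(node) not in top_level:
            problems.append(f"{filename}:{node.lineno}: indented import is not globally scoped")
    return problems
```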

Questions:

  • Would you like to be able to develop huggingface-compatible models that span multiple files and subfolders, and would you tolerate flattening them before staging to the Hub?
  • How important do you find the idea of preserving comments when uploading?
  • If you find this beneficial, would you prefer a utility in huggingface proper, or a standalone package?
  • Is there any interest in fixing a few of the parsing bugs in huggingface dynamic utils, or is that code in a “don’t touch” state?
  • How important is it that your code can import a module rather than a class from it? Namespace support is possible, but ugly.

I am personally not deeply familiar with this area, but this seems closely tied to the long-running Transformers discussion around the single-file philosophy:


You may already be aware of this context, so apologies if this is redundant, but I think this proposal might be easier to evaluate if it is connected to the broader history of the Transformers single-model-file policy and the newer Modular Transformers compromise.

My understanding is roughly this:

  • Transformers has historically preferred a single model file style, where the code needed to understand a model’s forward pass lives in one modeling_*.py file.
  • This is intentionally not very DRY. The tradeoff is that model behavior is easier to inspect, review, copy, debug, and modify locally.
  • More recently, Modular Transformers seems to introduce a compromise: contributors can write a more modular source file with imports/inheritance, and a converter/linter generates standalone modeling_*.py, configuration_*.py, etc. files from it.
  • Your proposal feels like a similar pattern, but aimed at Hub custom code / multifile model uploads rather than models contributed directly to the Transformers repository.

Some relevant background links:

So perhaps the proposal could be framed not only as an isolated “inliner”, but as something like a Hub-side analogue of Modular Transformers:

Transformers repo:
  modular_<model>.py
    -> generated modeling_<model>.py / configuration_<model>.py / processing_<model>.py

Hub custom code:
  src/*.py or model/*.py
    -> generated flattened upload artifact

That framing might make the idea stronger, because it does not necessarily oppose the single-file philosophy. It could instead be seen as preserving the same final-user property:

authors can work with modular source code, while users/reviewers/loaders get a flattened, inspectable artifact.

In other words, the important question may not be “is flattening compatible with the single-file philosophy?” — it probably is, at least in spirit. The harder questions seem to be around source of truth, reproducibility, semantic equivalence, and integration with the existing dynamic module loader.

A few design questions that seem important to me:

  1. What is the source of truth?
    Is the multifile source tree the canonical code, with the flattened file treated as generated output? Or is the flattened file itself supposed to be edited/reviewed as the canonical Hub artifact?

  2. When is the flattened file generated?
    Should it be generated locally before upload, by save_pretrained, by push_to_hub, by a Hub-side build step, or by a standalone CLI?

  3. Should the generated file be committed to the Hub repo?
    Committing it makes the actually executed code inspectable. But it also creates the risk that the generated file drifts from the source tree unless there is a check.

  4. Should there be a CI / pre-publish check?
    For example, something like: regenerate the flattened artifact, compare it with the committed one, and fail if they differ. This seems analogous to how generated files are checked in the Modular Transformers workflow.

  5. How do we verify semantic equivalence?
    Python imports are not just textual inclusion. try/except import, optional dependencies, TYPE_CHECKING, module-level side effects, circular imports, __all__, lazy imports, and dynamically imported modules can all be tricky.

  6. How should this interact with dynamic_module_utils.py?
    Transformers already has logic for discovering/copying relative imports in custom code. For example, get_relative_import_files recursively follows relative imports. A pre-flattening approach might reduce reliance on that mechanism, but it also overlaps with the same responsibility area.

  7. How should generated files be made reviewable?
    It may be useful for the flattened artifact to include clear source-file boundary markers, for example:

# ---------------------------------------------------------------------
# BEGIN inlined file: src/modeling/attention.py
# ---------------------------------------------------------------------

...

# ---------------------------------------------------------------------
# END inlined file: src/modeling/attention.py
# ---------------------------------------------------------------------
  8. Should generated files include a “do not edit manually” header?
    Something like:
# This file is generated from the multifile source tree.
# Do not edit this file manually; edit the source files and regenerate.
  9. Should the original multifile source be uploaded too?
    Uploading both the source tree and the flattened artifact may help reviewability, but it also raises the question of which one loaders should use.

  10. Should this be an external tool, a Transformers utility, or part of the Hub upload flow?
    An external tool is easier to experiment with. A Transformers utility might be easier to standardize. A Hub/upload integration would provide the best UX, but probably requires the clearest contract.

I also think there is a useful distinction between two related but different problems:

Problem A:
  How should models inside the Transformers repo be authored and maintained?

Problem B:
  How should arbitrary custom model code on the Hub be packaged, loaded, cached, inspected, and trusted?

Modular Transformers seems primarily aimed at Problem A.

Your proposal seems primarily aimed at Problem B.

That distinction makes the proposal more interesting, not less. It means this may not be a duplicate of Modular Transformers. It may be the same underlying compromise applied to a different layer of the ecosystem.

There are also some existing pain points around multifile custom code and relative imports that seem relevant:

These examples make me think the hard part is not merely concatenating files. It is defining a small, predictable packaging contract for custom model code.

One possible contract could be something like:

1. The multifile source tree is the source of truth.
2. The flattened artifact is generated output.
3. The generated artifact is committed/uploaded for inspectability.
4. The generated artifact contains source-boundary comments.
5. The generated artifact contains a do-not-edit header.
6. A check verifies that the generated artifact is up to date.
7. A load test verifies that AutoModel.from_pretrained(..., trust_remote_code=True) works locally.
8. The tool explicitly documents unsupported Python patterns.

That kind of contract might make the tool easier to reason about, because it would avoid silently becoming a general-purpose Python bundler.

For example, it could explicitly support a conservative subset:

Supported:
  - acyclic relative imports
  - same-repository Python source files
  - normal class/function/constant definitions
  - straightforward external imports
  - comments/docstrings preservation
  - source-file boundary markers

Possibly unsupported or warning-only:
  - circular imports
  - wildcard imports
  - importlib-based dynamic imports
  - module-level side effects that depend on import order
  - optional imports inside complex conditionals
  - runtime mutation of module globals

This would also make the safety/review story clearer. A flattened file is not automatically safer than multifile code, especially with trust_remote_code=True, but it can make the review target more explicit if the generated file is deterministic and inspectable.

So my tentative reaction is:

  • The idea seems useful.
  • It seems philosophically compatible with the single-file policy if treated as “modular authoring → flattened artifact”.
  • It seems especially relevant for Hub custom code, where relative imports and dynamic module loading can be fragile.
  • The main challenge is not the basic inlining idea, but the contract around reproducibility, reviewability, semantic equivalence, and source-of-truth.
  • It might be worth explicitly positioning this as a Hub/custom-code-side counterpart to Modular Transformers, rather than only as a standalone preprocessing script.

This may also help decide where it should live. If the goal is experimentation, an external package seems natural. If the goal is standardizing custom-code packaging for the Hub, then it probably needs alignment with Transformers’ existing dynamic module loading and/or Hub upload APIs. If the goal is to eventually become official, it may be useful to first define the exact supported subset and failure modes.

In short, I like the direction, but I think the strongest framing is:

not “let’s replace the single-file philosophy,”
but “let’s give custom Hub model authors the same kind of modular-authoring / standalone-artifact compromise that Transformers itself is moving toward.”

Fantastic feedback John, and I really appreciate you taking the time to write that.

Thinking over this, I would say the motivating factors are:

  • There are some serious deficiencies in dynamic_utils that I know of, and likely elsewhere, due to the single-file pattern. The specific one I know of: if your model imports multiple files that in turn import other files, and you save it with save_pretrained and then reload it, loading fails when done locally but not when done remotely. Some things are really broken in that package. Incidentally, if anyone wants, I could probably fix it.
  • In the meantime, it would be really useful to have a mechanism that can attempt to fix the problem I faced: you have a model that you cannot, in fact, upload to huggingface for some reason. An inliner flag in the save_pretrained mechanism would be close to ideal, I think. Its purpose is to let repositories with multiple recursively imported files be uploaded when possible; when that is not possible, it would clearly state why and how to fix it. That is its responsibility in a nutshell.

Let's assume we provide an additional preprocessor for this. I would insert it into transformers.modeling_utils, in PreTrainedModel, as an additional flag on save_pretrained; the flag would also be passed through push_to_hub, with any needed modifications in transformers.utils.hub. In the meantime, we might consider fixing those bugs in dynamic_utils (I could build a more robust import resolution system if anyone is interested), but an explicit opt-in system seems like the right choice for now, since I do not know everything I would break by rebuilding the resolution system; I would really want to talk to the maintainers first. The longer-term fix would likely be to make dynamic_utils recursively parse multilevel imports (that is, so that from .model.attention.cache works too, not just from .model), and perhaps to represent the files in a standardized flat format on the hub: all files are retained individually, but their path in the original repo is parsable from the file name. This is not currently supported globally, unfortunately. But again, before I start doing surgery, I really want to know the patient better. And this is an excellent fix for now.

Regarding your design questions:

  1. The compiled output is human-readable by design. Unlike a pickle dump, the parser merges the files with all relevant comments retained. I was really annoyed that I could never read anything on the hub, so this preprocessor deliberately retains comments; and since I refuse to write bad code, that is what I ended up with.
  2. I would say in the save_pretrained step; push_to_hub calls save_pretrained as one of its steps.
  3. This utility fills a niche: I have a nicely organized repo with multiple folders and many files, and want to push it to the hub. The existing system cannot even support that. So I would think of it as a push converter.
  4. I am not sure I understand the mental model where they can differ. One is an entire diverse project with many interconnected imports; the other is a flat file. I could generate it multiple times, but how exactly would automatic CI testing work? The main problem is the same as with tracing: I don't know what inputs to start from. I could write CI and unit tests for the tool itself.
  5. There are two levels here. One is the design question: is this inlining transform always going to be semantically equivalent? You can formally verify this using the idea of a Directed Acyclic Graph. If an acyclic dependency graph of the imports can be built, the system can be confirmed to be inlinable without conflicts. In practice, this means keeping track of which files have already been inlined; if files try to inline each other, the graph is cyclic and not supported. That would not be supported in normal Python either, where it typically fails with a circular import error.
  6. That logic is broken. That is why I made this in the first place. If people want me to fix it, I know where most of the breaks are.
    7-8: It already does that. It is very readable.
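The acyclicity check described in point 5 maps directly onto the standard library's `graphlib` (Python 3.9+). A sketch, assuming the dependency map (file to the files it imports relatively) has already been extracted; the name `inline_order` is hypothetical. A successful result shows that an inlining order exists; on its own it does not rule out name collisions between files.

```python
from graphlib import CycleError, TopologicalSorter


def inline_order(deps: dict[str, set[str]]) -> list[str]:
    """Order files so every dependency comes before the files importing it."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] lists the nodes of one detected cycle.
        raise ValueError(f"cyclic relative imports, cannot inline: {err.args[1]}") from err
```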

Regarding the problem: it is not really aimed at the hub per se, but at uploading to the hub and smoothing over some inconsistencies between huggingface as it is now and as it originated. The problem it solves is: "I have a really complicated project that deserves lots of files and folders, but huggingface has hardwired assumptions that, it turns out, I have to satisfy."

Oh. My previous comment about Modular Transformers came partly from a somewhat fuzzy memory, and I’m not a Transformers maintainer, so please take this with the appropriate amount of salt. But based on things I had happened to see before, plus what I looked into further using your reply as a starting point, I think one thing has become fairly clear: regardless of the exact implementation strategy, **there does seem to be real demand for solving this broader custom-code packaging problem**:


After looking into this a bit more, I would answer the original “is there demand?” question with a fairly strong yes, but with one important nuance:

The recurring demand seems to be less specifically for “an inliner” as such, and more for complete, reproducible, inspectable custom-code artifacts for trust_remote_code=True models.

An inliner may be one good strategy for that broader problem, especially if the goal is to let authors develop in a normal multifile layout while still publishing something closer to the traditional Transformers single-file / standalone-artifact style.

But I would probably avoid presenting inline as the only correct implementation. It may be more maintainable to frame it as one possible custom-code packaging strategy.

Direct reaction to your points

My direct reaction to your reply is roughly this:

| Your point | My current read |
|---|---|
| dynamic_utils / dynamic module loading has serious deficiencies around multifile or recursive imports | This seems plausible. The surrounding issue history suggests that custom-code saving/loading has been fragile for a long time. I would still separate “current loader/save bug with an MRE” from “new opt-in packaging feature”. |
| An opt-in save_pretrained flag would be useful | I agree that this seems like a reasonable user-facing shape. I would maybe phrase it as a possible custom_code_packaging strategy rather than committing too early to one exact flag name. |
| push_to_hub should receive the same behavior | That also seems reasonable if the packaging transform is part of the saved artifact contract. Ideally, push_to_hub would upload the same complete custom-code artifact that save_pretrained creates. |
| Rebuilding the dynamic import resolver directly may be risky | I agree. An opt-in packaging transform may be safer than changing dynamic-module resolution globally. It avoids making the loader responsible for arbitrary Python project layouts. |
| The long-term fix might still involve better multi-level import handling | Possibly, but I would treat that as a separate track. One track is “fix current loader/save behavior with concrete MREs”; another is “provide an explicit packaging transform”. |
| The generated output is human-readable and preserves comments/docstrings | That is one of the strongest arguments for the approach. Reviewability matters a lot for trust_remote_code=True, because the artifact is executable code. |
| CI testing is unclear | I meant something narrower than proving all possible semantic equivalence: regenerate the artifact, compare it with the committed/generated output, then run local/remote/offline load smoke tests. |
| A DAG gives a clean boundary for inlining | I agree that acyclicity is a very useful support boundary. I would only be cautious about treating DAG-ness alone as a full semantic-equivalence proof in Python. |
| This is about uploading/saving for the Hub, not necessarily changing the Hub itself | That distinction makes sense. I would frame the problem as producing a complete custom-code artifact for save_pretrained / push_to_hub, rather than asking the Hub or dynamic loader to support arbitrary Python package layouts. |

So my current read is:

Yes, the need is real.

But the strongest framing may be:
  custom-code artifact completeness for trust_remote_code=True models

rather than:
  a multifile inliner, specifically

Why I think the demand is real

There seems to be a long-running pattern of related issues and fixes around this area:

| Year | Issue / PR | What it suggests |
|---|---|---|
| 2022 | #15224: Copy of the custom modeling file when saving a model | Users already needed save_pretrained() to copy custom modeling files for dynamic-code models. |
| 2022 | #20884: Santacoder saved checkpoints missing required .py files | Fine-tuned checkpoints for trust_remote_code=True models could be unusable because required code files were missing. |
| 2023 | #21008: Make sure dynamic objects can be saved and reloaded | Core Transformers already added fixes so dynamic/custom objects could be saved and reloaded with their code. |
| 2023 | #24737: Falcon models saved with save_pretrained no longer get saved with Python files / #24785 | Another concrete regression/fix around copying custom Python files during saving. |
| 2023 | #27688: Remote code improvements | Broader concerns around trust_remote_code, auto_map, downstream libraries, and documentation. |
| 2024 | #29714: push_to_hub for a trust_remote_code=True model | Users wanted push_to_hub() to push all files needed by a custom model, not only weights/config/tokenizer files. |
| 2024 | #32923 / #33100 | Local-vs-remote custom code behavior affected pipeline registration and AutoClass behavior. |
| 2024 | #34855: Offline mode does not work with models requiring trust_remote_code=True | save_pretrained() artifacts were not always self-contained enough for offline / fresh-machine loading. |
| 2024 | sentence-transformers #2613 | Downstream users also need hermetic/offline Docker-style deployments for models requiring remote code. |
| 2025 | #36808: Support loading custom code objects in offline mode from local | Ongoing work around fully saving/loading trust_remote_code=True custom objects in offline/local settings. |
| 2025 | #37716: Fix custom code saving | A major merged PR explicitly aimed at making save_pretrained() and push_to_hub() correctly save relevant custom modeling files. |
| 2025 | #37751: Stop autoconverting custom code checkpoints | Custom-code checkpoints may need special handling in adjacent infrastructure. |
| 2026 | #45684: save_pretrained custom model files copied with readonly permissions | Saved custom-code files are touched by post-save tooling, so generated/copied artifacts are a real workflow. |
| 2026 | #45698: from_pretrained loads wrong custom module after save_pretrained | Custom module identity/cache/local-source behavior can still be subtle after saving. |

So, to me, this looks like a real problem family. It has appeared as:

  • missing custom .py files after save_pretrained();
  • missing custom .py files after push_to_hub();
  • local-vs-remote custom-code inconsistencies;
  • offline/hermetic deployment failures;
  • auto_map / _auto_class fragility;
  • pipeline registration differences;
  • dynamic-module cache/module identity issues;
  • relative-import limitations and under-documentation.

That is a fairly strong signal that there is real demand.

How I would frame the core problem

I would probably frame the problem less as:

How do we support arbitrary multifile Python projects on the Hub?

and more as:

How do we produce a complete, reproducible, inspectable custom-code artifact
for `trust_remote_code=True` models, across `save_pretrained()`,
`push_to_hub()`, local loading, remote loading, and offline loading?

That framing seems to connect better with the existing Transformers work.

It also avoids forcing the loader to become a general Python package resolver. Instead, the save/push step could produce an artifact that the dynamic loader already knows how to consume.

Why your inliner idea still seems relevant

The current custom-code machinery already appears to be somewhat artifact-oriented.

The relevant area seems to be dynamic_module_utils.py, especially functions such as:

  • custom_object_save
  • get_relative_imports
  • get_relative_import_files
  • get_cached_module_file
  • get_class_in_module

From the current code, custom_object_save() looks like it already saves custom object source files and discovered relative imports into the target folder. It also appears to copy files by basename, which makes the current save path feel closer to a flat artifact than to preserving an arbitrary nested Python package layout.

So I think your proposal can be framed as a natural extension of an existing direction:

Current-ish direction:
  collect custom code files
  copy them into the save/push artifact

Possible inline strategy:
  collect custom code files
  generate one deterministic, inspectable file
  update metadata so AutoClass loading points to that generated file

That does not necessarily fight the single-file philosophy. It may actually align with it:

Authoring:
  modular source tree

Published artifact:
  generated standalone/flat/inspectable custom-code artifact

This is similar in spirit to the broader compromise behind Modular Transformers, though the target layer is different:

| Area | Source authoring | Published / consumed artifact |
|---|---|---|
| Transformers repo models | `modular_<model>.py` with imports/inheritance | generated standalone `modeling_*.py`, `configuration_*.py`, etc. |
| Hub custom code proposal | multifile custom source tree | generated flat or inline artifact for trust_remote_code=True loading |

I would still be cautious about saying it is “the same thing” as Modular Transformers. It is not. But the design pattern is similar: modular authoring, standalone artifact.

I would present inline as one packaging strategy, not the whole proposal

One useful way to make the implementation discussion less binary may be to define a small strategy space:

| Strategy | Output artifact | Advantages | Risks / open questions |
|---|---|---|---|
| current | Whatever current save_pretrained() / push_to_hub() produces | Maximum backward compatibility | Existing edge cases remain. |
| flat_copy | Copy discovered .py files into the save directory | Close to current custom_object_save() behavior | Basename collisions, lost package structure, relative import quirks. |
| preserve_package | Preserve nested package directories | Most Pythonic for authors | More work for dynamic module loading/cache; may conflict with current same-directory assumptions. |
| inline | Generate one standalone .py file | Inspectable, single-file-compatible, loader-simple | Semantic equivalence, deterministic generation, source-of-truth questions. |
| external CLI | Pre-publish generated artifact | Easy to experiment with outside core Transformers | Not standardized; users must wire it into their own publishing flow. |
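The basename-collision risk listed for flat_copy is cheap to detect before anything is written. A sketch, assuming paths are given relative to the source root; `basename_collisions` is a hypothetical helper, not part of custom_object_save().

```python
from collections import defaultdict
from pathlib import PurePosixPath


def basename_collisions(paths: list[str]) -> dict[str, list[str]]:
    """Group source paths that would overwrite each other when copied flat."""
    by_name: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: sorted(group) for name, group in by_name.items() if len(group) > 1}
```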

Then your proposal becomes:

Add or experiment with an `inline` custom-code packaging strategy.

rather than:

Replace the current custom-code loader with an inliner.

That seems easier to evaluate.

Possible API shape, very tentatively

I do not know where maintainers would want this to live, so I would treat this as illustrative rather than prescriptive.

Maybe something like:

model.save_pretrained(
    save_directory,
    custom_code_packaging="inline",
)

and eventually:

model.push_to_hub(
    repo_id,
    custom_code_packaging="inline",
)

or perhaps a lower-level utility first:

from transformers.utils import package_custom_code

package_custom_code(
    entry_file="modeling_my_model.py",
    output_file="modeling_my_model_generated.py",
    strategy="inline",
)

I am not saying these are the right API names. The important part is the contract:

Given a custom-code entrypoint and a supported subset of relative imports,
produce a deterministic artifact that can be saved, pushed, inspected,
cached, and loaded.

Possible responsibility boundary

I would be careful here. From the outside, it is tempting to say:

Just add a flag to `save_pretrained()`.

But the recent custom-code saving work appears to touch more than one function. For example, #37716 touched custom-code saving, _auto_map, AutoClass behavior, multiple save/load paths, tests, and docs/docstrings.

So I would phrase the implementation boundary cautiously:

`dynamic_module_utils.custom_object_save()` looks like one plausible hook,
because it already saves custom object source files and updates config-side
metadata for Hub loading.

But I would not claim it is definitely the correct hook. The right abstraction
may need to account for AutoClass behavior, `auto_map`, local-vs-remote loading,
processors/tokenizers/configs, and push-to-hub behavior.

That keeps the proposal helpful without over-prescribing internals.
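One concrete piece of that metadata work is pointing `auto_map` at the generated file. In a saved config.json, `auto_map` maps Auto class names to `"module.ClassName"` strings, and tokenizer entries can be a two-element list that may contain `null`. A minimal sketch, assuming the packager reports which source modules it folded into the generated module; `retarget_auto_map` is hypothetical and deliberately leaves any entry it does not recognize unchanged.

```python
def retarget_auto_map(auto_map: dict, generated_module: str,
                      inlined_modules: set[str]) -> dict:
    """Return a copy of auto_map whose references into inlined modules
    point at the generated module instead."""

    def retarget(ref):
        if not isinstance(ref, str):
            return ref  # e.g. None for an absent tokenizer variant
        module, _, class_name = ref.rpartition(".")
        if module in inlined_modules:
            return f"{generated_module}.{class_name}"
        return ref  # not inlined or not recognized: leave untouched

    return {
        key: [retarget(v) for v in value] if isinstance(value, (list, tuple)) else retarget(value)
        for key, value in auto_map.items()
    }
```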

What I meant by CI / checks

When I mentioned CI, I did not mean:

Prove all possible model behavior is equivalent for all inputs.

I meant a much narrower generated-artifact consistency check:

1. Run the packager/inliner.
2. Compare the generated file with the checked-in generated file.
3. Fail if they differ.
4. Run AutoModel.from_pretrained(<local_saved_dir>, trust_remote_code=True).
5. If practical, also test a Hub-like or remote load path.
6. Optionally compare a tiny forward pass or at least state_dict keys
   between the source and packaged forms.
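Steps 1 to 3 amount to a small staleness check. A sketch, assuming the regenerated text and the committed file are both available as strings; `check_up_to_date` and the default file name are hypothetical. Raising `AssertionError` with a unified diff keeps the failure readable under pytest.

```python
import difflib


def check_up_to_date(regenerated: str, committed: str,
                     path: str = "modeling_toy_generated.py") -> None:
    """Fail with a readable diff if the committed artifact is stale."""
    if regenerated == committed:
        return
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile=f"{path} (committed)",
        tofile=f"{path} (regenerated)",
    )
    raise AssertionError("generated artifact is out of date:\n" + "".join(diff))
```

In a test, `regenerated` would come from running the packager on the fixture tree and `committed` from reading the checked-in file.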

So the CI input would not need to be arbitrary user inputs. It could start from a tiny toy custom model fixture.

For example:

toy_model/
  configuration_toy.py
  modeling_toy.py
  backbone.py
  modules.py

with:

# modeling_toy.py
from .backbone import ToyBackbone

and:

# backbone.py
from .modules import ToyModule

Then the check could be:

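from transformers import AutoModel

model.save_pretrained(tmpdir)  # model: an instance of the toy custom model; tmpdir: a temporary directory
AutoModel.from_pretrained(tmpdir, trust_remote_code=True)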

plus, for the packaging tool specifically:

generate artifact
compare generated artifact with expected artifact
load from generated artifact

That is much narrower than full semantic verification, but still useful.
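For the "load from generated artifact" step, a quick smoke test can import the flattened file as an ordinary module before involving Transformers at all. This sketch uses only `importlib.util` and is not the Transformers dynamic module loader, which additionally manages its own module cache; `load_flat_module` and the module name are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_flat_module(source: str, module_name: str = "toy_generated"):
    """Execute a flattened artifact from disk and return the module object."""
    leftover = [line for line in source.splitlines() if line.lstrip().startswith("from .")]
    if leftover:
        # A flattened artifact should no longer depend on sibling files.
        raise ValueError(f"relative imports left after inlining: {leftover}")
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{module_name}.py"
        path.write_text(source, encoding="utf-8")
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the file's top-level code
    return module
```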

About DAGs and semantic equivalence

I agree that acyclicity is probably a very good support boundary. If the relative-import graph is cyclic, the packager can clearly reject it.

I would only be cautious about saying that DAG-ness alone proves semantic equivalence in Python.

A DAG means a topological inline order can exist. But Python import behavior can also depend on:

  • module identity;
  • import order;
  • sys.modules;
  • __name__;
  • __package__;
  • __file__;
  • __all__;
  • module-level side effects;
  • optional imports;
  • try/except import;
  • TYPE_CHECKING;
  • wildcard imports;
  • duplicate names after flattening;
  • monkey-patching;
  • importlib;
  • local-vs-remote cache behavior.

So I would phrase it as:

Acyclic import graph:
  necessary / practical condition for supported inlining

Full semantic equivalence:
  still worth checking with load tests and possibly a tiny forward pass

This does not make the inliner idea weaker. It just makes the support contract more precise.
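One item on that list, duplicate names after flattening, is invisible to a DAG check: two files with no cycle between them can both define `helper`, and once they are concatenated the later definition silently wins. A sketch of a pre-flattening check with `ast`, assuming it is enough to look at module-level `def`, `class`, and simple assignments; `duplicate_symbols` is a hypothetical name.

```python
import ast
from collections import defaultdict


def duplicate_symbols(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each module-level name defined in more than one file to those files."""
    owners: dict[str, set[str]] = defaultdict(set)
    for filename, source in files.items():
        for node in ast.parse(source, filename=filename).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                names = [node.name]
            elif isinstance(node, ast.Assign):
                names = [t.id for t in node.targets if isinstance(t, ast.Name)]
            elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
                names = [node.target.id]
            else:
                names = []
            for name in names:
                owners[name].add(filename)
    return {name: sorted(where) for name, where in owners.items() if len(where) > 1}
```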

Why inline might be attractive

An inline artifact could have several practical advantages:

| Advantage | Why it matters |
| --- | --- |
| Fewer dynamic relative imports | The loader has a smaller dependency graph to reconstruct. |
| More inspectable artifact | Reviewers and users can inspect one generated file. |
| Closer to single-file philosophy | The final artifact resembles the traditional Transformers model-file style. |
| Better offline/hermetic behavior | The saved directory can contain executable custom code without needing to fetch remote code again. |
| Easier upload completeness | `push_to_hub()` has fewer files to miss. |
| Potentially simpler cache invalidation | One deterministic file may be easier to hash than a graph of relative imports. |
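To make the cache-invalidation row concrete: a single generated file can be fingerprinted from its bytes alone, while a multifile tree also has to fold in each relative path, in a stable order, so that moving or renaming a file changes the result. A minimal sketch with `hashlib`. `hash_file` and `hash_tree` are hypothetical names, and the sketch makes no claim about how Transformers actually keys its module cache.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Fingerprint a single generated artifact by its content only."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def hash_tree(root: Path) -> str:
    """Fingerprint a source tree: relative paths and contents, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        # Include the path so that renaming or moving a file changes the hash.
        digest.update(rel.encode("utf-8") + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return digest.hexdigest()
```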

But these advantages depend on the generated file being deterministic and honest about its origin.

For example, I would expect generated files to include something like:

# This file was automatically generated from a multifile custom-code source tree.
# Do not edit this file manually; edit the source files and regenerate.
# Source root: <source_root>
# Entry point: <entry_file>
# Packaging strategy: inline

and source boundary markers such as:

# ---------------------------------------------------------------------
# BEGIN inlined file: layers/attention.py
# ---------------------------------------------------------------------

...

# ---------------------------------------------------------------------
# END inlined file: layers/attention.py
# ---------------------------------------------------------------------

That would make the artifact more reviewable.
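A minimal sketch of emitting that header and the per-file boundary markers, assuming the files have already been put in dependency order and their relative imports already removed. `render_artifact`, its parameters, and the exact separator width are hypothetical.

```python
# Separator line placed around each inlined file.
RULE = "# " + "-" * 69


def render_artifact(source_root: str, entry_file: str, files: list[tuple[str, str]]) -> str:
    """Concatenate already-processed files into one artifact with provenance markers.

    `files` is a list of (relative_path, source) pairs in dependency order.
    """
    lines = [
        "# This file was automatically generated from a multifile custom-code source tree.",
        "# Do not edit this file manually; edit the source files and regenerate.",
        f"# Source root: {source_root}",
        f"# Entry point: {entry_file}",
        "# Packaging strategy: inline",
        "",
    ]
    for rel_path, source in files:
        lines += [RULE, f"# BEGIN inlined file: {rel_path}", RULE, ""]
        lines.append(source.rstrip("\n"))
        lines += ["", RULE, f"# END inlined file: {rel_path}", RULE, ""]
    # Exactly one trailing newline keeps the output byte-for-byte deterministic.
    return "\n".join(lines).rstrip("\n") + "\n"


print(render_artifact("toy_model", "modeling_toy.py", [("modules.py", "class ToyModule:\n    pass\n")]))
```

Because the output depends only on the ordered inputs, regenerating from the same tree gives an identical file, which is what makes the "compare generated artifact with expected artifact" step meaningful.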

Possible initial supported subset

Something like this may be easier to maintain:

Supported:
  - one custom-code entry file
  - same-repository relative imports
  - acyclic dependency graph
  - normal `from .foo import Bar` imports
  - normal class/function/constant definitions
  - external imports preserved at the top
  - comments/docstrings preserved
  - deterministic output
  - generated source boundary markers
  - clear error messages for unsupported patterns

Unsupported at first:
  - circular imports
  - wildcard relative imports
  - dynamic imports via `importlib`
  - imports outside the source root
  - namespace packages
  - complex module-level side effects
  - ambiguous duplicate symbols
  - package layouts that require runtime package identity
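Several items in the "unsupported at first" list can be detected with the standard `ast` module before any inlining happens, which is what makes clear error messages feasible. A minimal sketch. `find_unsupported` is hypothetical and only covers three patterns: wildcard relative imports, any import of `importlib` (flagged conservatively, even if it is never used dynamically), and relative imports that climb above the source root. It assumes the source root itself behaves as the top-level package and that each file's depth below the root is known.

```python
import ast


def find_unsupported(source: str, filename: str, depth: int = 0) -> list[str]:
    """Report patterns outside a conservative supported subset.

    `depth` is how many package levels the file sits below the source root
    (0 for files directly in the root, 1 for e.g. layers/attention.py).
    """
    problems = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ImportFrom):
            if node.level > 0:
                if any(alias.name == "*" for alias in node.names):
                    problems.append((node.lineno, "wildcard relative import"))
                # Level 1 is the file's own package; each extra dot climbs one
                # package, so anything beyond depth + 1 leaves the source root.
                if node.level > depth + 1:
                    problems.append((node.lineno, "relative import outside the source root"))
            elif (node.module or "").split(".")[0] == "importlib":
                problems.append((node.lineno, "dynamic import via importlib"))
        elif isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "importlib" for alias in node.names):
                problems.append((node.lineno, "dynamic import via importlib"))
    # ast.walk is breadth-first, so sort by line number for readable output.
    return [f"{filename}:{line}: {msg}" for line, msg in sorted(problems)]


src = "from .modules import *\nimport importlib\nfrom ..outside import thing\nfrom .helper import ToyBlock\n"
for problem in find_unsupported(src, "modeling_toy.py"):
    print(problem)
```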

I would not present this as the final design, only as a possible starting point.

Possible tests / MREs

If this becomes a GitHub issue or PR, I think the most useful thing would be to split examples into small reproducible cases.

1. Save artifact completeness

Goal:
  `save_pretrained()` should produce a directory that can be loaded
  without manually copying custom `.py` files.

Minimal layout:

toy_model/
  config.json
  configuration_toy.py
  modeling_toy.py
  helper.py

Import chain:

# modeling_toy.py
from .helper import ToyBlock

Test:

model.save_pretrained(tmpdir)
AutoModel.from_pretrained(tmpdir, trust_remote_code=True)
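Independently of `from_pretrained()`, the completeness part of this test can be approximated with a static check over the saved directory: every same-directory relative import should point at a `.py` file that was actually saved. A minimal sketch, assuming a flat layout like the one above. `missing_relative_imports` is a hypothetical helper, and it does not look inside subpackages.

```python
import ast
from pathlib import Path


def missing_relative_imports(saved_dir: Path) -> list[str]:
    """List `from .x import ...` targets that have no matching file in saved_dir."""
    missing = []
    for path in sorted(saved_dir.glob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            # Only single-dot imports relative to the saved directory are checked.
            if isinstance(node, ast.ImportFrom) and node.level == 1 and node.module:
                target = saved_dir / (node.module.replace(".", "/") + ".py")
                if not target.exists():
                    rel = target.relative_to(saved_dir).as_posix()
                    missing.append(f"{path.name}: .{node.module} -> {rel}")
    return missing
```

An empty list does not prove the directory loads, but a non-empty one explains a load failure before `trust_remote_code` machinery is involved.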

2. Recursive relative imports

Goal:
  transitive relative imports are either supported, clearly rejected,
  or transformed into a generated artifact.

Minimal layout:

toy_model/
  configuration_toy.py
  modeling_toy.py
  backbone.py
  modules.py

Import chain:

# modeling_toy.py
from .backbone import ToyBackbone
# backbone.py
from .modules import ToyModule

This is close to the kind of issue described in #36653.
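For this case, the packager has to follow the chain transitively from the entry file rather than only looking at the entry file's own imports. A minimal sketch for the flat layout above; `relative_import_graph` is a hypothetical helper that skips dotted (nested-package) imports. The mapping it returns could then be fed to a topological sort to get an inline order.

```python
import ast
from pathlib import Path


def relative_import_graph(root: Path, entry: str) -> dict[str, set[str]]:
    """Follow same-directory relative imports transitively from `entry`.

    Returns a mapping file -> set of files it imports, covering only the
    files reachable from the entry point.
    """
    graph: dict[str, set[str]] = {}
    pending = [entry]
    while pending:
        current = pending.pop()
        if current in graph:
            continue  # already visited; this also stops cycles from looping forever
        tree = ast.parse((root / current).read_text(encoding="utf-8"), filename=current)
        deps = {
            f"{node.module}.py"
            for node in ast.walk(tree)
            # Nested packages (dotted module names) are out of scope for this flat sketch.
            if isinstance(node, ast.ImportFrom) and node.level == 1
            and node.module and "." not in node.module
        }
        graph[current] = deps
        pending.extend(deps)
    return graph
```

Files that are never reached (for example an unused helper) stay out of the graph, which keeps the generated artifact limited to what the entry point actually needs.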

3. Nested package layout

Goal:
  decide whether nested subpackages are unsupported, preserved,
  flat-copied, or inlined.

Minimal layout:

toy_model/
  configuration_toy.py
  modeling_toy.py
  layers/
    __init__.py
    attention.py
    rope.py

Import chain:

# modeling_toy.py
from .layers.attention import ToyAttention
# layers/attention.py
from .rope import apply_rope

This would clarify whether the desired behavior is:

preserve package layout

or:

generate a flat/inline artifact
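Whichever behavior is chosen, the packager first has to map a relative import inside a nested package to a file. A minimal sketch of that resolution step, following Python's rule that one dot means the importing file's own package and each extra dot goes up one package. `resolve_relative` is hypothetical; it ignores namespace packages and the `from . import name` form.

```python
from pathlib import PurePosixPath


def resolve_relative(importer: str, level: int, module: str) -> list[str]:
    """Candidate files for `from <'.' * level><module> import ...` in `importer`.

    `importer` is a POSIX path relative to the source root, e.g.
    "layers/attention.py". Returns the module-file and package-file candidates.
    """
    if level < 1:
        raise ValueError("only relative imports (level >= 1) are resolved here")
    package = PurePosixPath(importer).parent
    root = PurePosixPath(".")
    for _ in range(level - 1):
        if package == root:
            raise ValueError(f"{importer}: relative import climbs above the source root")
        package = package.parent
    base = package.joinpath(*module.split("."))
    # A module can be either a plain file or a package directory.
    return [f"{base}.py", f"{base}/__init__.py"]


print(resolve_relative("layers/attention.py", 1, "rope"))  # ['layers/rope.py', 'layers/rope/__init__.py']
```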

4. Push artifact completeness

Goal:
  `push_to_hub()` should push the same complete custom-code artifact
  that `save_pretrained()` would produce locally.

This is close to #29714, where the issue was that a custom model needed additional files to function properly after push.

5. Offline/hermetic loading

Goal:
  A saved model directory should be usable on a fresh machine in offline mode
  if all required custom code was saved.

This connects to:

6. Module identity / cache behavior

Goal:
  A saved model should not accidentally load a different local custom module
  with the same filename/class name.

This connects to #45698.
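One way to keep two custom modules that share a filename from shadowing each other is to register each one under a name derived from its content. A minimal sketch using `importlib.util`. `load_isolated` and the `custom_code_` prefix are hypothetical, and this is not a description of how Transformers currently names its dynamic modules.

```python
import hashlib
import importlib.util
import sys
from pathlib import Path


def load_isolated(path: Path):
    """Import a file under a module name derived from its content hash."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:16]
    name = f"custom_code_{path.stem}_{digest}"
    if name in sys.modules:
        return sys.modules[name]  # identical content: reuse the already-loaded module
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialized module behind in sys.modules.
        del sys.modules[name]
        raise
    return module
```

Two `modeling_toy.py` files with different contents then get different module names, while reloading an unchanged file returns the cached module.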

Possible issue split

If this is taken to GitHub, I would probably avoid one giant issue.

Maybe split it like this:

| Issue type | Possible title | Purpose |
| --- | --- | --- |
| Bug / MRE | Recursive relative imports are not reliably included for trust_remote_code custom models | Show current behavior with a minimal failing repo. |
| Feature request | Add an opt-in custom-code packaging strategy for save_pretrained / push_to_hub | Discuss inline, flat_copy, preserve_package, etc. |
| Docs clarification | Clarify supported relative-import layouts for Hub custom code | Explain same-directory imports, nested packages, generated artifacts, and reload tests. |
| Experimental package | External custom-code inliner / packager | Prove the idea before proposing core integration. |

That separation may make the discussion easier for maintainers to act on.

Possible venue

I am less certain about the best venue, so I would treat this only as practical guidance, not official routing.

My understanding is:

| Place | Probably good for |
| --- | --- |
| This forum thread | Initial context, demand check, design sketch. |
| transformers-community/support Discussions | Cross-linking a broader Transformers design/API question. It appears to be used for some semi-official community discussions, but I would not call it guaranteed/canonical. |
| GitHub Issue | Focused bug report or feature request with MRE/API sketch. |
| GitHub PR | Tests, docs, or implementation once the target behavior is clear. |

The transformers-community/support Space seems relevant because there are already broader discussions there, such as:

But I would not rely on that as the only path. For concrete bugs and feature requests, GitHub issues are probably still the most actionable place.

My tentative summary

I would summarize the situation like this:

There is real demand, but I would name the demand carefully.

The demand is for complete, reproducible, inspectable custom-code artifacts
for `trust_remote_code=True` models.

Inlining is one possible packaging strategy.

It may be especially attractive because it aligns with the single-file /
standalone-artifact style, reduces relative-import complexity, and can make
the saved/pushed artifact easier to inspect.

But it should probably be presented as an opt-in strategy, not as the only
right design.

The exact implementation hook should be left open for maintainers, though
`dynamic_module_utils.custom_object_save()` looks like a plausible place to
start reading because it already handles saving custom code files and metadata.

So I think your idea is useful, but I would pitch it less as:

Here is a preprocessing script for multifile uploads.

and more as:

Here is a possible opt-in packaging strategy for the broader custom-code
artifact completeness problem that Transformers has already been working on
for several years.

That framing seems both stronger and safer.