Upload custom Llama2 model with injected linear layers

I have created a custom Llama2 model by replacing all linear layers with a custom linear layer, as follows:

def replace_quantlinear_layers(model):
    # Collect the names and modules to be replaced
    layers_to_replace = {}
    for name, module in model.named_modules():
        if isinstance(module, QuantLinear):
            layers_to_replace[name] = module

    # Replace the layers
    for name, module in layers_to_replace.items():
        # Create a new instance of the custom quantized layer
        new_linear = CustomLinear(module.bits, module.group_size, module.infeatures, module.outfeatures, module.bias is not None)

        # Transfer weights (and biases) from the original layer
        new_linear.qweight.data = module.qweight.data.clone().to("cuda")
        new_linear.qzeros.data = module.qzeros.data.clone().to("cuda")
        new_linear.scales.data = module.scales.data.clone().to("cuda")
        new_linear.wf.data = module.wf.data.clone().to("cuda")
        if module.bias is not None:
            new_linear.bias.data = module.bias.data.clone().to("cuda")

        # Find the parent module and replace the original layer with the new one
        if '.' in name:
            parent_name, child_name = name.rsplit('.', 1)
            parent_module = dict(model.named_modules())[parent_name]
            setattr(parent_module, child_name, new_linear)
        else:
            # For top-level modules
            setattr(model, name, new_linear)

The original model is TheBloke/Llama-2-7B-Chat-GPTQ. The final model prints as:

  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (rotary_emb): LlamaRotaryEmbedding()
          (k_proj): CustomLinear()
          (o_proj): CustomLinear()
          (q_proj): CustomLinear()
          (v_proj): CustomLinear()
        )
        (mlp): LlamaMLP(
          (act_fn): SiLU()
          (down_proj): CustomLinear()
          (gate_proj): CustomLinear()
          (up_proj): CustomLinear()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): CustomLinear(in_features=4096, out_features=32000, bias=False)

All other layers are unchanged; only the QuantLinear and Linear layers in the GPTQ Llama2 model have been replaced with my custom linear layers.

What is the easiest and quickest way to upload this model to Hugging Face so that I can quickly load the weights and run it? Do I have to write a complete model definition in PyTorch along with a config?