HCAE v1.1: Bridging Local Context and Global Attention in Efficient Text Embeddings

# HCAE v1.1 Technical Report: Advancing Hybrid Architectures for Efficient Text Embeddings

## 1. Abstract

The HCAE (Hybrid Convolutional-Attention Encoder) series investigates the synergy between local feature extraction and global contextual modeling. The v1.1 release is a significant architectural stabilization over the v1.0 baseline. By reconfiguring the layer distribution into a symmetric 4+4 structure and adding training-stability techniques such as LayerScale, we obtain a measurable improvement in semantic representation: a Spearman correlation of 0.656 on the STS Benchmark and an NDCG@10 of 0.413 on the SciFact dataset, with only 21.1 million parameters.

## 2. Introduction and Motivation

Contemporary text embedding models typically rely entirely on self-attention (Transformer encoders). While powerful, self-attention has time and memory cost quadratic in sequence length and can be parameter-inefficient at sub-100M scales for specific retrieval tasks. HCAE v1.1 addresses these constraints by using depthwise separable convolutions in the early layers to capture local structural dependencies, followed by self-attention blocks that model global semantic relations. Because the convolutional layers scale linearly with sequence length, this hybrid design is intended to reduce computational overhead relative to an all-attention encoder of the same depth while preserving embedding quality.
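To make the cost argument concrete, here is a minimal NumPy sketch of a depthwise separable 1D convolution over a token sequence. It assumes "same" zero padding with an odd kernel width and uses a hypothetical hidden size of 384 and kernel width 7; neither value is stated in this report. The depthwise step filters each channel independently over a local window of neighbouring tokens, and the pointwise step mixes channels with a 1x1 projection. The cost grows linearly with sequence length n (roughly n·k·d + n·d² multiply-adds), whereas computing attention scores costs on the order of n²·d.

```python
import numpy as np


def depthwise_separable_conv1d(x, dw_kernel, pw_weight):
    """Depthwise separable 1D convolution with 'same' zero padding (odd kernel width).

    x:         (seq_len, channels) token representations
    dw_kernel: (channels, k) one length-k filter per channel (depthwise step)
    pw_weight: (channels, out_channels) 1x1 projection across channels (pointwise step)
    """
    seq_len, channels = x.shape
    k = dw_kernel.shape[1]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along the sequence axis only
    dw = np.empty((seq_len, channels))
    for t in range(seq_len):
        window = xp[t:t + k]  # (k, channels): k neighbouring tokens centred on position t
        # Each channel is filtered only with its own kernel: no cross-channel mixing here.
        dw[t] = np.sum(window.T * dw_kernel, axis=1)
    # Pointwise step: mix information across channels.
    return dw @ pw_weight


def conv_param_counts(channels, k):
    """Weights (no bias) of a standard Conv1d vs. its depthwise separable factorization."""
    standard = channels * channels * k
    separable = channels * k + channels * channels
    return standard, separable


rng = np.random.default_rng(0)
d, k = 384, 7  # hypothetical sizes, not taken from the report
x = rng.standard_normal((16, d))
y = depthwise_separable_conv1d(x, rng.standard_normal((d, k)), rng.standard_normal((d, d)))
print(y.shape)                  # (16, 384)
print(conv_param_counts(d, k))  # (1032192, 150144)
```

With these hypothetical sizes, the separable layer needs about 150K weights versus about 1.03M for a standard convolution of the same kernel width, which is the usual motivation for this factorization in small encoders.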

## 3. Architectural Refinement

The transition from v1.0 to v1.1 involved several critical design decisions aimed at improving gradient flow and representational capacity:

### 3.1 Symmetric Layer Distribution

In HCAE v1.1, we moved from a 5-layer convolution / 3-layer attention split to a symmetric **4+4 configuration**, shifting one layer from the convolutional stage to the attention stage while keeping the total depth at eight layers. This adjustment is intended to give the model sufficient capacity for both low-level linguistic features (lexical and syntactic patterns) and high-level semantic abstractions.
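The two layer splits can be written down as a small configuration object. This is a hypothetical illustration (`HCAELayerPlan` is not part of the released code) that records only the ordering described above: convolution blocks first, attention blocks second, with the same total depth in both versions.

```python
from dataclasses import dataclass


@dataclass
class HCAELayerPlan:
    """Hypothetical description of the encoder stack (not the released config class)."""
    n_conv_layers: int
    n_attn_layers: int

    def layer_types(self):
        # Convolution blocks run first (local features), attention blocks after (global context).
        return ["conv"] * self.n_conv_layers + ["attn"] * self.n_attn_layers


v1_0 = HCAELayerPlan(n_conv_layers=5, n_attn_layers=3)
v1_1 = HCAELayerPlan(n_conv_layers=4, n_attn_layers=4)

print(v1_0.layer_types())  # ['conv', 'conv', 'conv', 'conv', 'conv', 'attn', 'attn', 'attn']
print(v1_1.layer_types())  # ['conv', 'conv', 'conv', 'conv', 'attn', 'attn', 'attn', 'attn']
print(len(v1_0.layer_types()) == len(v1_1.layer_types()))  # True: same total depth
```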

### 3.2 Stability and Non-linearity

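- **LayerScale Integration:** We apply LayerScale with an initial value of 1e-5: the output of each residual branch is multiplied by a learnable per-channel scale initialized to this small constant, so every block starts close to the identity mapping. This keeps gradients flowing through the skip connections during the early phases of training and is intended to prevent the vanishing gradient and instability issues common in hybrid models with heterogeneous layer types.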

- **SwiGLU Activation:** We replaced the standard GELU feed-forward activation with SwiGLU (Shazeer, 2020), which multiplies a Swish-activated projection of the input by a second linear projection. We hypothesize that this gated structure yields a more expressive non-linear mapping that better approximates complex semantic boundaries, and that it contributes to the improved performance on the SciFact technical retrieval task.
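The two changes above can be combined in a single residual feed-forward block. The NumPy sketch below assumes a pre-norm layout and hypothetical sizes (d_model = 384, hidden width 1024); the report does not specify where normalization sits or the layer widths. It shows SwiGLU as `(SiLU(x W_gate) * (x W_up)) W_down` and LayerScale as a per-channel vector `gamma`, initialized to 1e-5, that multiplies the branch output before the residual addition.

```python
import numpy as np


def silu(z):
    # SiLU / Swish with beta = 1: z * sigmoid(z)
    return z / (1.0 + np.exp(-z))


def layer_norm(x, eps=1e-5):
    # Parameter-free layer normalization over the feature axis (sketch only).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward (Shazeer, 2020): (SiLU(x W_gate) * (x W_up)) W_down
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


class LayerScaleSwiGLUBlock:
    """Hypothetical pre-norm residual block: x + gamma * SwiGLU(LN(x))."""

    def __init__(self, d_model, d_hidden, init_scale=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        self.w_gate = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
        self.w_up = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
        self.w_down = rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden)
        # LayerScale: per-channel scale on the branch output, initialized small.
        # In a real model gamma is a trainable parameter; here it is a fixed array.
        self.gamma = np.full(d_model, init_scale)

    def __call__(self, x):
        return x + self.gamma * swiglu_ffn(layer_norm(x), self.w_gate, self.w_up, self.w_down)


block = LayerScaleSwiGLUBlock(d_model=384, d_hidden=1024)
x = np.random.default_rng(1).standard_normal((16, 384))
y = block(x)
print(np.abs(y - x).max())  # tiny: at initialization the block is close to the identity
```

Because `gamma` starts at 1e-5, the block's output initially differs from its input only very slightly; during training `gamma` is learned, so each block can gradually increase its contribution.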

## 4. Empirical Evaluation

The models were evaluated using the Massive Text Embedding Benchmark (MTEB) on two task categories: Semantic Textual Similarity (STS) and Information Retrieval.

### 4.1 Performance Analysis (STSBenchmark)

HCAE v1.1-Instruct achieved a **Spearman correlation of 0.656**, an 11% relative improvement over the v1.0 baseline (0.591). This gain suggests that the v1.1 changes resolved bottlenecks in the semantic mapping of v1.0; because the comparison is against the instruction-tuned variant, part of the gain may also come from instruction tuning rather than from the architecture alone.
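For reference, the STS score in MTEB is the Spearman rank correlation between the cosine similarity of each sentence pair's embeddings and the human-annotated similarity score. The pure-Python sketch below computes Spearman correlation as the Pearson correlation of average ranks (so ties are handled) and checks the relative-improvement arithmetic quoted above. It is illustrative only; MTEB uses its own implementation, and the toy scores are made up.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(predicted, gold):
    # Spearman correlation = Pearson correlation computed on ranks.
    return pearson(average_ranks(predicted), average_ranks(gold))


def relative_improvement(new, old):
    return (new - old) / old


# Toy STS example: cosine similarities vs. gold scores (made-up numbers).
cosines = [0.91, 0.12, 0.55, 0.40]
gold = [4.8, 0.5, 3.0, 3.0]
print(spearman(cosines, gold))
print(round(100 * relative_improvement(0.656, 0.591), 1))  # 11.0
```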

### 4.2 Retrieval Performance (SciFact)

On the SciFact dataset, which requires high precision in scientific-domain retrieval, HCAE v1.1-Instruct reached an **NDCG@10 of 0.413** and a **Recall@10 of 0.523**. We consider this competitive for a 21M-parameter model and, in our assessment, approaching results typically seen in models with 100M+ parameters. Because SciFact is also part of the instruction-tuning data (Section 5), this should be read as an in-domain result.
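NDCG@10 and Recall@10 are computed per query and then averaged over all queries. A minimal sketch follows, assuming a BEIR-style `qrels` dictionary that maps document IDs to relevance grades (SciFact judgments are binary) and using the linear-gain DCG form, which coincides with the 2^rel - 1 gain form for binary labels. The toy query and document IDs are made up.

```python
import math


def ndcg_at_k(ranked_ids, qrels, k=10):
    """ranked_ids: doc IDs sorted by descending score; qrels: {doc_id: relevance grade}."""
    # DCG: gain of each retrieved doc discounted by log2(position + 1), positions 1-based.
    dcg = sum(qrels.get(doc, 0) / math.log2(pos + 2) for pos, doc in enumerate(ranked_ids[:k]))
    # Ideal DCG: the same sum for the best possible ordering of the judged documents.
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, qrels, k=10):
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)


# One toy query: two relevant documents, retrieved at ranks 1 and 3.
qrels = {"d3": 1, "d7": 1}
ranked = ["d3", "d1", "d7", "d2", "d5"]
print(round(ndcg_at_k(ranked, qrels), 4))  # 0.9197
print(recall_at_k(ranked, qrels))          # 1.0
```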

## 5. Training Methodology and Instruction Tuning

HCAE v1.1 utilizes a multi-stage curriculum learning approach:

1. **Base Pre-training:** Optimized for general-purpose semantic similarity using massive corpora.

2. **Instruction Tuning:** Fine-tuned on a curated set of NLI and domain-specific technical datasets (SciFact, Med-Tech).

3. **Task-Specific Prefixes:** Prepending `query:` and `passage:` instructions lets the model distinguish the asymmetric roles of queries and documents in a retrieval pipeline, orienting its vector space according to the user's intent.
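Instruction prefixes are applied at encoding time, before any text reaches the model. The sketch below assumes the prefixes are the literal strings `query: ` and `passage: ` (the trailing space is a common convention that this report does not confirm) and uses a hypothetical bag-of-words `toy_encode` as a stand-in for the real HCAE encoder, so the ranking logic runs without the model.

```python
import math


def add_prefixes(queries, passages):
    # Assumed literal prefixes; the trailing space is a convention, not confirmed by the report.
    return [f"query: {q}" for q in queries], [f"passage: {p}" for p in passages]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_passages(encode, query, passages):
    """Return passage indices sorted by cosine similarity to the query (best first)."""
    (prefixed_query,), prefixed_passages = add_prefixes([query], passages)
    q_vec = encode(prefixed_query)
    scores = [cosine(q_vec, encode(p)) for p in prefixed_passages]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


# Hypothetical stand-in encoder: word counts over a tiny fixed vocabulary.
VOCAB = ["protein", "folding", "vaccine", "efficacy", "trial"]


def toy_encode(text):
    words = text.lower().replace(":", " ").split()
    return [words.count(w) for w in VOCAB]


passages = ["Protein folding dynamics", "Vaccine efficacy in a phase 3 trial"]
print(add_prefixes(["vaccine efficacy"], passages[:1]))
print(rank_passages(toy_encode, "vaccine efficacy", passages))  # [1, 0]
```

The toy encoder ignores the prefix words, so here they only demonstrate where the instructions are inserted; in the real model the prefix is part of the input and changes the resulting embedding.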

## 6. Implementation and Deployment

To ensure seamless integration with modern research workflows:

- **Serialization:** Models are provided in the `safetensors` format, ensuring rapid loading and enhanced security against arbitrary code execution.

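- **Transformers API:** `AutoModel` support is provided through a custom model mapping, so the model can be loaded with a single `AutoModel.from_pretrained(...)` call by passing `trust_remote_code=True`; this flag runs the custom modeling code shipped with the repository.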

- **Standardized Tokenization:** The model uses the `bert-base-uncased` WordPiece vocabulary, which keeps it compatible with existing BERT pre-processing pipelines.
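A possible end-to-end loading sketch follows. The repository ID is not given in this report, so `repo_id` is a placeholder. Mean pooling over non-padding tokens and L2 normalization are assumptions about how sentence embeddings are read out, and the sketch also assumes the custom model accepts `input_ids`/`attention_mask` and returns a standard `last_hidden_state`. `transformers` and `torch` are imported inside the hypothetical `embed` helper so the pooling helpers can be used on their own.

```python
import numpy as np


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors over non-padding positions.

    last_hidden_state: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(float)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def l2_normalize(x):
    return x / np.clip(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12, None)


def embed(texts, repo_id, tokenizer_id="bert-base-uncased"):
    """Hypothetical helper: repo_id is a placeholder for the (unstated) model repository."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    # trust_remote_code=True runs the modeling code shipped with the repository.
    model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # Assumes the custom forward accepts these arguments and returns `last_hidden_state`.
        out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
    pooled = mean_pool(out.last_hidden_state.numpy(), batch["attention_mask"].numpy())
    return l2_normalize(pooled)


# The pooling helpers can be checked without downloading anything:
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(hidden, mask))  # [[2. 3.]]
```

If the repository exposes its own pooling or a dedicated encoding method, that should take precedence over this sketch.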

---

**HeavensHackDev Research**

**Technical Pre-Release Note - HCAE v1.1**
