How to request mistral:7b-instruct to skip returning context?

akappa · February 21, 2025, 9:55am

Hi all, my first time around here, so please bear with me if I missed any rules/recommendations.

I have been learning to build projects using ollama with the local model mistral:7b-instruct v3. In this particular use case, I am asking the model to act as an accessibility auditor and generate a report. I have used the model in other contexts as well.

Mistral continues to include the full context in the response, with thousands and thousands of lines of "[3,1027,781,2744…", no matter how I ask in the prompt to exclude it. I can't tell whether this is model related or ollama API related. This context increases the size of any logs by tens of MB and complicates troubleshooting.

Appreciate any help to understand how I can request the model/ollama to skip including the context in its response to my prompts.

ai.config.ts
export const AI_CONFIG: AIConfig = {
  api: {
    baseUrl: "http://localhost:11434",
    endpoints: {
      generate: "/api/generate"
      //embeddings: "/api/embeddings",
    },
  },
  model: {
    name: "mistral:7b-instruct",
    parameters: {
      chunkSize: 6000,
      promptTimeout: 60000
    },
  },
  prompts: {

  },
  // ...
  retry: {
    attempts: 3,
    backoff: {
      initial: 1000,
      multiplier: 1.5,
      maxDelay: 10000,
    },
  },
};

Ollama call:

private static async callOllama(prompt: string): Promise<string> {
  const body = {
    prompt,
    model: AI_CONFIG.model.name,
    options: {
      num_ctx: 8192,
    },
    stream: false,
  };

  const url = AI_CONFIG.api.baseUrl + AI_CONFIG.api.endpoints.generate;
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(
      `Ollama request failed [${response.status}]: ${errorText}`
    );
  }
  // Returns the raw HTTP body as text, i.e. everything the endpoint sent back.
  return await response.text();
}

Prompt:
You are an expert accessibility auditor. Based on the aggregated axe-core results provided below, generate a comprehensive accessibility analysis report.

YOU MUST OUTPUT ONLY A SINGLE, STRICTLY VALID JSON OBJECT AND NOTHING ELSE. Do not include any extra text, commentary, or keys (such as "model", "created_at", "done", "context", or markdown formatting like code fences).

The JSON object MUST follow EXACTLY this structure:

{
  "aggregatedAnalysis": {
    "contextualSummary": "",
    "prioritizedIssues": [
      {
        "issue": "",
        "wcagReference": "<guideline reference(s)>",
        "remediation": ""
      }
    ]
  }
}
ONLY OUTPUT THE JSON. NOTHING ELSE.
If you cannot produce output in this format exactly, output nothing.

Below is the aggregated axe-core accessibility report data:
<>

[LOG] Running AI Enhanced Analysis…
Raw AI response: {"model":"mistral:7b-instruct","created_at":"2025-02-20T01:11:46.979168Z","response":" It seems like you have provided a JSON object that contains a list…"context":[3,1027,781,2744,1228,1164,8351,3503,3800,5554,2610,29491,17926,1124,1040,15322,1369,6824,29474,29501,3059,3671,4625,4392,29493,9038,1032,16081,3503,3800,6411,3032,29491,4372,4593,1119,11848,1115,1032,3460,29493,20238,4484,10060,2696,1163,1476,4978,5285,1396,29493,1989,15526,29493,1210,8916,29491,29473,781,781,21966,4593,1032,10060,2696,1137,5436,1384,1431,1042,1066,1224,546…
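
A note on the log above: `context` shows up as a top-level key next to `model`, `created_at`, and `response`, which suggests it belongs to the JSON envelope returned by the `/api/generate` endpoint rather than to the text the model generated, so prompt instructions would not be expected to remove it. A minimal sketch of reading only the generated text, and of dropping `context` before logging, assuming the envelope shape seen in that log (the helper names `extractResponseText` and `withoutContext` are hypothetical):

```typescript
// Envelope fields visible in the log above; anything else is left untyped.
interface OllamaGenerateEnvelope {
  model: string;
  created_at: string;
  response: string;
  context?: number[];
}

// Hypothetical helper: return only the model's generated text.
function extractResponseText(rawBody: string): string {
  const envelope = JSON.parse(rawBody) as Partial<OllamaGenerateEnvelope>;
  if (typeof envelope.response !== "string") {
    throw new Error("Envelope has no string 'response' field");
  }
  return envelope.response;
}

// Hypothetical helper: copy of the envelope without the token array,
// so it can be logged without adding megabytes of numbers.
function withoutContext(rawBody: string): Record<string, unknown> {
  const envelope = JSON.parse(rawBody) as Record<string, unknown>;
  const copy: Record<string, unknown> = { ...envelope };
  delete copy.context;
  return copy;
}
```

With helpers like these, `callOllama` could return `extractResponseText(await response.text())` instead of the raw body, and log `withoutContext(...)` when the rest of the envelope is needed for troubleshooting.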

John6666 · February 21, 2025, 1:31pm

I think it’s because of the model. If you change the model, you can isolate the problem…
If you want to output JSON, it might be easier to use this.
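
Assuming the suggestion refers to Ollama's structured outputs (the name used in the reply below), the request body can carry a `format` field holding a JSON schema, so the shape does not have to be enforced by prompt wording alone. A minimal sketch that mirrors the structure from the prompt in the first post; `reportSchema` and `buildGenerateBody` are hypothetical names, and the `required` lists are an assumption about which keys are mandatory:

```typescript
// JSON schema mirroring the report structure requested in the prompt.
const reportSchema = {
  type: "object",
  properties: {
    aggregatedAnalysis: {
      type: "object",
      properties: {
        contextualSummary: { type: "string" },
        prioritizedIssues: {
          type: "array",
          items: {
            type: "object",
            properties: {
              issue: { type: "string" },
              wcagReference: { type: "string" },
              remediation: { type: "string" },
            },
            required: ["issue", "wcagReference", "remediation"],
          },
        },
      },
      required: ["contextualSummary", "prioritizedIssues"],
    },
  },
  required: ["aggregatedAnalysis"],
} as const;

// Hypothetical helper: builds the /api/generate request body with the
// schema attached; the other fields match the original callOllama body.
function buildGenerateBody(prompt: string, model = "mistral:7b-instruct") {
  return {
    model,
    prompt,
    format: reportSchema,
    options: { num_ctx: 8192 },
    stream: false,
  };
}
```

As far as I can tell, `format` constrains only the text inside the `response` field. The surrounding envelope is still returned, so the result still has to be read from `response`.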

akappa · February 21, 2025, 10:02pm

Thanks for the response. I will read up on the structured outputs feature.

I passed the JSON schema I expect within the prompt, and I have written a whole set of instructions for the model to follow. However, the response is not consistent (as expected, I guess). Setting the context object aside, parsing the JSON response is a whole other thing I had to deal with. I wrote an entire parser to clean up the response I get from the model… maybe your post will address some of that; I have to test it.

parser to clean up the response:

github.com/ganymedej3/project-bennu: src/ai/llm_response_parser.ts (main)

export interface LLMTopLevelResponse {
  model: string;
  response: string;
  done: boolean;
}

/**
 * This parser tries to handle many "junk" patterns from the LLM:
 * - code fences or triple backticks
 * - line-based // comments
 * - partial text beyond the main JSON block
 * - bracket balancing for { ... } or [ ... ]
 * - invalid escapes like \_ or \* that break JSON
 */
export class LLMResponseParser {
  /**
   * Remove triple backticks or code fences like ```json
   */
  static removeCodeFences(input: string): string {
    return input.replace(/```(\S+)?/g, "").trim();

This file has been truncated.
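
Since the file is cut off before the bracket-balancing step listed in the comment, here is a minimal sketch of what that step could look like. It is not the code from the repository; `extractFirstJsonBlock` is a hypothetical name, and the function only checks nesting depth, leaving strict validation to `JSON.parse`:

```typescript
// Hypothetical helper: return the first balanced {...} or [...] block in
// the text, or null if none is found or the block never closes.
function extractFirstJsonBlock(text: string): string | null {
  const start = text.search(/[{\[]/);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Inside a string literal, brackets do not count toward nesting.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // opening bracket was never closed
}
```

The result would still go through `JSON.parse`, which catches mismatched bracket types and other syntax errors that a depth counter cannot see.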

Here are all my prompts:

github.com/ganymedej3/project-bennu: src/config/ai.config.ts (3722ef2f5)
      
Any extraneous text or explanation will break the parser.
IMPORTANT: Double-check that your JSON is valid. No trailing commas, no code fences, no line comments. No text after the final bracket.
`,

API_NEGATIVE_SCENARIOS: `
You are an AI that enumerates negative or invalid scenarios for an API endpoint.
Given:
- Endpoint name or short summary: "<<ENDPOINT_DESCRIPTION>>"
- The number of scenarios to generate: <<COUNT>>

Return a JSON array of <<COUNT>> scenario objects. Each should describe:
{
  "description": "short explanation of the invalid scenario",
  "payload": {...},
  "expectedStatus": 4xx or 5xx,
  "reason": "why it fails or is invalid"
}
(IMPORTANT: produce valid JSON, no code fences or extra text).
(Important: use double quotes for all JSON fields, no single quotes!)
(Important: Do NOT wrap the output in any markdown code fences or triple backticks. Just return plain valid JSON.)
(Important: Only output ONE JSON block. Do NOT append extra braces or code fences. No additional text after the final '}'.)

Sample:

API_DATA_GENERATION: `
You are an AI that generates test data for a given API resource or endpoint.
Given:
- The endpoint name or resource type: "<<RESOURCE_NAME>>"
- The number of items to generate: <<COUNT>>

IMPORTANT INSTRUCTIONS for your output:
1) Output ONLY valid JSON. No markdown code fences or triple backticks.
2) No line-based comments like "// ...".
3) Do NOT add any text after the JSON. No explanations or extra commentary.
4) Return strictly one JSON array with <<COUNT>> items. For example:
[
  { "field1": "value", "field2": 123 },
  { "field1": "another", "field2": 456 }
]

Any extraneous text or explanation will break the parser. 
IMPORTANT: Double-check that your JSON is valid. No trailing commas, no code fences, no line comments. No text after the final bracket.
`,
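
The templates above use `<<NAME>>` placeholders, but the thread does not show how they are filled in. A minimal sketch of one way to do it, assuming placeholders are always upper-case with underscores; `fillTemplate` is a hypothetical name, not a function from the repository:

```typescript
// Hypothetical helper: replace every <<NAME>> placeholder with its value
// and fail loudly if a placeholder has no value supplied.
function fillTemplate(
  template: string,
  values: Record<string, string | number>
): string {
  return template.replace(/<<([A-Z_]+)>>/g, (_match, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error(`No value for placeholder <<${key}>>`);
    }
    return String(values[key]);
  });
}
```

Throwing on a missing value keeps an unfilled `<<COUNT>>` from reaching the model unnoticed.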

benstokes · February 22, 2025, 8:46am

Thanks for sharing. It helps me a lot.
Best Regards!
