Can't get stable, relevant responses from local LLMs for a simple summarization prompt. What am I missing?

I’m trying to run ollama with different models; my goal is to produce a summary of a podcast transcription.

My prompt is the following, with the timestamped transcript appended right after the instructions (a rough sketch of how I send it is below):


    You are a professional summarizer, you help people to get insights out of podcasts episodes to save their time, to deliver them the value of the audio with text. You only speak JSON, do not write normal text.
    
    You need to extract main takeaways from the audio. Each takeaway should have a title, description and timestamp when it was discussed. Each takeaway should have 3 additional extra points with more details, each extra point should be a sentence that expands the takeaway more deeply.
    
    00:00:00: line 1
    00:00:02: line 2
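
For reference, here is a minimal sketch of roughly how I call the model over ollama's HTTP API (it assumes the server is running on the default localhost:11434 port; `SYSTEM_PROMPT` and `transcript_lines` are placeholders for my real prompt text and transcript):

    # Rough sketch of my call, assuming a local ollama server on the
    # default port; SYSTEM_PROMPT and transcript_lines stand in for my
    # actual prompt text and transcript.
    import requests

    SYSTEM_PROMPT = "You are a professional summarizer, ..."  # the instructions quoted above

    transcript_lines = [
        ("00:00:00", "line 1"),
        ("00:00:02", "line 2"),
    ]
    transcript = "\n".join(f"{ts}: {text}" for ts, text in transcript_lines)

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama2:7b",
            # everything goes in one prompt string, same as when I paste it into the CLI
            "prompt": f"{SYSTEM_PROMPT}\n\n{transcript}",
            "stream": False,
        },
        timeout=600,
    )
    print(resp.json()["response"])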

I also tried it in the following format (a sketch of how I send that version follows the template):

    <s>[INST] <<SYS>>
    MY_PROMPT
    <</SYS>>
    
    MY_TRANSCRIPT
    [/INST]
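
In that attempt I wrap everything in the template myself and send it as a single prompt string, roughly like this (passing `"raw": True` is my assumption for how to keep ollama from applying its own template on top of mine; I haven't verified that this matches what the CLI does):

    # Second attempt: apply the [INST]/<<SYS>> template myself and send
    # the result as one prompt string. "raw": True is my assumption for
    # bypassing ollama's built-in prompt template.
    import requests

    MY_PROMPT = "You are a professional summarizer, ..."  # same instructions as above
    MY_TRANSCRIPT = "00:00:00: line 1\n00:00:02: line 2"

    templated = f"<s>[INST] <<SYS>>\n{MY_PROMPT}\n<</SYS>>\n\n{MY_TRANSCRIPT}\n[/INST]"

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mixtral:8x7b-instruct-v0.1-q5_1",
            "prompt": templated,
            "raw": True,   # assumption: skip ollama's own templating
            "stream": False,
        },
        timeout=600,
    )
    print(resp.json()["response"])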

I tried running it through the ollama CLI and also through Open WebUI.

I tried the following models: llama2:7b, llama2:13b, mixtral:8x7b, and mixtral:8x7b-instruct-v0.1-q5_1. None of them returns a proper response. They don’t return JSON; they return irrelevant output, sometimes echoing a piece of the transcript, sometimes talking about some “article”, and so on. They don’t follow the instructions at all.

I tried a similar model (mistralai/Mixtral-8x7B-Instruct-v0.1) in Hugging Face Chat, and it works reliably, giving me good-quality responses in JSON format. The same prompt works perfectly in ChatGPT 3.5, ChatGPT 4, Gemini, Claude, and basically any public online service.

What am I doing wrong? Thanks in advance for any tips :pray: