Text Generation Web UI: output goes blank (black) on a character after a few minutes of chat

After about five minutes of conversation, maybe more, this is the result I get. I have searched for this and have not found a solution. I am guessing it is related to the character, because if I switch to another character the issue does not appear until I have talked with it for about five minutes as well. I have adjusted "max_new_tokens" and it does not clear the issue. I have to restart the server to chat again, but the same thing happens after another five minutes of conversation.

I have also experimented with the llama.cpp loader and played around with its settings: GPU offloading, batch size, and other parameters (rough sketch of how I picture those settings below).
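For reference, this is only a sketch of how I understand the prompt gets fitted into the context window; it calls llama-cpp-python directly (which I believe the webui's llama.cpp loader wraps), and the prompt text and max_new_tokens value are placeholders from my setup, not the webui's actual code. It also assumes a llama-cpp-python build old enough to still read GGML v3 files.

# Rough sketch only (not the webui's actual code): check how much of the
# 2048-token context window the prompt already uses, via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path=r"models\TheBloke_Wizard-Vicuna-30B-Uncensored-GGML\Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K.bin",
    n_ctx=2048,       # same context size the log below reports (n_ctx = 2048)
    n_gpu_layers=0,   # matches "offloaded 0/63 layers to GPU" in the log below
)

# Placeholder: in the webui this would be the character card plus the
# whole chat history that gets re-sent on every turn.
prompt = "character description + chat history so far..."
max_new_tokens = 200  # placeholder for whatever my slider is set to

prompt_tokens = llm.tokenize(prompt.encode("utf-8"))
room_left = llm.n_ctx() - len(prompt_tokens)

print(f"prompt tokens: {len(prompt_tokens)} / n_ctx: {llm.n_ctx()}")
print(f"room left for generation (before max_new_tokens): {room_left}")

In the log below, the generations that produce nothing report context 2058 and 2066 against n_ctx = 2048, which is part of why I suspect the character's growing chat history rather than the model itself.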

Please note that this also occurs with 13B models.

I have not found anyone discussing this specific issue.

Any help with this would be appreciated.

Command prompt output:

2023-09-17 08:49:06 INFO:Saved C:\Users\natha\AI\text-generation-webui\presets\My Preset.yaml.
Llama.generate: prefix-match hit
Output generated in 15.69 seconds (2.42 tokens/s, 38 tokens, context 465, seed 193558016)
Llama.generate: prefix-match hit
Output generated in 11.16 seconds (2.33 tokens/s, 26 tokens, context 520, seed 683430275)
llama_tokenize_with_model: too many tokens
llama_tokenize_with_model: too many tokens
llama_tokenize_with_model: too many tokens
llama_tokenize_with_model: too many tokens
llama_tokenize_with_model: too many tokens
Output generated in 0.27 seconds (0.00 tokens/s, 0 tokens, context 2058, seed 27346203)
2023-09-17 09:20:18 INFO:Loading TheBloke_Wizard-Vicuna-30B-Uncensored-GGML…
2023-09-17 09:20:18 INFO:llama.cpp weights detected: models\TheBloke_Wizard-Vicuna-30B-Uncensored-GGML\Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K.bin
2023-09-17 09:20:18 INFO:Cache capacity is 0 bytes
llama.cpp: loading model from models\TheBloke_Wizard-Vicuna-30B-Uncensored-GGML\Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 6656
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 52
llama_model_load_internal: n_head_kv = 52
llama_model_load_internal: n_layer = 60
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 17920
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 10 (mostly Q2_K)
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 0.16 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 13628.98 MB (+ 3120.00 MB per state)
llama_model_load_internal: offloading 0 repeating layers to GPU
llama_model_load_internal: offloaded 0/63 layers to GPU
llama_model_load_internal: total VRAM used: 592 MB
llama_new_context_with_model: kv self size = 3120.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
2023-09-17 09:20:19 INFO:Loaded the model in 1.36 seconds.

llama_tokenize_with_model: too many tokens
llama_tokenize_with_model: too many tokens
llama_tokenize_with_model: too many tokens
llama_tokenize_with_model: too many tokens
llama_tokenize_with_model: too many tokens
Output generated in 0.25 seconds (0.00 tokens/s, 0 tokens, context 2066, seed 576701495)