How does the GPT-J inference API work?

wittyseller · September 29, 2021, 7:21am

It’s strange: if I send only one sentence as input, it works.

This is my request:
{
  "inputs": "Nowadays, many users would like to upgrade old hard drive to SSD with Windows installed, or reinstall Windows 10 on SSD afterward.",
  "parameters": {
    "return_full_text": false,
    "max_new_tokens": 100,
    "temperature": 0.8
  },
  "options": {
    "use_cache": false
  }
}
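
For anyone who wants to reproduce this, here is a minimal Python sketch that builds the same request body using only the standard library. The endpoint URL and the bearer-token header are assumptions (the post above does not show the URL or headers that were used), and `build_payload` and `build_request` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

# Assumed endpoint: the post does not state the URL that was called.
API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-j-6B"


def build_payload(prompt, max_new_tokens=100, temperature=0.8):
    """Return a request body in the same shape as the request in the post."""
    return {
        "inputs": prompt,
        "parameters": {
            "return_full_text": False,
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
        "options": {"use_cache": False},
    }


def build_request(prompt, token, url=API_URL):
    """Build (but do not send) a POST request carrying the JSON payload."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    headers = {
        # Placeholder: the caller supplies their own API token.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen(...)` would send it. Keeping the payload in one function makes it easy to vary only the prompt length while leaving the parameters fixed.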

And I get back this:

HTTP/1.1 200 OK
date: Wed, 29 Sep 2021 07:06:20 GMT,Wed, 29 Sep 2021 07:06:29 GMT
server: istio-envoy
x-compute-time: 2.1995999999999998
x-compute-type: gpu
access-control-expose-headers: x-compute-type, x-compute-time
x-compute-characters: 130
content-length: 130
content-type: application/json
x-envoy-upstream-service-time: 2244

[{"generated_text":" In this way, they can enjoy the performance of SSD and the convenience of Windows. In the first place, the"}]
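
A rough way to see how much text came back is to parse the body and count words. The sketch below uses the response body exactly as shown above; `extract_generated_text` is a hypothetical helper name, and whitespace-separated words are only a crude stand-in for model tokens, so this is a sanity check rather than a real token count.

```python
import json

# Response body copied from the post.
RESPONSE_BODY = (
    '[{"generated_text":" In this way, they can enjoy the performance of SSD '
    'and the convenience of Windows. In the first place, the"}]'
)


def extract_generated_text(body):
    """Return the generated_text field of the first item in the JSON list."""
    items = json.loads(body)
    return items[0]["generated_text"]


text = extract_generated_text(RESPONSE_BODY)
# Words are not tokens; this only gives a rough sense of the output length.
word_count = len(text.split())
print(word_count, "words returned")
```

Comparing that number against the requested `max_new_tokens` gives a quick indication of whether the output was cut short.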

To me, it looks like there is some sort of limit on the input tokens.
