Better generated tokens from GPT2

System Setup

  • Pop!_OS 20.04
  • PyTorch: 1.5.1
  • Transformers: 3.0.2
  • Python: 3.7.6

Background Code

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

prompt1 = 'We present an update on the results of the Double Chooz experiment. Double Chooz searches for the neutrino mixing angle, θ13, in the three-neutrino mixing matrix via the disappearance of produced by the dual 4.27 GW/th Chooz B Reactors. Here we discuss updated oscillation fit results using both the rate and the shape of the anti-neutrino energy spectrum. In the most recent oscillation analysis we included data with neutron captures on Gadolinium and Hydrogen along with the reactor off data that we collected. This is an important step in our multi-year program to establish the value of θ13.'
prompt2 = 'The paper covers detailed discussion on novel control system developed for adaptive fluid-based shock-absorbers serving for mitigation of unknown impact excitations. In order to provide complete independence of the control system from the loading conditions, the Hybrid Prediction Control (HPC) was elaborated. The proposed method is an extension of previously introduced kinematic feedback control which ensures optimal path finding, tracking and path update in case of high disturbance or sudden change of loading conditions. Implementation of the presented control system allows to obtain self-adaptive fluid-based absorbers providing robust impact mitigation. In contrast to previously developed methods of Adaptive Impact Absorption, the proposed control strategy does not require prior knowledge of impact excitation or its preliminary identification. The independence of applied control system from parameters of impact loading results in the capability of automatic path correction in the case of disturbance occurrence and re-adaptation to a number of subsequent impacts. The successful operation of the self-adaptive system is investigated with the use of numerical examples involving double-chamber pneumatic shock-absorber equipped with controllable valve. Efficiency of the HPC is proved by comparison with passive absorber as well as device equipped with adaptive and optimal control modules.'
prompt3 = 'This study aimed to produce biosurfactant from Pseudozyma tsukubaensis using cassava wastewater and an inoculum (biomass) for galactooligosaccharides synthesis from lactose as an integrated system. First, the use of cassava wastewater as a low cost culture medium by P. tsukubaensis to produce biomass and biosurfactant was evaluated and optimized. Then, the microbial cells (biomass) obtained from the optimized process were used to produce galactooligosaccharides from lactose. The optimum conditions for biosurfactant and biomass synthesis were found to be 80% (v/v) of cassava wastewater at 30°C and 200rpm for 48h. The highest concentration of biosurfactant, that is, minimum surface tension value and maximum biomass concentration predicted were experimentally confirmed as 26.87mN/m and 10.5g/L, respectively. The biosurfactant obtained showed good thermal (121°C/1h), pH (2–11) and ionic strength (0–25% NaCl) stability. Excellent emulsifier activity was also verified, suggesting a potential application in enhanced oil recovery. Galactooligosaccharides synthesized by the Kluyveromyces genus have been extensively investigated, however, few studies have reported transgalactosylation ability by other yeast genera. The transgalactosylation activity of the yeast biomass at optimized conditions from 40% (w/w) lactose resulted in galactooligosaccharides production of 73.12g/L and a yield of 18.28% (w/w) at pH 8.0 and 30°C in 24h. This research showed the technical feasibility of an integrated process: biosurfactant and GOS production from P. tsukubaensis, which takes advantage of the remarkable metabolism of this microorganism. To the best of our knowledge, this is the first study reporting the potential of P. tsukubaensis to produce two economical biotechnological products of increase interest as an integrated process.'
prompt4 = 'Advantages of a fuzzy predictive control algorithm are discussed in the paper. The fuzzy predictive algorithm is a combination of a DMC (Dynamic Matrix Control) algorithm and Takagi–Sugeno fuzzy modeling, thus it inherits advantages of both techniques. The algorithm is numerically effective. It is in fact generalization of the standard DMC algorithm widely used in the industry, thus the existing implementations of the DMC algorithm can be extended using the presented fuzzy approach. A simple and easy to apply method of fuzzy predictive control algorithms synthesis is presented in the paper. It can be easy applied also in the case of Multiple Input Multiple Output (MIMO) control plants. Moreover, information about measured disturbance can be included in the algorithms in an easy way. The advantages of the fuzzy predictive control algorithm are demonstrated in the example control systems of two nonlinear chemical reactors: the first one—with inverse response and the second one—a MIMO plant with time delay.'
batch = [prompt1, prompt2, prompt3, prompt4]

tokenizer = GPT2Tokenizer.from_pretrained('gpt2', padding_side='right')
tokenizer.pad_token = tokenizer.eos_token
encoded_results = tokenizer(batch, padding=True, truncation=True, return_tensors='pt', return_attention_mask=True)

gpt2 = GPT2LMHeadModel.from_pretrained('gpt2')

temperature = 0.92
tmp_input_ids = encoded_results['input_ids']
tmp_attention_mask = encoded_results['attention_mask']
max_gen_length = 30
counter = 0
gen_dict = {'a1': '', 'a2': '', 'a3': '', 'a4': ''}
while counter < max_gen_length:
    outputs = gpt2(input_ids=tmp_input_ids,
                   attention_mask=tmp_attention_mask)

    # (batch_size, sequence_length, vocab_size)
    lm_logits_w_temp = outputs[0] / temperature

    # (batch_size, vocab_size)
    last_tokens = lm_logits_w_temp[:, -1, :]
    last_token_softmaxes = torch.softmax(last_tokens, dim=-1).squeeze()
    next_tokens = torch.multinomial(last_token_softmaxes, num_samples=1)

    next_strs = [tokenizer.decode(next_token).strip() for next_token in next_tokens]
    prev_input_strs = [tokenizer.decode(id_tensor, skip_special_tokens=True) for id_tensor in tmp_input_ids]
    prev_split_list = [prev_input_str.split() for prev_input_str in prev_input_strs]

    gen_dict['a1'] += next_strs[0] + ' '
    gen_dict['a2'] += next_strs[1] + ' '
    gen_dict['a3'] += next_strs[2] + ' '
    gen_dict['a4'] += next_strs[3] + ' '
    str_list_to_join = []
    for ii, prev_split2 in enumerate(prev_split_list):
        next_str = next_strs[ii]
        tmp_prev = prev_split2
        tmp_prev.append(next_str)
        str_list_to_join.append(tmp_prev)
    next_inputs = [' '.join(str_to_join) for str_to_join in str_list_to_join]

    if counter == max_gen_length - 1:
        final_str_batch = next_inputs
    else:
        new_encoded_results = tokenizer(next_inputs, padding=True, truncation=True, return_tensors='pt',
                                         return_attention_mask=True)
        tmp_input_ids = new_encoded_results['input_ids']
        tmp_attention_mask = new_encoded_results['attention_mask']

    counter += 1


print('Generated by GPT2:')
for k, v in gen_dict.items():
    print('{}: {}'.format(k, v))

print('\nNew abstracts (old+generated):')
for final_str in final_str_batch:
    print(final_str)

Question
Is there a way to make GPT2 generate better tokens? In my code, the temperature is set to an arbitrary value (0.92). Here are the printed results:

Generated by GPT2:
a1: About Ag J An The What SHARE You ... Ge M By £ What May SC " Ex ia The Still End Turkey The Hi A Late Army ________ Here 
a2: The P In This The This Google [ Five M You Uber Sit Re In Story So The Super Marvel Yet Jul Get An A There What L The Ru 
a3: Our bro il ousing method did not result in exclusion of PCR components from trans g alling . Repe ating to horizontal prep ot of PCR with b ic ulations 
a4: From Beaut A Mid F 2 Donald The Mel From Come T The By Act IF The Pin It Whenever This This Top The " ​ I It But Who 

New abstracts (old+generated):
We present an update on the results of the Double Chooz experiment. Double Chooz searches for the neutrino mixing angle, θ13, in the three-neutrino mixing matrix via the disappearance of produced by the dual 4.27 GW/th Chooz B Reactors. Here we discuss updated oscillation fit results using both the rate and the shape of the anti-neutrino energy spectrum. In the most recent oscillation analysis we included data with neutron captures on Gadolinium and Hydrogen along with the reactor off data that we collected. This is an important step in our multi-year program to establish the value of θ13. About Ag J An The What SHARE You... Ge M By £ What May SC " Ex ia The Still End Turkey The Hi A Late Army ________ Here
The paper covers detailed discussion on novel control system developed for adaptive fluid-based shock-absorbers serving for mitigation of unknown impact excitations. In order to provide complete independence of the control system from the loading conditions, the Hybrid Prediction Control (HPC) was elaborated. The proposed method is an extension of previously introduced kinematic feedback control which ensures optimal path finding, tracking and path update in case of high disturbance or sudden change of loading conditions. Implementation of the presented control system allows to obtain self-adaptive fluid-based absorbers providing robust impact mitigation. In contrast to previously developed methods of Adaptive Impact Absorption, the proposed control strategy does not require prior knowledge of impact excitation or its preliminary identification. The independence of applied control system from parameters of impact loading results in the capability of automatic path correction in the case of disturbance occurrence and re-adaptation to a number of subsequent impacts. The successful operation of the self-adaptive system is investigated with the use of numerical examples involving double-chamber pneumatic shock-absorber equipped with controllable valve. Efficiency of the HPC is proved by comparison with passive absorber as well as device equipped with adaptive and optimal control modules. The P In This The This Google [ Five M You Uber Sit Re In Story So The Super Marvel Yet Jul Get An A There What L The Ru
This study aimed to produce biosurfactant from Pseudozyma tsukubaensis using cassava wastewater and an inoculum (biomass) for galactooligosaccharides synthesis from lactose as an integrated system. First, the use of cassava wastewater as a low cost culture medium by P. tsukubaensis to produce biomass and biosurfactant was evaluated and optimized. Then, the microbial cells (biomass) obtained from the optimized process were used to produce galactooligosaccharides from lactose. The optimum conditions for biosurfactant and biomass synthesis were found to be 80% (v/v) of cassava wastewater at 30°C and 200rpm for 48h. The highest concentration of biosurfactant, that is, minimum surface tension value and maximum biomass concentration predicted were experimentally confirmed as 26.87mN/m and 10.5g/L, respectively. The biosurfactant obtained showed good thermal (121°C/1h), pH (2–11) and ionic strength (0–25% NaCl) stability. Excellent emulsifier activity was also verified, suggesting a potential application in enhanced oil recovery. Galactooligosaccharides synthesized by the Kluyveromyces genus have been extensively investigated, however, few studies have reported transgalactosylation ability by other yeast genera. The transgalactosylation activity of the yeast biomass at optimized conditions from 40% (w/w) lactose resulted in galactooligosaccharides production of 73.12g/L and a yield of 18.28% (w/w) at pH 8.0 and 30°C in 24h. This research showed the technical feasibility of an integrated process: biosurfactant and GOS production from P. tsukubaensis, which takes advantage of the remarkable metabolism of this microorganism. To the best of our knowledge, this is the first study reporting the potential of P. tsukubaensis to produce two economical biotechnological products of increase interest as an integrated process. Our bro il ousing method did not result in exclusion of PCR components from trans g alling. Repe ating to horizontal prep ot of PCR with b ic ulations
Advantages of a fuzzy predictive control algorithm are discussed in the paper. The fuzzy predictive algorithm is a combination of a DMC (Dynamic Matrix Control) algorithm and Takagi–Sugeno fuzzy modeling, thus it inherits advantages of both techniques. The algorithm is numerically effective. It is in fact generalization of the standard DMC algorithm widely used in the industry, thus the existing implementations of the DMC algorithm can be extended using the presented fuzzy approach. A simple and easy to apply method of fuzzy predictive control algorithms synthesis is presented in the paper. It can be easy applied also in the case of Multiple Input Multiple Output (MIMO) control plants. Moreover, information about measured disturbance can be included in the algorithms in an easy way. The advantages of the fuzzy predictive control algorithm are demonstrated in the example control systems of two nonlinear chemical reactors: the first one—with inverse response and the second one—a MIMO plant with time delay. From Beaut A Mid F 2 Donald The Mel From Come T The By Act IF The Pin It Whenever This This Top The " ​ I It But Who

Here are the same inputs, but using a greedy search (next_tokens = torch.argmax(last_token_softmaxes, dim=-1).tolist()):

Generated by GPT2:
a1: The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The 
a2: The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The 
a3:  Â  Â The results of this study are in agreement with the results of previous studies , which have shown that the bios ur fact ant if the 
a4: The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The 

New abstracts (old+generated):
We present an update on the results of the Double Chooz experiment. Double Chooz searches for the neutrino mixing angle, θ13, in the three-neutrino mixing matrix via the disappearance of produced by the dual 4.27 GW/th Chooz B Reactors. Here we discuss updated oscillation fit results using both the rate and the shape of the anti-neutrino energy spectrum. In the most recent oscillation analysis we included data with neutron captures on Gadolinium and Hydrogen along with the reactor off data that we collected. This is an important step in our multi-year program to establish the value of θ13. The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The
The paper covers detailed discussion on novel control system developed for adaptive fluid-based shock-absorbers serving for mitigation of unknown impact excitations. In order to provide complete independence of the control system from the loading conditions, the Hybrid Prediction Control (HPC) was elaborated. The proposed method is an extension of previously introduced kinematic feedback control which ensures optimal path finding, tracking and path update in case of high disturbance or sudden change of loading conditions. Implementation of the presented control system allows to obtain self-adaptive fluid-based absorbers providing robust impact mitigation. In contrast to previously developed methods of Adaptive Impact Absorption, the proposed control strategy does not require prior knowledge of impact excitation or its preliminary identification. The independence of applied control system from parameters of impact loading results in the capability of automatic path correction in the case of disturbance occurrence and re-adaptation to a number of subsequent impacts. The successful operation of the self-adaptive system is investigated with the use of numerical examples involving double-chamber pneumatic shock-absorber equipped with controllable valve. Efficiency of the HPC is proved by comparison with passive absorber as well as device equipped with adaptive and optimal control modules. The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The
This study aimed to produce biosurfactant from Pseudozyma tsukubaensis using cassava wastewater and an inoculum (biomass) for galactooligosaccharides synthesis from lactose as an integrated system. First, the use of cassava wastewater as a low cost culture medium by P. tsukubaensis to produce biomass and biosurfactant was evaluated and optimized. Then, the microbial cells (biomass) obtained from the optimized process were used to produce galactooligosaccharides from lactose. The optimum conditions for biosurfactant and biomass synthesis were found to be 80% (v/v) of cassava wastewater at 30°C and 200rpm for 48h. The highest concentration of biosurfactant, that is, minimum surface tension value and maximum biomass concentration predicted were experimentally confirmed as 26.87mN/m and 10.5g/L, respectively. The biosurfactant obtained showed good thermal (121°C/1h), pH (2–11) and ionic strength (0–25% NaCl) stability. Excellent emulsifier activity was also verified, suggesting a potential application in enhanced oil recovery. Galactooligosaccharides synthesized by the Kluyveromyces genus have been extensively investigated, however, few studies have reported transgalactosylation ability by other yeast genera. The transgalactosylation activity of the yeast biomass at optimized conditions from 40% (w/w) lactose resulted in galactooligosaccharides production of 73.12g/L and a yield of 18.28% (w/w) at pH 8.0 and 30°C in 24h. This research showed the technical feasibility of an integrated process: biosurfactant and GOS production from P. tsukubaensis, which takes advantage of the remarkable metabolism of this microorganism. To the best of our knowledge, this is the first study reporting the potential of P. tsukubaensis to produce two economical biotechnological products of increase interest as an integrated process.   The results of this study are in agreement with the results of previous studies, which have shown that the bios ur fact ant if the
Advantages of a fuzzy predictive control algorithm are discussed in the paper. The fuzzy predictive algorithm is a combination of a DMC (Dynamic Matrix Control) algorithm and Takagi–Sugeno fuzzy modeling, thus it inherits advantages of both techniques. The algorithm is numerically effective. It is in fact generalization of the standard DMC algorithm widely used in the industry, thus the existing implementations of the DMC algorithm can be extended using the presented fuzzy approach. A simple and easy to apply method of fuzzy predictive control algorithms synthesis is presented in the paper. It can be easy applied also in the case of Multiple Input Multiple Output (MIMO) control plants. Moreover, information about measured disturbance can be included in the algorithms in an easy way. The advantages of the fuzzy predictive control algorithm are demonstrated in the example control systems of two nonlinear chemical reactors: the first one—with inverse response and the second one—a MIMO plant with time delay. The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The

As can be seen, the torch.multinomial approach produces better tokens than the torch.argmax approach. Yet even these better tokens still don’t make much sense.
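A likely cause of most of the gibberish is the loop itself rather than the decoding strategy. With padding_side='right', lm_logits_w_temp[:, -1, :] reads the logits at a padding (EOS) position for every prompt shorter than the longest one, so for those rows the model is predicting what follows an end-of-text token. That would explain why only a3 (presumably the longest prompt, so it is never padded) stays on topic, and why greedy decoding emits "The" over and over for the other three. Separately, decoding each sampled token, calling .strip() and re-joining with spaces splits subword pieces apart ("bios ur fact ant", "trans g alling"). Below is a minimal sketch of one sampling step that reads each row's last real token from the attention mask and keeps the sequence as token ids instead of round-tripping through strings. sample_step is a hypothetical helper, and it assumes the same right-padded input_ids/attention_mask the tokenizer produces above. Right padding is kept on purpose: GPT-2 uses absolute position embeddings, so real tokens keep their correct positions without having to pass position_ids.

```python
import torch

def sample_step(logits, input_ids, attention_mask, pad_token_id, temperature=1.0):
    """One sampling step for a right-padded batch (hypothetical helper).

    logits:         (batch, seq_len, vocab) output of the LM head
    input_ids:      (batch, seq_len) right-padded token ids
    attention_mask: (batch, seq_len) 1 for real tokens, 0 for padding
    Returns (input_ids, attention_mask, next_tokens), each row grown by one token.
    """
    batch_size = input_ids.size(0)
    rows = torch.arange(batch_size)
    # Index of the last real (non-padding) token in each row.
    last_idx = attention_mask.sum(dim=1) - 1
    # (batch, vocab): logits at each row's last real token, not at column -1.
    next_logits = logits[rows, last_idx, :] / temperature
    probs = torch.softmax(next_logits, dim=-1)
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)  # (batch,)

    # Add one padding column, then write each new token directly after that
    # row's last real token; ids are never decoded and re-tokenized.
    pad_col = torch.full((batch_size, 1), pad_token_id, dtype=input_ids.dtype)
    new_ids = torch.cat([input_ids, pad_col], dim=1)
    new_mask = torch.cat([attention_mask, torch.zeros_like(attention_mask[:, :1])], dim=1)
    new_ids[rows, last_idx + 1] = next_tokens
    new_mask[rows, last_idx + 1] = 1
    return new_ids, new_mask, next_tokens
```

To use it, the body of the while loop would shrink to running gpt2(input_ids=ids, attention_mask=mask) under torch.no_grad() and then calling ids, mask, _ = sample_step(outputs[0], ids, mask, tokenizer.eos_token_id, temperature). After the loop, tokenizer.decode(ids[i][mask[i].bool()]) recovers each abstract plus its continuation.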

Maybe the generate method from GenerationMixin works better? I also suspect the prompts may be too far outside GPT2’s training data, since they are scientific article abstracts. But maybe not?
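On the generate question: GenerationMixin.generate supports sampling through do_sample=True together with temperature, top_k and top_p (nucleus sampling). Top-k keeps only the k most likely tokens, and top-p keeps the smallest set of most likely tokens whose probabilities add up to at least p. Both cut off the long tail of unlikely tokens that plain torch.multinomial over the whole vocabulary can still draw. For the manual loop, a minimal sketch of that filtering follows; filter_logits is a hypothetical helper, not the library's internal implementation.

```python
import torch

def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Apply temperature, then top-k and top-p (nucleus) filtering.

    Hypothetical helper. logits has shape (batch, vocab); filtered-out entries
    are set to -inf, so torch.softmax gives them zero probability.
    """
    logits = logits / temperature
    if top_k > 0:
        top_k = min(top_k, logits.size(-1))
        # Smallest logit among the top_k largest in each row, shape (batch, 1).
        kth_largest = torch.topk(logits, top_k, dim=-1).values[:, -1:]
        logits = logits.masked_fill(logits < kth_largest, float('-inf'))
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, dim=-1, descending=True)
        sorted_probs = torch.softmax(sorted_logits, dim=-1)
        # Probability mass of the tokens ranked strictly above each token.
        mass_above = sorted_probs.cumsum(dim=-1) - sorted_probs
        # Drop a token once the tokens above it already reach top_p;
        # for top_p > 0 the most likely token (mass_above == 0) is always kept.
        sorted_remove = mass_above >= top_p
        # Map the mask from sorted order back to vocabulary order.
        remove = sorted_remove.scatter(-1, sorted_idx, sorted_remove)
        logits = logits.masked_fill(remove, float('-inf'))
    return logits
```

In the loop above, last_token_softmaxes = torch.softmax(filter_logits(last_tokens, top_k=50, top_p=0.95), dim=-1) would replace the unfiltered softmax before torch.multinomial. last_tokens has already been divided by temperature there, so temperature stays at its default of 1.0; the top_k and top_p values are only illustrative starting points.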

Thanks in advance for your help!


The guide "How to generate text: using different decoding methods for language generation with Transformers" by @patrickvonplaten helped me understand the different generation strategies.
