Description
Pretrain a language model that automatically generates commit messages.
Version control systems are used in most software projects, so the tool could be useful to a wide range of developers.
Model
Here we need either a Text2Text (sequence-to-sequence) model or a text generation model. It is better to discuss the choice all together.
Dataset
I am not sure there are existing datasets that fit our task, but we can get everything we need from the GitHub API.
How to collect data
Get all branches from the repository:
https://api.github.com/repos/huggingface/transformers/branches?per_page=100&page=2
Result (master as an example):
[
  ...
  {
    "name": "master",
    "commit": {
      "sha": "3ff2cde5ca4a2d3c622b827d9edf7e3d0b7f4fb7",
      "url": "https://api.github.com/repos/huggingface/transformers/commits/3ff2cde5ca4a2d3c622b827d9edf7e3d0b7f4fb7"
    },
    "protected": true
  },
  ...
]
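The branch listing above is paginated, so a collector has to walk the pages. Below is a minimal sketch using only the Python standard library. The helper names (branches_url, branch_heads, fetch_json, all_branch_heads) are made up for illustration, and the sketch assumes unauthenticated requests, which are rate-limited, and that an empty page marks the end of the list.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def branches_url(owner, repo, page=1, per_page=100):
    """Build the URL for one page of the branch list."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{API_ROOT}/repos/{owner}/{repo}/branches?{query}"


def branch_heads(branches):
    """Map each branch name to the sha of its head commit."""
    return {branch["name"]: branch["commit"]["sha"] for branch in branches}


def fetch_json(url):
    """GET a URL and decode the JSON body (no token, so rate limits apply)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def all_branch_heads(owner, repo):
    """Walk the pages until an empty one comes back (assumed end marker)."""
    heads = {}
    page = 1
    while True:
        batch = fetch_json(branches_url(owner, repo, page))
        if not batch:
            return heads
        heads.update(branch_heads(batch))
        page += 1
```

Calling all_branch_heads("huggingface", "transformers") would then return a dictionary from branch name to head sha, which is the input for the next step.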
Get all commits from the branch (page=1 and sha=3ff2cde5ca4a2d3c622b827d9edf7e3d0b7f4fb7 as an example):
https://api.github.com/repos/huggingface/transformers/commits?per_page=100&sha=3ff2cde5ca4a2d3c622b827d9edf7e3d0b7f4fb7&page=1
Result:
An array of commits. The JSON for each commit is huge, so I will not paste it. From each commit we need its url, for example:
https://api.github.com/repos/huggingface/transformers/commits/24cbf6bc5a0b6a9bb5afdda6bb1a329ac980fa4b
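Branches in one repository share most of their history, so the same commit will show up under many branches. Here is a small sketch of how the commit list URL could be built and how duplicates could be skipped. The function names and the seen set are hypothetical. The only fields read from each list entry are sha and url.

```python
from urllib.parse import urlencode


def commits_url(owner, repo, sha, page=1, per_page=100):
    """Build the URL for one page of commits reachable from `sha`."""
    query = urlencode({"per_page": per_page, "sha": sha, "page": page})
    return f"https://api.github.com/repos/{owner}/{repo}/commits?{query}"


def new_commit_urls(commits, seen):
    """Return the urls of commits not seen yet and record their shas in `seen`."""
    fresh = []
    for commit in commits:
        if commit["sha"] not in seen:
            seen.add(commit["sha"])
            fresh.append(commit["url"])
    return fresh
```

Keeping one seen set across all branches means every commit is downloaded in full only once, which matters because the per-commit requests are the expensive part.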
Get commit message and patches:
- Message: $.commit.message
- Patches: array of changed files $.files, where each file has $.patch
Patch example:
@@ -193,7 +193,7 @@ It is recommended to pre-train Wav2Vec2 with Trainer + Deepspeed (please refer t\n Here is an example of how you can use DeepSpeed ZeRO-2 to pretrain a small Wav2Vec2 model:\n \n ```\n-PYTHONPATH=../../../src deepspeed --num_gpus 2 run_pretrain.py \\\n+PYTHONPATH=../../../src deepspeed --num_gpus 4 run_pretrain.py \\\n --output_dir=\"./wav2vec2-base-libri-100h\" \\\n --num_train_epochs=\"3\" \\\n --per_device_train_batch_size=\"32\" \\
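To turn one commit into a training example, the message and the patches can be paired as target and source. The sketch below is one possible way to do it, not a settled format: the function name, the "--- filename" separator, the source/target keys and the max_chars cut-off are all assumptions made for illustration. Files without a patch field (for example binary files) are skipped.

```python
def commit_to_example(commit, max_chars=4000):
    """Turn one commit JSON object into a diff/message pair, or None if empty."""
    message = commit["commit"]["message"].strip()
    parts = []
    for changed in commit.get("files", []):
        patch = changed.get("patch")
        if patch is None:
            # No textual diff for this file, so there is nothing to learn from.
            continue
        parts.append(f"--- {changed['filename']}\n{patch}")
    # Truncate by characters; a real pipeline would likely truncate by tokens.
    diff = "\n".join(parts)[:max_chars]
    if not message or not diff:
        return None
    return {"source": diff, "target": message}
```

The returned dictionary works for a sequence-to-sequence setup directly; for a text generation model the two fields could be joined into a single string instead.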
Training scripts
I think we will write our own training script for this project, but we can also refer to any of the existing example scripts.
Expected result
A ready-to-use model that we can integrate with source code editors and browser extensions.
About
I am really interested in this project. I hope to find like-minded people and build something cool together.
Even if you don't want to participate, like and reply to this topic so that more people will see it!