Error: Could not convert to integer: 3221225477. Path 'exitCode'

kitkat1000 · November 7, 2024, 11:00am

Hello. I want to build my own AI based on Meta AI, but while fine-tuning a chatbot on datasets I ran into the following error: Could not convert to integer: 3221225477. Path 'exitCode'. How can I solve it?

I am using:

Windows 11 Home version 23H2
Microsoft Visual Studio 2022
Python 3.11.0

from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
import torch
from datasets import load_dataset, concatenate_datasets

try:
    # Load tokenizer and model from a local checkpoint
    model_path = "C:\\Users\\evhac\\.llama\\checkpoints\\Llama3.1-8B-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)

    # Setting eos_token as pad_token
    tokenizer.pad_token = tokenizer.eos_token

    # Transfer the model to GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Loading datasets
    persona_chat_dataset = load_dataset("AlekseyKorshuk/persona-chat", split="train")
    dailydialog_dataset = load_dataset("roskoN/dailydialog", split="train")

    # Transforming datasets
    def preprocess_persona(example):
        dialogue = example.get("dialogue", [])
        return {"dialogue": dialogue}

    def preprocess_dailydialog(example):
        dialogue = example.get("dialogue", [])
        return {"dialogue": dialogue}

    persona_chat_dataset = persona_chat_dataset.map(preprocess_persona, remove_columns=persona_chat_dataset.column_names)
    dailydialog_dataset = dailydialog_dataset.map(preprocess_dailydialog, remove_columns=dailydialog_dataset.column_names)
    combined_dataset = concatenate_datasets([persona_chat_dataset, dailydialog_dataset])

    # Transforming Dialogues
    def preprocess_dialogue(example):
        conversation = ""
        for turn in example["dialogue"]:
            if 'role' in turn and 'text' in turn:
                conversation += f"{turn['role']}: {turn['text']} \n"
        return {"text": conversation}

    processed_dataset = combined_dataset.map(preprocess_dialogue)

    # Tokenization and labeling
    def preprocess_for_model(example):
        # Tokenize text, truncating or padding every example to 256 tokens
        tokenized = tokenizer(example["text"], truncation=True, padding="max_length", max_length=256)
        
        # Add labels, which should be the same as the input_ids for training
        tokenized["labels"] = tokenized["input_ids"].copy()
        return tokenized

    # Applying tokenization to the processed dataset
    processed_dataset = processed_dataset.map(preprocess_for_model, batched=True)

    # Training parameters
    training_args = TrainingArguments(
        output_dir="./llama-chatbot",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        save_steps=500,
        save_total_limit=2,
        fp16=False,
        remove_unused_columns=False
    )

    # Creating and running Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=processed_dataset,
    )
    
    trainer.train()

except Exception as e:
    print("An error has occurred:", e)

John6666 · November 7, 2024, 11:52am

It looks like there is an error in the VC DLL, but I’m not sure where the error is actually occurring…
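
For what it's worth, 3221225477 is 0xC0000005 in hex, which on Windows is the NTSTATUS code for an access violation. That suggests the Python process crashed inside native code, in which case the `except Exception` block would never get a chance to run. The message itself appears to be the IDE failing to fit that exit code into a signed 32-bit integer, as the title of the linked issue suggests. Below is a minimal sketch for decoding such an exit code and for getting a Python traceback out of a native crash. `describe_exit_code`, `enable_native_crash_traceback`, and the `crash_trace.log` filename are hypothetical names chosen for illustration, and the lookup table only covers a handful of common codes.

```python
import faulthandler

# A few well-known Windows NTSTATUS values that show up as process exit codes.
# This table is deliberately incomplete.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def describe_exit_code(code: int) -> dict:
    """Decode a process exit code as reported by Windows (hypothetical helper)."""
    unsigned = code & 0xFFFFFFFF
    # Reinterpret the same 32 bits as a signed integer.
    signed = unsigned - 2**32 if unsigned > INT32_MAX else unsigned
    return {
        "hex": f"0x{unsigned:08X}",
        "signed_int32": signed,
        "fits_in_int32": INT32_MIN <= code <= INT32_MAX,
        "name": NTSTATUS_NAMES.get(unsigned, "unknown"),
    }


def enable_native_crash_traceback(log_path: str = "crash_trace.log"):
    """Ask Python to dump the traceback of all threads if the process faults.

    The returned file object must stay referenced for the lifetime of the
    process, because faulthandler writes to it at crash time.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    print(describe_exit_code(3221225477))
```

Calling `crash_log = enable_native_crash_traceback()` at the very top of the training script, before `from_pretrained`, should leave the Python-level stack in the log file when the process dies. That would narrow down whether it crashes while loading the model, moving it to the GPU, or inside `trainer.train()`.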

github.com/libvips/pyvips

could not convert to integer . Path ' exitCode'. value too big or too small for Int32

opened 12:37PM - 28 Sep 23 UTC

closed 12:49PM - 27 Oct 23 UTC

sgtkellox

Hello, I use your nice library to stitch some jpg tiles to a big tiff. On most… cases, it works fine. However sometimes the tiff generated is of size 0 bytes. If I run my code from command line, no error is produced. If I run it from Visual Studio, I get the following error:

```
could not convert to integer . Path ' exitCode'. value too big or too small for Int32
```

Here is the core of the program I wrote:

```
import os
import numpy as np

add_dll_dir = getattr(os, "add_dll_directory", None)
vipsbin = r"C:\AI\vips-dev-8.14\bin"
if callable(add_dll_dir):
    add_dll_dir(vipsbin)
else:
    os.environ["PATH"] = os.pathsep.join((vipsbin, os.environ["PATH"]))

format_to_dtype = {
    'uchar': np.uint8,
    'char': np.int8,
    'ushort': np.uint16,
    'short': np.int16,
    'uint': np.uint32,
    'int': np.int32,
    'float': np.float32,
    'double': np.float64,
    'complex': np.complex64,
    'dpcomplex': np.complex128,
}

# map np dtypes to vips
dtype_to_format = {
    'uint8': 'uchar',
    'int8': 'char',
    'uint16': 'ushort',
    'int16': 'short',
    'uint32': 'uint',
    'int32': 'int',
    'float32': 'float',
    'float64': 'double',
    'complex64': 'complex',
    'complex128': 'dpcomplex',
}

import pyvips

#list the tiles to be stitched
imgs = os.listdir(pathToMyTiles)

#slide dimensions are extracted from the filenames of the tiles
slideWidth,slideHeight = getWidthAndHeight(imgs)

result = pyvips.Image.black(slideWidth,slideHeight,bands=3)
print("result format " + result.format)

for img in imgs:
    print("-----------------")
    tile = pyvips.Image.new_from_file(img, access='sequential')
    mem_img = tile.write_to_memory()
    print("tile format " + str(tile.format))
    print("tile dtype " + str(format_to_dtype[tile.format]))
    #convert to numpy
    np_3d = np.ndarray(buffer=mem_img,dtype=format_to_dtype[tile.format],shape=[tile.height, tile.width, tile.bands])
    height, width, bands = np_3d.shape
    linear = np_3d.reshape(width * height * bands)
    print("Numpy array DataType " + str(np_3d.dtype))
    print("numpy array format format "+ str(dtype_to_format[str(np_3d.dtype)]))
    #make some changes
    np_3d[:,:,0] = 220
    #reconvert to vips image
    vi = pyvips.Image.new_from_memory(linear.data, width, height, bands, dtype_to_format[str(np_3d.dtype)])
    print("format after going back to vips " + str(vi.format))
    #tile position is extracted from the filename
    absX,absY = extractTileCoordinates(img)
    result = result.insert(vi,absX,absY)

result.tiffsave(safePath,
                compression=pyvips.enums.ForeignTiffCompression.DEFLATE,
                tile=True,
                tile_width=512,
                tile_height=512,
                #rgbjpeg=True,
                pyramid=True,
                bigtiff=True)
print("-----------------")
```

Since I suspected this to be some sort of datatype mismatch between my tiles, the numpy part or the final tiff, I printed those datatypes for inspection. Here is the output of a failed example. It is the same for all tiles in the tile folder. It seems all fine to me.

```
result format uchar
-----------------
tile format uchar
tile dtype <class 'numpy.uint8'>
Numpy array DataType uint8
numpy array format format uchar
format after going back to vips uchar
-----------------
```

Also maybe I should note that my tiffs are really big (190k x 100k). If possible, I would appreciate some advice on this problem. Thanks in advance, Felix
