Hey,
I’ve got a dataset loading script that inherits from GeneratorBasedBuilder.
I want to load local preprocessed data from different folders. Since the dataset loading script is cached, I have to hard-code the full absolute path in the script, which is obviously not a good solution.
When I try to get the current path inside the script, I get the path of the cached copy of the script instead, so the data files are not found.
I also tried using data_dir and other options, but none worked. How can I get the original directory from within the script?
class MyDataset(datasets.GeneratorBasedBuilder):
    ...
    def _split_generators(self, dl_manager: datasets.DownloadManager) -> List[datasets.SplitGenerator]:
        # currentpath = os.path.abspath(os.getcwd()) #TODO resolve path auto
        # also tried os.path.abspath(__file__)
        currentpath = "/my/absolute/path/"
        generator = []
        file_train = os.path.join(currentpath, self.config.name, "train.csv")
        file_test = os.path.join(currentpath, self.config.name, "test.csv")
        file_eval = os.path.join(currentpath, self.config.name, "valid.csv")
        if os.path.isfile(file_train):
            train = datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={
                    "filepath": file_train,
                    "split": "train",
                },
            )
            generator.append(train)
Thanks!