Possible to avoid new lines in output?

I am using Hugging Face and SageMaker, overriding the defaults with an inference.py file in the code/ directory, as explained at the bottom of this page. My output is written as formatted JSON with newlines, but the newline characters cause an error downstream. Is it possible to write the output all on one line?

Transformer code

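from sagemaker.inputs import TransformInput
from sagemaker.transformer import Transformer
from sagemaker.workflow.steps import TransformStep

# Batch transform job that runs the model created earlier in the pipeline.
transformer = Transformer(
    model_name=step_create_model.properties.ModelName,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path=output_uri,
)

# Pipeline step that feeds the "batch_input" output of the processing step
# to the transformer.
step_transform = TransformStep(
    name="NLPTransform",
    transformer=transformer,
    inputs=TransformInput(
        data=step_process.properties.ProcessingOutputConfig.Outputs[
            "batch_input"
        ].S3Output.S3Uri,
    ),
    depends_on=["NLPProcess"],
)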

Here’s the current output

 [
        {
             "input": "test input",
             "prediction": "other"
        }
 ]

And here’s the desired output

[{"input": "test input", "prediction": "other"}]

The output_fn below

def output_fn(preds_inputs, accept):
    predictions = preds_inputs['pred']
    inputs = preds_inputs['inpts']
    outputs = []
    for pred, inpt in zip(predictions, inputs):
        outputs.append({'input': inpt, 'prediction': pred})
    return outputs

Sorry, I do not fully understand what error you are seeing or what you are trying to do.

I am expecting the pipeline output to look like this

[{"input": "test input", "prediction": "other"}]

but instead it looks like this

 [
        {
             "input": "test input",
             "prediction": "other"
        }
 ]

The error is from further downstream (another system that I do not control) that does not handle new lines in the files it ingests.

@bluePenguin It is hard to help you if I don't know the error or cannot reproduce it on our side.

But from what you shared, it seems that you are struggling to write a correct JSON Lines file. You can check out this example: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub, where we convert a CSV to a JSON Lines file with the output format you want.
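
For reference, here is a minimal sketch of what single-line serialization could look like in the output_fn from the question. It assumes that the serving stack writes a string returned from output_fn unchanged, which has not been verified in this thread. The helper names to_single_line_json and to_json_lines are hypothetical and only used for illustration.

```python
import json


def to_single_line_json(records):
    """Serialize the whole list as one JSON array on a single line."""
    # Without the `indent` argument, json.dumps emits no newlines.
    return json.dumps(records)


def to_json_lines(records):
    """Serialize each record on its own line (JSON Lines)."""
    return "\n".join(json.dumps(record) for record in records)


def output_fn(preds_inputs, accept):
    # Same pairing logic as the output_fn in the question; only the
    # return value changes from a list to an already serialized string.
    outputs = [
        {"input": inpt, "prediction": pred}
        for pred, inpt in zip(preds_inputs["pred"], preds_inputs["inpts"])
    ]
    # Assumption: a returned string is written to the output file as-is.
    return to_single_line_json(outputs)
```

If the downstream system expects one record per line rather than one array per file, to_json_lines could be returned instead.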

Thank you @philschmid and apologies for not having the error message to show here. In the notebook, I see there is a code snippet at the end to download the output of the transformer and format it correctly. Would you happen to know if it is possible to add such a snippet to the end of a SageMaker Pipeline so that it runs automatically?

Yes, you could do this with a LambdaStep.
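
As a rough sketch of the function such a LambdaStep might invoke: the handler below reads one transform output file from S3, re-serializes it on a single line, and writes it back. The event keys "bucket" and "key" are hypothetical and depend on what the pipeline passes to the Lambda; the wiring of the LambdaStep itself is not shown.

```python
import json


def compact_json_text(text):
    """Re-serialize pretty-printed JSON text onto a single line."""
    return json.dumps(json.loads(text))


def lambda_handler(event, context):
    # Hypothetical event shape: {"bucket": "...", "key": "..."} pointing at
    # one batch transform output file. Adapt it to the pipeline's inputs.
    import boto3  # imported here so compact_json_text works without boto3

    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=event["bucket"], Key=event["key"])["Body"].read()
    compact = compact_json_text(body.decode("utf-8"))

    # Overwrite the original object with the single-line version.
    s3.put_object(
        Bucket=event["bucket"],
        Key=event["key"],
        Body=compact.encode("utf-8"),
    )
    return {"statusCode": 200, "key": event["key"]}
```

Writing to a separate key instead of overwriting the original may be safer if the pretty-printed file is still needed elsewhere.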
