Possible to avoid new lines in output?

bluePenguin · August 9, 2022, 4:19pm

I am using Hugging Face with SageMaker and overriding the defaults through the inference.py file in the code/ directory, as explained at the bottom of this page. My output is written as formatted JSON with newlines, and the newline characters cause an error downstream. Is it possible to write the output on a single line?

Transformer code

transformer = Transformer(
    model_name=step_create_model.properties.ModelName,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path=output_uri
)

step_transform = TransformStep(
    name="NLPTransform",
    transformer=transformer,
    inputs=TransformInput(
        data=step_process.properties.ProcessingOutputConfig.Outputs[
            "batch_input"
        ].S3Output.S3Uri,
    ),
    depends_on=["NLPProcess"],
)

Here’s the current output

 [
        {
             "input": "test input",
             "prediction": "other"
        }
 ]

And here’s the desired output

[{"input": "test input", "prediction": "other"}]

Here is my output_fn:

def output_fn(preds_inputs, accept):
    predictions = preds_inputs['pred']
    inputs = preds_inputs['inpts']
    outputs = []
    for pred, inpt in zip(predictions, inputs):
        outputs.append({'input': inpt, 'prediction': pred})
    return outputs
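
For reference, one way to get the single-line form is to serialize the list inside output_fn with json.dumps and no indent. This is only a minimal sketch: it assumes the inference toolkit passes a string returned by output_fn through to the output file unchanged, which I have not verified for this setup. The dictionary keys are the ones from the function above.

```python
import json


def output_fn(preds_inputs, accept):
    """Serialize predictions as compact JSON on a single line."""
    predictions = preds_inputs["pred"]
    inputs = preds_inputs["inpts"]
    outputs = [
        {"input": inpt, "prediction": pred}
        for pred, inpt in zip(predictions, inputs)
    ]
    # With indent left at its default (None), json.dumps emits no line breaks.
    # Assumption: the serving stack writes this string out as-is.
    return json.dumps(outputs)
```

Newlines that occur inside an input string are escaped by json.dumps as the two characters \n, so the serialized text itself still contains no raw line breaks.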

philschmid · August 10, 2022, 7:19am

Sorry, I do not fully understand what error you are seeing or what you are trying to do.

bluePenguin · August 10, 2022, 2:40pm

I am expecting the pipeline output to look like this

[{"input": "test input", "prediction": "other"}]

but instead it looks like this

 [
        {
             "input": "test input",
             "prediction": "other"
        }
 ]

The error is from further downstream (another system that I do not control) that does not handle new lines in the files it ingests.

philschmid · August 10, 2022, 2:56pm

@bluePenguin It is hard to help you if I don’t know the error and cannot reproduce it on our side.

But from what you shared, it seems that you are struggling to write a correct JSON Lines file. You can check out this example: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub
There we convert a CSV to a JSON Lines file with the output you want.
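
As a rough illustration of the JSON Lines idea (this is not the notebook's code, just a sketch with a hypothetical helper name), a pretty-printed JSON array can be rewritten so that each record sits on its own line:

```python
import json


def pretty_json_to_json_lines(text):
    """Convert a JSON array (possibly pretty-printed) to JSON Lines text.

    Hypothetical helper for illustration: each record becomes one line.
    """
    records = json.loads(text)
    return "\n".join(json.dumps(record) for record in records)


pretty = """
[
    {
        "input": "test input",
        "prediction": "other"
    }
]
"""
print(pretty_json_to_json_lines(pretty))
```

Note that JSON Lines still separates records with a newline, so a file with several records has several lines. If the downstream system accepts no newlines at all, dumping the whole array with a single json.dumps call may fit better.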

bluePenguin · August 10, 2022, 3:01pm

Thank you @philschmid and apologies for not having the error message to show here. In the notebook, I see there is a code snippet at the end to download the output of the transformer and format it correctly. Would you happen to know if it is possible to add such a snippet to the end of a SageMaker Pipeline so that it runs automatically?

philschmid · August 10, 2022, 3:25pm

Yes, you could do this with a LambdaStep.
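
A minimal sketch of what the Lambda function behind such a step might look like. Everything specific here is an assumption: the event keys "bucket" and "key" are hypothetical and would come from whatever inputs you pass to the LambdaStep, the transform output is assumed to be a single JSON document small enough to load in memory, and the optional s3_client argument exists only so the function can be exercised without AWS.

```python
import json


def compact_json(text):
    """Re-serialize JSON text without indentation or line breaks."""
    return json.dumps(json.loads(text))


def handler(event, context, s3_client=None):
    """Rewrite one S3 object in place as single-line JSON.

    Assumed event shape (hypothetical): {"bucket": "...", "key": "..."}.
    """
    if s3_client is None:
        import boto3  # provided by the AWS Lambda Python runtime

        s3_client = boto3.client("s3")

    bucket = event["bucket"]
    key = event["key"]

    # Download the transform output, compact it, and write it back.
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    compact = compact_json(body.decode("utf-8"))
    s3_client.put_object(Bucket=bucket, Key=key, Body=compact.encode("utf-8"))
    return {"statusCode": 200, "key": key}
```

The step itself would wrap this function with LambdaStep from sagemaker.workflow.lambda_step and run after the transform step, for example by listing the transform step in depends_on.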
