How to modify output_fn for inference.py?

Abesadi · January 16, 2023, 11:07pm

The output of my batch transform job is input.json -> input.json.out:
{"metadata_key":"11111111","sentiment_score":[0.8079521656036377,0.17033520340919495,0.021712591871619225],"sentiment":"Negative"}
{"metadata_key":"22222222","sentiment_score":[0.015392672270536423,0.01991521380841732,0.9646921753883362],"sentiment":"Positive"}
{"metadata_key":"33333333","sentiment_score":[0.02455803006887436,0.02973146177828312,0.9457104802131653],"sentiment":"Positive"}

I want the output file to have a custom name and to be in Parquet format, for example:
input.json -> sentiment_01-16-2023.parquet

Can anyone help me to construct output_fn?

crajah · January 26, 2023, 11:38am

Here is an example of an input and output function for Parquet serialising. I hope this helps.

from io import BytesIO
from typing import Union
import pandas as pd
from botocore.response import StreamingBody

def input_fn(
  serialized_input_data: Union[bytes, StreamingBody],
  content_type: str = "application/x-parquet",
) -> pd.DataFrame:
  """Deserialize a Parquet request body into a DataFrame."""
  if content_type == "application/x-parquet":
    # BytesIO needs raw bytes, so read the payload first if it arrives as a stream.
    if hasattr(serialized_input_data, "read"):
      serialized_input_data = serialized_input_data.read()
    data = BytesIO(serialized_input_data)
    df = pd.read_parquet(data)
    return df
  else:
    raise ValueError(
      "Expected `application/x-parquet`, got: " + content_type
    )

def output_fn(output: pd.DataFrame, accept: str = "application/x-parquet") -> bytes:
  """Serialize a DataFrame to Parquet bytes."""
  if accept == "application/x-parquet":
    buffer = BytesIO()
    output.to_parquet(buffer)
    # getvalue() returns the whole buffer as bytes, hence the `bytes` return type.
    return buffer.getvalue()
  else:
    raise ValueError("Requested unsupported ContentType in Accept: " + accept)

Abesadi · January 26, 2023, 3:04pm

Thanks for your support @crajah, I will test your code snippet. Quick question: how can I also give the output file a custom name? Currently it works like this: input.jsonl -> input.jsonl.out, or with your code: input.parquet -> input.parquet.out. Is there a way to set a custom name for the output file?

crajah · January 26, 2023, 3:51pm

I don’t believe the output file name can be changed. I’d suggest adding a call to S3 to rename the file after it has been generated.

For every S3 object used as input for the transform job, batch transform stores the transformed data with an .out suffix in a corresponding subfolder in the location in the output prefix. For example, for the input data stored at s3://bucket-name/input-name-prefix/dataset01/data.csv, batch transform stores the transformed data at s3://bucket-name/output-name-prefix/input-name-prefix/data.csv.out.
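
The rename step suggested above could look roughly like the sketch below. S3 has no rename operation, so the usual approach is to copy the object to the new key and then delete the old one. The helper names `dated_output_key` and `rename_s3_object`, the bucket, and the keys are all placeholders made up for illustration; `copy_object` and `delete_object` are the boto3 S3 client calls, and the client is passed in as an argument so the function can be tried without AWS credentials.

```python
from datetime import date


def dated_output_key(prefix: str, day: date, stem: str = "sentiment") -> str:
    """Build a key such as 'output-name-prefix/sentiment_01-16-2023.parquet'."""
    return f"{prefix.rstrip('/')}/{stem}_{day.strftime('%m-%d-%Y')}.parquet"


def rename_s3_object(s3_client, bucket: str, old_key: str, new_key: str) -> None:
    """'Rename' an S3 object by copying it to new_key and deleting old_key."""
    s3_client.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": old_key},
    )
    # Delete only after the copy call has returned without raising.
    s3_client.delete_object(Bucket=bucket, Key=old_key)


# Example usage (placeholder names; requires boto3 and AWS credentials):
#
#   import boto3
#   s3 = boto3.client("s3")
#   rename_s3_object(
#       s3,
#       "bucket-name",
#       "output-name-prefix/input.parquet.out",
#       dated_output_key("output-name-prefix", date(2023, 1, 16)),
#   )
```

This would run after the transform job finishes, for example from the script that launched the job. A single `copy_object` call is limited to objects of up to 5 GB, so larger outputs would need a multipart copy instead.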

Abesadi · January 26, 2023, 5:39pm

Got it, thanks for your support @crajah
