How to modify output_fn for

The output file of a Batch Transform job is named after the input file: input.json -> input.json.out

I want a custom name for the output file, and I want it in Parquet format. For example:
input.json -> sentiment_01-16-2023.parquet

Can anyone help me to construct output_fn?

Here is an example of an input and an output function for Parquet serialisation. I hope this helps.

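from io import BytesIO

import pandas as pd
from botocore.response import StreamingBody


def input_fn(
  serialized_input_data: StreamingBody,
  content_type: str = "application/x-parquet",
) -> pd.DataFrame:
  """Inputs from Parquet to Data Frame"""
  if content_type == "application/x-parquet":
    # A StreamingBody must be read into bytes before BytesIO can wrap it.
    # If the payload already arrives as raw bytes, use it as-is.
    if hasattr(serialized_input_data, "read"):
      raw = serialized_input_data.read()
    else:
      raw = serialized_input_data
    return pd.read_parquet(BytesIO(raw))
  # Only reached when the content type is not supported
  raise ValueError(
    "Expected `application/x-parquet`, got: " + content_type
  )


def output_fn(output: pd.DataFrame, accept: str = "application/x-parquet") -> bytes:
  """Output from Data Frame to Parquet"""
  if accept == "application/x-parquet":
    buffer = BytesIO()
    # Write the DataFrame into the in-memory buffer (needs pyarrow or fastparquet)
    output.to_parquet(buffer)
    return buffer.getvalue()
  # Only reached when the requested Accept type is not supported
  raise ValueError("Requested unsupported ContentType in Accept: " + accept)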

Thanks for your support, @crajah. I will test your code snippet. Quick question: how can I also give the output file a custom name? Currently the naming is input.jsonl -> input.jsonl.out, or with your code, input.parquet -> input.parquet.out. Is there a way to set a custom name for the output file?

I don’t believe the output file name can be changed. I’d suggest adding a call to S3 to rename the file after it has been generated.

For every S3 object used as input for the transform job, batch transform stores the transformed data with an .out suffix in a corresponding subfolder in the location in the output prefix. For example, for the input data stored at s3://bucket-name/input-name-prefix/dataset01/data.csv, batch transform stores the transformed data at s3://bucket-name/output-name-prefix/input-name-prefix/data.csv.out.
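The rename step could look roughly like the sketch below. S3 has no rename operation, so the object is copied to the new key and the original is deleted. The helper names dated_parquet_key and rename_s3_object are made up for illustration, and the bucket and key in the usage comment are placeholders. The client is passed in as a parameter; it is assumed to be a boto3 S3 client, whose copy_object and delete_object calls are used here. Note that a single copy_object call only works for objects up to 5 GB.

```python
from datetime import date
from typing import Any


def dated_parquet_key(out_key: str, stem: str, day: date) -> str:
    """Hypothetical helper: build a key such as
    'some/prefix/sentiment_01-16-2023.parquet' in the same "folder"
    as the .out object written by batch transform."""
    # rpartition keeps everything before the last "/" (empty if there is none)
    head, sep, _ = out_key.rpartition("/")
    return f"{head}{sep}{stem}_{day.strftime('%m-%d-%Y')}.parquet"


def rename_s3_object(s3_client: Any, bucket: str, src_key: str, dst_key: str) -> None:
    """Hypothetical helper: "rename" an object by copying it, then deleting the source."""
    s3_client.copy_object(
        Bucket=bucket,
        Key=dst_key,
        CopySource={"Bucket": bucket, "Key": src_key},
    )
    s3_client.delete_object(Bucket=bucket, Key=src_key)


# Usage sketch (assumes boto3 is installed and AWS credentials are configured;
# the bucket and key below are placeholders):
#
#   import boto3
#   s3 = boto3.client("s3")
#   src = "output-name-prefix/input.json.out"
#   dst = dated_parquet_key(src, "sentiment", date(2023, 1, 16))
#   rename_s3_object(s3, "bucket-name", src, dst)
```

Renaming only changes the key. The file is real Parquet only if output_fn actually returned Parquet bytes, as in the snippet above.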

Got it. Thanks for your support, @crajah.