### Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the [issues policy](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md)
### Willingness to contribute
No. I cannot contribute a bug fix at this time.
### MLflow version
1.30.0
### System information
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: WSL Ubuntu 20.04
- **Python version**: 3.10.6
- **yarn version, if running the dev UI**: N/A
### Describe the problem
Autologging for TensorFlow (tf.keras) works when I run `python train.py` directly, but not when I launch the same `train.py` script through `mlflow run` on the MLproject.
It appears that the autologger logs the model's parameters when the model is created, and this prevents it from updating the logged values after training.
Here's the error I get:
`2022/10/24 19:46:15 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: INVALID_PARAMETER_VALUE: Changing param values is not allowed. Params were already logged='[{'key': 'validation_split', 'old_value': None, 'new_value': '0.0'}, {'key': 'shuffle', 'old_value': None, 'new_value': 'True'}, {'key': 'class_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'sample_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'initial_epoch', 'old_value': None, 'new_value': '0'}, {'key': 'steps_per_epoch', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_steps', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_batch_size', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_freq', 'old_value': None, 'new_value': '1'}, {'key': 'max_queue_size', 'old_value': None, 'new_value': '10'}, {'key': 'workers', 'old_value': None, 'new_value': '1'}, {'key': 'use_multiprocessing', 'old_value': None, 'new_value': 'False'}]' for run ID='402712a4625a43bca38c0bce38fa4ed1'.`
As you can see, the autologger apparently logged `None` for all of these values.
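To compare the conflicting keys across runs, the payload quoted in the warning can be pulled out programmatically. This is a minimal stdlib-only sketch, assuming the message keeps the `Params were already logged='[...]' for run ID` shape shown above; `parse_param_conflicts` and `keys_with_missing_old_value` are hypothetical helper names, not MLflow APIs.

```python
import ast
import re
from typing import Dict, List

# Captures the list literal between "logged='" and "' for run ID" in the warning text.
_LOGGED_RE = re.compile(r"Params were already logged='(\[.*\])' for run ID")


def parse_param_conflicts(warning: str) -> List[Dict[str, object]]:
    """Return the {key, old_value, new_value} dicts quoted in the warning, or []."""
    match = _LOGGED_RE.search(warning)
    if match is None:
        return []
    # The payload is a Python literal (single quotes, None), so literal_eval can parse it.
    return ast.literal_eval(match.group(1))


def keys_with_missing_old_value(warning: str) -> List[str]:
    """List the param keys for which the server reported old_value=None."""
    return [c["key"] for c in parse_param_conflicts(warning) if c["old_value"] is None]
```

Pasting the full warning line into `keys_with_missing_old_value` returns every key from `validation_split` through `use_multiprocessing`, which makes it easy to check whether the same set recurs on each `mlflow run` invocation.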
Again, the same script works when I run it outside of the MLproject.
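The warning quotes MLflow's rule that a run's params are write-once. As we understand it, re-logging a key with the same value is accepted and only a different value is rejected. The toy store below is a hypothetical illustration of that rule only, not MLflow's implementation. Note that it does not reproduce the odd part of this report, where the server says `old_value` was `None`.

```python
from typing import Dict, Optional


class ParamConflictError(ValueError):
    """Raised when an already-logged parameter would be changed (toy stand-in)."""


class WriteOnceParamStore:
    """Hypothetical model of a run's params: a key may be logged again with
    the same value, but never with a different one."""

    def __init__(self) -> None:
        self._params: Dict[str, str] = {}

    def log_param(self, key: str, value: object) -> None:
        new_value = str(value)  # params are compared as strings
        old_value = self._params.get(key)
        if old_value is not None and old_value != new_value:
            raise ParamConflictError(
                f"Changing param values is not allowed: {key!r} {old_value!r} -> {new_value!r}"
            )
        self._params[key] = new_value

    def get(self, key: str) -> Optional[str]:
        return self._params.get(key)


if __name__ == "__main__":
    store = WriteOnceParamStore()
    store.log_param("epochs", 3)
    store.log_param("epochs", "3")  # same string value: accepted
    try:
        store.log_param("epochs", 5)  # different value: rejected
    except ParamConflictError as err:
        print(err)
```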
### Tracking information
_No response_
### Code to reproduce issue
```python
"""
TF/Keras Training script for MLFlow
"""
# mlflow run -e train_entry --env-manager=local --experiment-name=tony-reina-experiments .
from datetime import datetime
import os

# Suppress TensorFlow C++ log output; must be set before TensorFlow is imported
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

import click  # pip install click
import tensorflow as tf  # pip install tensorflow

# The following import and the mlflow.tensorflow.autolog() call in train()
# are the only additions to the code required
# to automatically log metrics and parameters to MLflow.
import mlflow  # pip install mlflow

EXPERIMENT_NAME = "tony-reina-experiments"


def load_data():
    """Load dataset and pre-process
    Fashion MNIST https://github.com/zalandoresearch/fashion-mnist
    28x28 grayscale images of clothes from 10 different categories
    """
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    # Normalize the images from 0.0 to 1.0
    train_images = train_images / 255.0
    test_images = test_images / 255.0
    # Human-readable class names
    class_names = [
        "T-shirt/top",
        "Trouser",
        "Pullover",
        "Dress",
        "Coat",
        "Sandal",
        "Shirt",
        "Sneaker",
        "Bag",
        "Ankle boot",
    ]
    return train_images, train_labels, test_images, test_labels, class_names


def create_model(parameters):
    """Create a simple TensorFlow Keras model
    Args:
        parameters(dict): Number of units,
            learning rate, and metrics for model
    """
    model = tf.keras.Sequential(
        [
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(parameters["num_units"], activation="relu"),
            tf.keras.layers.Dense(10),
        ]
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=parameters["learning_rate"]),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[parameters["metrics"]],
    )
    return model


@click.command(help="The base training script for MLFlow.")
@click.option(
    "--num-units", default=128, type=int, help="Number of units in the dense layer"
)
@click.option("--epochs", default=3, type=int, help="Number of training epochs")
@click.option("--batch-size", default=32, type=int, help="Batch size")
@click.option("--learning-rate", default=1e-4, type=float, help="Learning rate")
@click.option("--metrics", default="accuracy", type=str, help="Model metric to track")
@click.option(
    "--training-data", default=".", type=str, help="Path to the training data"
)
@click.option("--testing-data", default=".", type=str, help="Path to the testing data")
def train(
    num_units,
    epochs,
    batch_size,
    learning_rate,
    metrics,
    training_data,
    testing_data,
):
    """Run training"""
    train_images, train_labels, test_images, test_labels, class_names = load_data()
    # Instead of passing lots of variables,
    # we'll just pass a dictionary
    parameters = {
        "num_units": num_units,
        "num_epochs": epochs,
        "batch_size": batch_size,
        "learning_rate": learning_rate,
        "metrics": metrics,
        "training_data": training_data,
        "testing_data": testing_data,
    }
    click.secho(parameters)
    click.secho("Setting up MLflow tracking uri...")
    mlflow.tracking.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI"))
    mlflow.set_experiment(experiment_name=EXPERIMENT_NAME)
    mlflow.tensorflow.autolog(
        log_models=True,
        silent=False,
        registered_model_name="ye_olde_mnist_fashion",
    )
    current_time = datetime.now().strftime("%Y-%m-%d %H-%M-%S")
    click.secho("Starting the MLFlow Run...")
    model = create_model(parameters)
    # When launched via `mlflow run`, this attaches to the run the CLI created (MLFLOW_RUN_ID)
    with mlflow.start_run(
        # run_name=f"YeOldDemo-{current_time}",
        tags={"ImageTag": "local"},
        description="Ye Olde Model Xample",
    ):
        model.fit(
            train_images,
            train_labels,
            epochs=parameters["num_epochs"],
            batch_size=parameters["batch_size"],
        )
        click.secho("Finished training")
        test_loss, test_acc = model.evaluate(test_images, test_labels)
        mlflow.log_param(key="test_loss", value=test_loss)
        mlflow.log_param(key="test_acc", value=test_acc)
        mlflow.log_param(key="Class names", value=class_names)
        mlflow.log_param(key="TensorFlow version", value=tf.__version__)


if __name__ == "__main__":
    train()
```
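One difference between the two launch modes, as we understand MLflow Projects, is that `mlflow run` exports the ID of the run it created (`MLFLOW_RUN_ID`, alongside the experiment ID and tracking URI) to the entry point's environment. In that case, `mlflow.start_run()` called without a `run_id` resumes that run instead of creating a fresh one. The following stdlib-only sketch can be called at the top of `train()` to compare `python train.py` with `mlflow run`. The helper names are hypothetical, and the variable list is an assumption about what the CLI sets.

```python
import os
from typing import Dict, Mapping, Optional

# Variables we assume `mlflow run` exports to the entry point (assumption, see lead-in).
RUN_ENV_VARS = ("MLFLOW_RUN_ID", "MLFLOW_EXPERIMENT_ID", "MLFLOW_TRACKING_URI")


def describe_launch(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Report which MLflow run-related variables are set (None if unset)."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in RUN_ENV_VARS}


def launched_by_mlflow_run(env: Optional[Mapping[str, str]] = None) -> bool:
    """Heuristic: a preset MLFLOW_RUN_ID suggests the script was started by `mlflow run`."""
    return describe_launch(env)["MLFLOW_RUN_ID"] is not None


if __name__ == "__main__":
    print(describe_launch())
```

If `MLFLOW_RUN_ID` is set only under `mlflow run`, the autologged params in that mode go to a run that already existed before the script started. This narrows down where the two code paths diverge.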
### Stack trace
```
2022/10/24 19:49:34 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: INVALID_PARAMETER_VALUE: Changing param values is not allowed. Params were already logged='[{'key': 'validation_split', 'old_value': None, 'new_value': '0.0'}, {'key': 'shuffle', 'old_value': None, 'new_value': 'True'}, {'key': 'class_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'sample_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'initial_epoch', 'old_value': None, 'new_value': '0'}, {'key': 'steps_per_epoch', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_steps', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_batch_size', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_freq', 'old_value': None, 'new_value': '1'}, {'key': 'max_queue_size', 'old_value': None, 'new_value': '10'}, {'key': 'workers', 'old_value': None, 'new_value': '1'}, {'key': 'use_multiprocessing', 'old_value': None, 'new_value': 'False'}]' for run ID='a6fc78cd0973486aa6b0ddb5f36581ae'.
```
### What component(s) does this bug affect?
- [ ] `area/artifacts`: Artifact stores and artifact logging
- [ ] `area/build`: Build and test infrastructure for MLflow
- [ ] `area/docs`: MLflow documentation pages
- [ ] `area/examples`: Example code
- [ ] `area/model-registry`: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] `area/models`: MLmodel format, model serialization/deserialization, flavors
- [ ] `area/pipelines`: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
- [X] `area/projects`: MLproject format, project running backends
- [ ] `area/scoring`: MLflow Model server, model deployment tools, Spark UDFs
- [ ] `area/server-infra`: MLflow Tracking server backend
- [X] `area/tracking`: Tracking Service, tracking client APIs, autologging
### What interface(s) does this bug affect?
- [ ] `area/uiux`: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] `area/docker`: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] `area/sqlalchemy`: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] `area/windows`: Windows support
### What language(s) does this bug affect?
- [ ] `language/r`: R APIs and clients
- [ ] `language/java`: Java APIs and clients
- [ ] `language/new`: Proposals for new client languages
### What integration(s) does this bug affect?
- [ ] `integrations/azure`: Azure and Azure ML integrations
- [ ] `integrations/sagemaker`: SageMaker integrations
- [ ] `integrations/databricks`: Databricks integrations