add mlflow model
khuyentran1401 committed Aug 2, 2024
1 parent a2cd6bd commit 1712c60
Showing 13 changed files with 2,155 additions and 3 deletions.
258 changes: 257 additions & 1 deletion Chapter5/machine_learning.ipynb
@@ -2570,6 +2570,262 @@
"source": [
"[Link to AutoGluon](https://bit.ly/45ljoOd)."
]
},
{
"cell_type": "markdown",
"id": "d8376b0e",
"metadata": {},
"source": [
"### Model Logging Made Easy: MLflow vs. Pickle"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is why using MLflow to log a model is superior to using pickle to save a model:\n",
"\n",
"1. Managing Library Versions:\n",
"- Problem: Different models may require different versions of the same library, which can lead to conflicts. Manually tracking and setting up the correct environment for each model is time-consuming and error-prone.\n",
"- Solution: MLflow automatically logs the model's dependencies (`requirements.txt`, `conda.yaml`, and `python_env.yaml`), so others can recreate the environment needed to run the model.\n",
"\n",
"2. Documenting Inputs and Outputs:\n",
"- Problem: Often, the expected inputs and outputs of a model are not well-documented, making it difficult for others to use the model correctly.\n",
"- Solution: MLflow stores a model signature, a schema for the inputs and outputs, so anyone using the model knows what data to provide and what to expect in return.\n",
"\n",
"To demonstrate the advantages of MLflow, let’s implement a simple logistic regression model and log it."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "644b25c0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Saving data to runs:/f8b0fc900aa14cf0ade8d0165c5a9f11/model\n"
]
}
],
"source": [
"import mlflow\n",
"from mlflow.models import infer_signature\n",
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"with mlflow.start_run():\n",
"    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)\n",
"    y = np.array([0, 0, 1, 1, 1, 0])\n",
"    lr = LogisticRegression()\n",
"    lr.fit(X, y)\n",
"    signature = infer_signature(X, lr.predict(X))\n",
"\n",
"    model_info = mlflow.sklearn.log_model(\n",
"        sk_model=lr, artifact_path=\"model\", signature=signature\n",
"    )\n",
"\n",
"    print(f\"Saving data to {model_info.model_uri}\")"
]
},
{
"cell_type": "markdown",
"id": "28c9ff92",
"metadata": {},
"source": [
"The output indicates where the model has been saved. To use the logged model later, you can load it with the `model_uri`:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "f88b0415",
"metadata": {},
"outputs": [],
"source": [
"import mlflow\n",
"import numpy as np\n",
"\n",
"model_uri = \"runs:/1e20d72afccf450faa3b8a9806a97e83/model\"\n",
"sklearn_pyfunc = mlflow.pyfunc.load_model(model_uri=model_uri)\n",
"\n",
"data = np.array([-4, 1, 0, 10, -2, 1]).reshape(-1, 1)\n",
"\n",
"predictions = sklearn_pyfunc.predict(data)"
]
},
{
"cell_type": "markdown",
"id": "58ebe221",
"metadata": {},
"source": [
"Let's inspect the artifacts saved with the model:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "8acda9d6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5/mlruns/0/1e20d72afccf450faa3b8a9806a97e83/artifacts/model\n",
"MLmodel model.pkl requirements.txt\n",
"conda.yaml python_env.yaml\n"
]
}
],
"source": [
"%cd mlruns/0/1e20d72afccf450faa3b8a9806a97e83/artifacts/model\n",
"%ls"
]
},
{
"cell_type": "markdown",
"id": "ced58a8d",
"metadata": {},
"source": [
"The `MLmodel` file provides essential information about the model, including dependencies and input/output specifications:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "dc30e383",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"artifact_path: model\n",
"flavors:\n",
"  python_function:\n",
"    env:\n",
"      conda: conda.yaml\n",
"      virtualenv: python_env.yaml\n",
"    loader_module: mlflow.sklearn\n",
"    model_path: model.pkl\n",
"    predict_fn: predict\n",
"    python_version: 3.11.6\n",
"  sklearn:\n",
"    code: null\n",
"    pickled_model: model.pkl\n",
"    serialization_format: cloudpickle\n",
"    sklearn_version: 1.4.1.post1\n",
"mlflow_version: 2.15.0\n",
"model_size_bytes: 722\n",
"model_uuid: e7487bc3c4ab417c965144efcecaca2f\n",
"run_id: 1e20d72afccf450faa3b8a9806a97e83\n",
"signature:\n",
"  inputs: '[{\"type\": \"tensor\", \"tensor-spec\": {\"dtype\": \"int64\", \"shape\": [-1, 1]}}]'\n",
"  outputs: '[{\"type\": \"tensor\", \"tensor-spec\": {\"dtype\": \"int64\", \"shape\": [-1]}}]'\n",
"  params: null\n",
"utc_time_created: '2024-08-02 20:58:16.516963'\n"
]
}
],
"source": [
"%cat MLmodel"
]
},
{
"cell_type": "markdown",
"id": "01ef7c12",
"metadata": {},
"source": [
"The `conda.yaml` and `python_env.yaml` files outline the environment dependencies, ensuring that the model runs in a consistent setup:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "1dce0181",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"channels:\n",
"- conda-forge\n",
"dependencies:\n",
"- python=3.11.6\n",
"- pip<=24.2\n",
"- pip:\n",
"  - mlflow==2.15.0\n",
"  - cloudpickle==2.2.1\n",
"  - numpy==1.23.5\n",
"  - psutil==5.9.6\n",
"  - scikit-learn==1.4.1.post1\n",
"  - scipy==1.11.3\n",
"name: mlflow-env\n"
]
}
],
"source": [
"%cat conda.yaml"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "16c2d3bc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"python: 3.11.6\n",
"build_dependencies:\n",
"- pip==24.2\n",
"- setuptools\n",
"- wheel==0.40.0\n",
"dependencies:\n",
"- -r requirements.txt\n"
]
}
],
"source": [
"%cat python_env.yaml\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "b16b2916",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mlflow==2.15.0\n",
"cloudpickle==2.2.1\n",
"numpy==1.23.5\n",
"psutil==5.9.6\n",
"scikit-learn==1.4.1.post1\n",
"scipy==1.11.3"
]
}
],
"source": [
"%cat requirements.txt"
]
},
{
"cell_type": "markdown",
"id": "00f7bbf0",
"metadata": {},
"source": [
"[Learn more about MLflow Models](https://bit.ly/46y6gpF)."
]
}
],
"metadata": {
@@ -2590,7 +2846,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.11.6"
},
"toc": {
"base_numbering": 1,
9 changes: 9 additions & 0 deletions Chapter5/mlflow_sklearn_load_data.py
@@ -0,0 +1,9 @@
import mlflow
import numpy as np

model_uri = "runs:/1e20d72afccf450faa3b8a9806a97e83/model"
sklearn_pyfunc = mlflow.pyfunc.load_model(model_uri=model_uri)

data = np.array([-4, 1, 0, 10, -2, 1]).reshape(-1, 1)

predictions = sklearn_pyfunc.predict(data)
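
Note on the load script above: the `MLmodel` file in this commit records the input signature as a tensor of dtype `int64` with shape `[-1, 1]`. As a rough illustration of what that schema tells a caller, the sketch below checks a nested list against the logged spec using only the standard library. The helper `fits_spec` is hypothetical and written for this note; it is not MLflow's own schema enforcement, and it only handles integer dtypes.

```python
import json

# Input signature string copied from the MLmodel file shown in this commit.
INPUTS = '[{"type": "tensor", "tensor-spec": {"dtype": "int64", "shape": [-1, 1]}}]'


def fits_spec(value, shape, dtype):
    """Hypothetical helper: True if nested list `value` matches `shape` and `dtype`.

    A dimension of -1 means "any length". Only integer dtypes are supported here.
    """
    if not dtype.startswith("int"):
        raise NotImplementedError("this sketch only checks integer dtypes")
    if not shape:
        # Reached a scalar position: require a plain int (bool is excluded).
        return isinstance(value, int) and not isinstance(value, bool)
    if not isinstance(value, list):
        return False
    if shape[0] != -1 and len(value) != shape[0]:
        return False
    return all(fits_spec(item, shape[1:], dtype) for item in value)


spec = json.loads(INPUTS)[0]["tensor-spec"]

# Same values as `data` in the load script, written as a nested list.
data = [[-4], [1], [0], [10], [-2], [1]]

print(fits_spec(data, spec["shape"], spec["dtype"]))       # column of ints
print(fits_spec([1, 2, 3], spec["shape"], spec["dtype"]))  # 1-D list, wrong shape
```

The first call should report a match and the second should not, because a flat list lacks the trailing dimension of size 1 that `reshape(-1, 1)` produces in the script.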
17 changes: 17 additions & 0 deletions Chapter5/mlflow_sklearn_log_data.py
@@ -0,0 +1,17 @@
import mlflow
from mlflow.models import infer_signature
import numpy as np
from sklearn.linear_model import LogisticRegression

with mlflow.start_run():
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])
    lr = LogisticRegression()
    lr.fit(X, y)
    signature = infer_signature(X, lr.predict(X))

    model_info = mlflow.sklearn.log_model(
        sk_model=lr, artifact_path="model", signature=signature
    )

    print(f"Saving data to {model_info.model_uri}")
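
Note on the logged dependencies: the `requirements.txt` that MLflow wrote next to the model pins exact versions. One way to use those pins before loading the model elsewhere is to compare them with what is installed. The sketch below does this with `importlib.metadata` from the standard library. The helpers `parse_pins` and `find_mismatches` are hypothetical names introduced for this note, and the parser assumes simple `name==version` lines like the ones in this commit.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements.txt logged in this commit.
REQUIREMENTS = """\
mlflow==2.15.0
cloudpickle==2.2.1
numpy==1.23.5
psutil==5.9.6
scikit-learn==1.4.1.post1
scipy==1.11.3
"""


def parse_pins(text):
    """Hypothetical helper: map package name -> pinned version.

    Only handles plain `name==version` lines; anything else is skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins):
    """Hypothetical helper: return {name: (pinned, installed)} for differing packages.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


pins = parse_pins(REQUIREMENTS)
for name, (pinned, installed) in find_mismatches(pins).items():
    print(f"{name}: pinned {pinned}, installed {installed or 'not installed'}")
```

What this prints depends on the local environment. An empty result means every pinned package is installed at the logged version.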