diff --git a/notebooks/overfit_wrap_up_ex_00.ipynb b/notebooks/overfit_wrap_up_ex_00.ipynb
new file mode 100644
index 000000000..9ba2347e5
--- /dev/null
+++ b/notebooks/overfit_wrap_up_ex_00.ipynb
@@ -0,0 +1,308 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# \ud83c\udfc1 Wrap-up quiz 2\n",
+    "\n",
+    "This notebook contains the guided project to answer the hands-on questions\n",
+    "corresponding to the module \"Selecting the best model\" of the Associate\n",
+    "Practitioner Course. In this test **we do not have access to your code**. Only\n",
+    "its output is evaluated by using the multiple choice questions, to be\n",
+    "answered in the dedicated User Interface.\n",
+    "\n",
+    "First run the following cell to initialize jupyterlite. Notice that only basic\n",
+    "libraries are available, such as pandas, matplotlib, seaborn and numpy.\n",
+    "Remember that the initial import of libraries can take longer than usual; it\n",
+    "may take around 10-20 seconds for the following cell to run. Please be\n",
+    "patient."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install seaborn==0.13.2\n",
+    "import matplotlib\n",
+    "import numpy\n",
+    "import pandas\n",
+    "import seaborn\n",
+    "import sklearn"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Load the `blood_transfusion.csv` dataset with the following cell of code. The\n",
+    "column \"Class\" contains the target variable."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "blood_transfusion = pd.read_csv(\"../datasets/blood_transfusion.csv\")\n",
+    "target_name = \"Class\"\n",
+    "data = blood_transfusion.drop(columns=target_name)\n",
+    "target = blood_transfusion[target_name]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Select the correct answers from the following proposals.\n",
+    "\n",
+    "- a) The problem to be solved is a regression problem\n",
+    "- b) The problem to be solved is a binary classification problem (exactly 2\n",
+    "  possible classes)\n",
+    "- c) The problem to be solved is a multiclass classification problem (more\n",
+    "  than 2 possible classes)\n",
+    "- d) The proportions of the class counts are imbalanced: some classes have\n",
+    "  more than twice as many rows as others\n",
+    "\n",
+    "_Select all answers that apply_\n",
+    "\n",
+    "Hint: `target.unique()` and `target.value_counts()` are helpful methods to\n",
+    "answer this question."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Using a\n",
+    "[`sklearn.dummy.DummyClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html)\n",
+    "and the strategy `\"most_frequent\"`, what is the average of the accuracy scores\n",
+    "obtained by performing a 10-fold cross-validation?\n",
+    "\n",
+    "- a) ~25%\n",
+    "- b) ~50%\n",
+    "- c) ~75%\n",
+    "\n",
+    "_Select a single answer_\n",
+    "\n",
+    "Hint: You can check the documentation of\n",
+    "[`sklearn.model_selection.cross_val_score`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html)\n",
+    "and\n",
+    "[`sklearn.model_selection.cross_validate`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Repeat the previous experiment but compute the balanced accuracy instead of\n",
+    "the accuracy score. Pass `scoring=\"balanced_accuracy\"` when calling the\n",
+    "`cross_validate` or `cross_val_score` function. The mean score is:\n",
+    "\n",
+    "- a) ~25%\n",
+    "- b) ~50%\n",
+    "- c) ~75%\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We will use a `sklearn.neighbors.KNeighborsClassifier` for the remainder of this quiz.\n",
+    "\n",
+    "Why is it relevant to add a preprocessing step to scale the data using a\n",
+    "`StandardScaler` when working with a `KNeighborsClassifier`?\n",
+    "\n",
+    "- a) faster to compute the list of neighbors on scaled data\n",
+    "- b) k-nearest neighbors is based on computing some distances. Features need\n",
+    "  to be normalized to contribute approximately equally to the distance\n",
+    "  computation.\n",
+    "- c) This is irrelevant. One could use k-nearest neighbors without normalizing\n",
+    "  the dataset and get a very similar cross-validation score.\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create a scikit-learn pipeline (using\n",
+    "`sklearn.pipeline.make_pipeline`) where a `StandardScaler` will be used to scale\n",
+    "the data followed by a `KNeighborsClassifier`. Use the default hyperparameters.\n",
+    "\n",
+    "Inspect the parameters of the created pipeline. What is the value of K, the\n",
+    "number of neighbors considered when predicting with k-nearest neighbors?\n",
+    "\n",
+    "- a) 1\n",
+    "- b) 3\n",
+    "- c) 5\n",
+    "- d) 8\n",
+    "- e) 10\n",
+    "\n",
+    "_Select a single answer_\n",
+    "\n",
+    "Hint: You can use `model.get_params()` to get the parameters of a scikit-learn\n",
+    "estimator."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Set `n_neighbors=1` in the previous model and evaluate it using a 10-fold\n",
+    "cross-validation. Use the balanced accuracy as a score. What can you say about\n",
+    "this model? Compare the average of the train and test scores to support your\n",
+    "answer.\n",
+    "\n",
+    "- a) The model underfits\n",
+    "- b) The model generalizes\n",
+    "- c) The model overfits\n",
+    "\n",
+    "_Select a single answer_\n",
+    "\n",
+    "Hint: compute the average test score and the average train score and compare\n",
+    "them. Make sure to pass `return_train_score=True` to the `cross_validate`\n",
+    "function to also compute the train score."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We now study the effect of the parameter `n_neighbors` on the train and test\n",
+    "score using a validation curve. You can use the following parameter range:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "param_range = np.array([1, 2, 5, 10, 20, 50, 100, 200, 500])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Also, use a 5-fold cross-validation and compute the balanced accuracy score\n",
+    "instead of the default accuracy score (check the `scoring` parameter). Finally,\n",
+    "plot the average train and test scores for the different values of the\n",
+    "hyperparameter. Remember that the name of the parameter can be found using\n",
+    "`model.get_params()`.\n",
+    "\n",
+    "Select the true statement among those below:\n",
+    "\n",
+    "- a) The model underfits for a range of `n_neighbors` values between 1 and 10\n",
+    "- b) The model underfits for a range of `n_neighbors` values between 10 and 100\n",
+    "- c) The model underfits for a range of `n_neighbors` values between 100 and 500\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Select the most correct of the statements below:\n",
+    "\n",
+    "- a) The model overfits for a range of `n_neighbors` values between 1 and 10\n",
+    "- b) The model overfits for a range of `n_neighbors` values between 10 and 100\n",
+    "- c) The model overfits for a range of `n_neighbors` values between 100 and 500\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Select the most correct of the statements below:\n",
+    "#\n",
+    "# - a) The model best generalizes for a range of `n_neighbors` values between 1 and 10\n",
+    "# - b) The model best generalizes for a range of `n_neighbors` values between 10 and 100\n",
+    "# - c) The model best generalizes for a range of `n_neighbors` values between 100 and 500\n",
+    "#\n",
+    "# _Select a single answer_"
+   ]
+  }
+ ],
+ "metadata": {
+  "jupytext": {
+   "main_language": "python"
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
\ No newline at end of file
diff --git a/notebooks/pipeline_wrap_up_ex_00.ipynb b/notebooks/pipeline_wrap_up_ex_00.ipynb
new file mode 100644
index 000000000..ae2b129e2
--- /dev/null
+++ b/notebooks/pipeline_wrap_up_ex_00.ipynb
@@ -0,0 +1,282 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# \ud83c\udfc1 Wrap-up quiz 1\n",
+    "\n",
+    "This notebook contains the guided project to answer the hands-on questions\n",
+    "corresponding to the module \"The predictive modeling pipeline\" of the\n",
+    "Associate Practitioner Course. In this test **we do not have access to your\n",
+    "code**. Only its output is evaluated by using the multiple choice questions,\n",
+    "to be answered in the dedicated User Interface.\n",
+    "\n",
+    "First run the following cell to initialize jupyterlite. Notice that only basic\n",
+    "libraries are available, such as pandas, matplotlib, seaborn and numpy.\n",
+    "Remember that the initial import of libraries can take longer than usual; it\n",
+    "may take around 10-20 seconds for the following cell to run. Please be\n",
+    "patient."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install seaborn==0.13.2\n",
+    "import matplotlib\n",
+    "import numpy\n",
+    "import pandas\n",
+    "import seaborn\n",
+    "import sklearn"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Load the `ames_housing_no_missing.csv` dataset with the following cell of code.\n",
+    "\n",
+    "The target is the \"SalePrice\" column. As we have not encountered any\n",
+    "regression problem yet, we convert the regression target into a classification\n",
+    "target, where the goal is to predict whether or not the sale price of a house\n",
+    "is greater than $200,000."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "ames_housing = pd.read_csv(\"../datasets/ames_housing_no_missing.csv\")\n",
+    "\n",
+    "target_name = \"SalePrice\"\n",
+    "data, target = ames_housing.drop(columns=target_name), ames_housing[target_name]\n",
+    "target = (target > 200_000).astype(int)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Use the `data.info()` and `data.head()` commands to examine the columns of\n",
+    "the dataframe. The dataset contains:\n",
+    "\n",
+    "- a) only numerical features\n",
+    "- b) only categorical features\n",
+    "- c) both numerical and categorical features\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "How many features are available to predict whether or not a house is\n",
+    "expensive?\n",
+    "\n",
+    "- a) 79\n",
+    "- b) 80\n",
+    "- c) 81\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "How many features are represented with numbers?\n",
+    "\n",
+    "- a) 0\n",
+    "- b) 36\n",
+    "- c) 42\n",
+    "- d) 79\n",
+    "\n",
+    "_Select a single answer_\n",
+    "\n",
+    "Hint: you can use the method\n",
+    "[`df.select_dtypes`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html)\n",
+    "or the function\n",
+    "[`sklearn.compose.make_column_selector`](https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html)\n",
+    "as shown in a previous notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Refer to the [dataset description](https://www.openml.org/d/42165) regarding\n",
+    "the meaning of the features.\n",
+    "\n",
+    "Among the following features, which ones express a quantitative numerical\n",
+    "value (excluding ordinal categories)?\n",
+    "\n",
+    "- a) \"LotFrontage\"\n",
+    "- b) \"LotArea\"\n",
+    "- c) \"OverallQual\"\n",
+    "- d) \"OverallCond\"\n",
+    "- e) \"YearBuilt\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We consider the following numerical columns:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "numerical_features = [\n",
+    "    \"LotFrontage\", \"LotArea\", \"MasVnrArea\", \"BsmtFinSF1\", \"BsmtFinSF2\",\n",
+    "    \"BsmtUnfSF\", \"TotalBsmtSF\", \"1stFlrSF\", \"2ndFlrSF\", \"LowQualFinSF\",\n",
+    "    \"GrLivArea\", \"BedroomAbvGr\", \"KitchenAbvGr\", \"TotRmsAbvGrd\", \"Fireplaces\",\n",
+    "    \"GarageCars\", \"GarageArea\", \"WoodDeckSF\", \"OpenPorchSF\", \"EnclosedPorch\",\n",
+    "    \"3SsnPorch\", \"ScreenPorch\", \"PoolArea\", \"MiscVal\",\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now create a predictive model that uses these numerical columns as input data.\n",
+    "Your predictive model should be a pipeline composed of a\n",
+    "[`sklearn.preprocessing.StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)\n",
+    "to scale these numerical data and a\n",
+    "[`sklearn.linear_model.LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).\n",
+    "\n",
+    "What is the accuracy score obtained by 10-fold cross-validation (you can set\n",
+    "the parameter `cv=10` when calling `cross_validate`) of this pipeline?\n",
+    "\n",
+    "- a) ~0.5\n",
+    "- b) ~0.7\n",
+    "- c) ~0.9\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Instead of solely using the numerical columns, let us build a pipeline that\n",
+    "can process both the numerical and categorical features together as follows:\n",
+    "- the `numerical_features` (as defined above) should be processed as previously\n",
+    "  done with a `StandardScaler`;\n",
+    "- the left-out columns should be treated as categorical variables using a\n",
+    "  [`sklearn.preprocessing.OneHotEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html).\n",
+    "\n",
+    "To avoid any issue with rare categories that might only be present at\n",
+    "prediction time, you can pass the parameter `handle_unknown=\"ignore\"` to the\n",
+    "`OneHotEncoder`.\n",
+    "\n",
+    "What is the accuracy score obtained by 10-fold cross-validation of the\n",
+    "pipeline using both the numerical and categorical features?\n",
+    "\n",
+    "- a) ~0.7\n",
+    "- b) ~0.9\n",
+    "- c) ~1.0\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "One way to compare two models is by comparing their mean scores, but small\n",
+    "differences in performance measures might easily be due merely to chance\n",
+    "(e.g. when using random resampling during cross-validation), and not\n",
+    "because one model predicts systematically better than the other.\n",
+    "\n",
+    "Another way is to compare cross-validation test scores of both models\n",
+    "fold-to-fold, i.e. counting the number of folds where one model has a better\n",
+    "test score than the other. This provides some extra information: are some\n",
+    "partitions of the data making the classification task particularly easy or\n",
+    "hard for both models?\n",
+    "\n",
+    "Let's visualize the second approach:\n",
+    "\n",
+    "![Fold-to-fold comparison](../figures/numerical_pipeline_wrap_up_quiz_comparison.png)\n",
+    "\n",
+    "Select the true statement.\n",
+    "\n",
+    "The number of folds where the model using all features performs better than the\n",
+    "model using only numerical features lies in the range:\n",
+    "\n",
+    "- a) [0, 3]: the model using all features is consistently worse\n",
+    "- b) [4, 6]: both models are almost equivalent\n",
+    "- c) [7, 10]: the model using all features is consistently better\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  }
+ ],
+ "metadata": {
+  "jupytext": {
+   "main_language": "python"
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
\ No newline at end of file
diff --git a/notebooks/tuning_wrap_up_ex_00.ipynb b/notebooks/tuning_wrap_up_ex_00.ipynb
new file mode 100644
index 000000000..426cff3b4
--- /dev/null
+++ b/notebooks/tuning_wrap_up_ex_00.ipynb
@@ -0,0 +1,375 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# \ud83c\udfc1 Wrap-up quiz 3\n",
+    "\n",
+    "This notebook contains the guided project to answer the hands-on questions\n",
+    "corresponding to the module \"Hyperparameter tuning\" of the Associate\n",
+    "Practitioner Course. In this test **we do not have access to your code**. Only\n",
+    "its output is evaluated by using the multiple choice questions, to be\n",
+    "answered in the dedicated User Interface.\n",
+    "\n",
+    "First run the following cell to initialize jupyterlite. Notice that only basic\n",
+    "libraries are available, such as pandas, matplotlib, seaborn and numpy.\n",
+    "Remember that the initial import of libraries can take longer than usual; it\n",
+    "may take around 10-20 seconds for the following cell to run. Please be\n",
+    "patient."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install seaborn==0.13.2\n",
+    "import matplotlib\n",
+    "import numpy\n",
+    "import pandas\n",
+    "import seaborn\n",
+    "import sklearn"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Load the `penguins.csv` dataset with the following cell of code. The column\n",
+    "\"Species\" contains the target variable. We extract the numerical columns that\n",
+    "quantify some attributes of these animals, and our goal is to predict their\n",
+    "species based on those attributes, stored in the dataframe named `data`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "penguins = pd.read_csv(\"../datasets/penguins.csv\")\n",
+    "\n",
+    "columns = [\"Body Mass (g)\", \"Flipper Length (mm)\", \"Culmen Length (mm)\"]\n",
+    "target_name = \"Species\"\n",
+    "\n",
+    "# Remove lines with missing values for the columns of interest\n",
+    "penguins_non_missing = penguins[columns + [target_name]].dropna()\n",
+    "\n",
+    "data = penguins_non_missing[columns]\n",
+    "target = penguins_non_missing[target_name]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Inspect the loaded data to answer the following questions.\n",
+    "\n",
+    "Inspect the target variable and select the correct assertion from the\n",
+    "following proposals.\n",
+    "\n",
+    "- a) The problem to be solved is a regression problem\n",
+    "- b) The problem to be solved is a binary classification problem\n",
+    "  (exactly 2 possible classes)\n",
+    "- c) The problem to be solved is a multiclass classification problem\n",
+    "  (more than 2 possible classes)\n",
+    "\n",
+    "_Select a single answer_\n",
+    "\n",
+    "Hint: `target.nunique()` is a helpful method to answer this question."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Inspect the statistics of the target and individual features to select the\n",
+    "correct statements.\n",
+    "\n",
+    "- a) The proportions of the class counts are balanced: there are approximately\n",
+    "  the same number of rows for each class\n",
+    "- b) The proportions of the class counts are imbalanced: some classes have\n",
+    "  more than twice as many rows as others\n",
+    "- c) The input features have similar scales (ranges of values)\n",
+    "\n",
+    "_Select all answers that apply_\n",
+    "\n",
+    "Hint: `data.describe()` and `target.value_counts()` are methods that are\n",
+    "helpful to answer this question."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's now consider the following pipeline:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.neighbors import KNeighborsClassifier\n",
+    "from sklearn.preprocessing import StandardScaler\n",
+    "from sklearn.pipeline import Pipeline\n",
+    "\n",
+    "\n",
+    "model = Pipeline(steps=[\n",
+    "    (\"preprocessor\", StandardScaler()),\n",
+    "    (\"classifier\", KNeighborsClassifier(n_neighbors=5)),\n",
+    "])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Evaluate the pipeline using stratified 10-fold cross-validation with the\n",
+    "`balanced_accuracy` scoring metric to choose the correct statement in the list\n",
+    "below.\n",
+    "\n",
+    "You can:\n",
+    "\n",
+    "- use [`sklearn.model_selection.cross_validate`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html)\n",
+    "  to perform the cross-validation routine;\n",
+    "- provide an integer `10` to the parameter `cv` of `cross_validate` to use the\n",
+    "  cross-validation with 10 folds;\n",
+    "- provide the string `\"balanced_accuracy\"` to the parameter `scoring` of\n",
+    "  `cross_validate`.\n",
+    "\n",
+    "- a) The average cross-validated test balanced accuracy of the above pipeline\n",
+    "  is between 0.9 and 1.0\n",
+    "- b) The average cross-validated test balanced accuracy of the above pipeline\n",
+    "  is between 0.8 and 0.9\n",
+    "- c) The average cross-validated test balanced accuracy of the above pipeline\n",
+    "  is between 0.5 and 0.8\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Repeat the evaluation by setting the parameters in order to select the correct\n",
+    "statements in the list below. Recall that you can use `model.get_params()`\n",
+    "to list the parameters of the pipeline and use\n",
+    "`model.set_params(param_name=param_value)` to update them.\n",
+    "\n",
+    "Remember that one way to compare two models is comparing the cross-validation\n",
+    "test scores of both models fold-to-fold, i.e. counting the number of folds\n",
+    "where one model has a better test score than the other.\n",
+    "\n",
+    "Looking at the individual cross-validation scores:\n",
+    "\n",
+    "- a) Using a model with `n_neighbors=5` is substantially better (at least 7 of\n",
+    "  the cross-validation scores are strictly better) than a model with\n",
+    "  `n_neighbors=51`\n",
+    "- b) Using a model with `n_neighbors=5` is substantially better (at least 7 of\n",
+    "  the cross-validation scores are strictly better) than a model with\n",
+    "  `n_neighbors=101`\n",
+    "- c) A 5-nearest-neighbors model using a `StandardScaler` is substantially better\n",
+    "  (at least 7 of the cross-validation scores are strictly better) than a\n",
+    "  5-nearest-neighbors model using the raw features (without scaling).\n",
+    "\n",
+    "_Select all answers that apply_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We will now study the impact of different preprocessors defined in the list below:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.preprocessing import MinMaxScaler\n",
+    "from sklearn.preprocessing import QuantileTransformer\n",
+    "from sklearn.preprocessing import PowerTransformer\n",
+    "\n",
+    "\n",
+    "all_preprocessors = [\n",
+    "    None,\n",
+    "    StandardScaler(),\n",
+    "    MinMaxScaler(),\n",
+    "    QuantileTransformer(n_quantiles=100),\n",
+    "    PowerTransformer(method=\"box-cox\"),\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<div class=\"admonition note alert alert-info\">\n",
+    "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Note</p>\n",
+    "<p class=\"last\">The Box-Cox method is a common preprocessing strategy for strictly positive values.\n",
+    "The other preprocessors work for any kind of numerical features. If you are\n",
+    "curious to read more about those methods, feel free to visit the\n",
+    "<a class=\"reference external\" href=\"https://scikit-learn.org/stable/modules/preprocessing.html\">preprocessing section of the user\n",
+    "guide</a>, although\n",
+    "that is not necessary to answer the following questions.</p>\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Use `sklearn.model_selection.GridSearchCV` to study the impact of the choice\n",
+    "of the preprocessor and the number of neighbors on the stratified 10-fold\n",
+    "cross-validated `balanced_accuracy` metric. We want to study the `n_neighbors`\n",
+    "in the range `[5, 51, 101]` and `preprocessor` in the range\n",
+    "`all_preprocessors`. Although we wouldn't do this in a real setting (we would\n",
+    "prefer using nested cross-validation), for this question, do the\n",
+    "cross-validation on the entire dataset.\n",
+    "\n",
+    "Which of the following statements hold?\n",
+    "\n",
+    "- a) Looking at the individual cross-validation scores, the best ranked model\n",
+    "  using a `StandardScaler` is substantially better (at least 7 of the\n",
+    "  cross-validation scores are strictly better) than using any other\n",
+    "  preprocessor\n",
+    "- b) Using any of the preprocessors always has a better ranking than using no\n",
+    "  preprocessor, irrespective of the value of `n_neighbors`\n",
+    "- c) Looking at the individual cross-validation scores, the model with\n",
+    "  `n_neighbors=5` and `StandardScaler` is substantially better (at least 7 of\n",
+    "  the cross-validation scores are strictly better) than the model with\n",
+    "  `n_neighbors=51` and `StandardScaler`\n",
+    "- d) Looking at the individual cross-validation scores, the model with\n",
+    "  `n_neighbors=51` and `StandardScaler` is substantially better (at least 7 of\n",
+    "  the cross-validation scores are strictly better) than the model with\n",
+    "  `n_neighbors=101` and `StandardScaler`\n",
+    "\n",
+    "_Select all answers that apply_\n",
+    "\n",
+    "Hint: pass `{\"preprocessor\": all_preprocessors, \"classifier__n_neighbors\": [5,\n",
+    "51, 101]}` for the `param_grid` argument to the `GridSearchCV` class."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Evaluate the generalization performance of the best models found in each fold\n",
+    "using nested cross-validation. Set `return_estimator=True` and `cv=10` for the\n",
+    "outer loop. The scoring metric must be `balanced_accuracy`. The mean\n",
+    "generalization performance is:\n",
+    "\n",
+    "- a) better than 0.97\n",
+    "- b) between 0.92 and 0.97\n",
+    "- c) below 0.92\n",
+    "\n",
+    "_Select a single answer_"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "Explore the set of best parameters that the different grid search models found\n",
+    "in each fold of the outer cross-validation. Remember that you can access them\n",
+    "with the `best_params_` attribute of the estimator. Select all the statements\n",
+    "that are true.\n",
+    "\n",
+    "- a) The tuned number of nearest neighbors is stable across folds\n",
+    "- b) The tuned number of nearest neighbors changes often across folds\n",
+    "- c) The optimal scaler is stable across folds\n",
+    "- d) The optimal scaler changes often across folds\n",
+    "\n",
+    "_Select all answers that apply_\n",
+    "\n",
+    "Hint: it is important to pass `return_estimator=True` to the `cross_validate`\n",
+    "function to be able to inspect the trained models saved in the `\"estimator\"`\n",
+    "field of the CV results. If you forgot to do so for the previous question, please\n",
+    "re-run the cross-validation with that option enabled."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write your code here."
+   ]
+  }
+ ],
+ "metadata": {
+  "jupytext": {
+   "main_language": "python"
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
\ No newline at end of file