{ "cells": [ { "cell_type": "markdown", "id": "7b527b7f-477e-436a-99c1-2cb5264a3e9e", "metadata": {}, "source": [ "# GPU Support and Changing Operator Device\n", "\n", "In this notebook we'll see how to leverage GPU acceleration for our numerical linear algebra routines. We'll start by showing how to place a `LinearOperator` in either a GPU or a CPU and then we'll apply what we just learned to a large-scale linear regression problem." ] }, { "cell_type": "code", "execution_count": 1, "id": "c89db44c-ab01-4cb9-adc5-0f4489abf19d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cpu\n", "218 µs ± 5.75 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)\n" ] } ], "source": [ "import cola\n", "import torch\n", "\n", "N, D, B = 1_000, 40, 200\n", "x = torch.randn(N, D)\n", "ones = torch.ones((N, B))\n", "A = cola.ops.Dense(x) @ cola.ops.Dense(x.T)\n", "\n", "print(A.device)\n", "\n", "%timeit A @ ones" ] }, { "cell_type": "markdown", "id": "ea8617ed-6612-4fa3-b6db-07c885bae8f2", "metadata": {}, "source": [ "As we can see from the prints, the operator $A$ is using CPU for its computations. To now have $A$ use a GPU, we use the `to()` method (similar to `PyTorch`) and place $A$ to the `device` of our choice. Under the hood, we find all the parameters that define $A$ and place those arrays into the `device`. " ] }, { "cell_type": "code", "execution_count": 2, "id": "0ceb4425-7051-4106-b8bc-e14c441adf6e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cuda:0\n", "59.1 µs ± 7.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "device_cpu = torch.device(\"cpu\")\n", "device_gpu = torch.device(\"cuda:0\") if torch.cuda.is_available() else device_cpu # defaults to CPU in case there is no GPU\n", "\n", "A = A.to(device_gpu)\n", "ones = ones.to(device_gpu)\n", "\n", "print(A.device)\n", "\n", "%timeit A @ ones" ] }, { "cell_type": "markdown", "id": "d131a133-7f66-44fe-b21c-eae8ca0b283f", "metadata": {}, "source": [ "Looking at the previous prints, we can see that $A$ was successfully placed on a GPU and that the MVM times reduced dramatically!" ] }, { "cell_type": "markdown", "id": "1e952e49-3db9-45e8-bb58-c38f2f0f6a07", "metadata": {}, "source": [ "Lets now see the impact of hardware acceleration on a large-scale linear regression problem. We are going to use the Song UCI dataset. Here is some code to download and to preprocess the data" ] }, { "cell_type": "code", "execution_count": 3, "id": "be6b0b99-570b-40c3-a577-c55a71a72fdb", "metadata": {}, "outputs": [], "source": [ "![ ! -f \"$HOME/data/song.mat\" ] && wget -P ~/data https://www.andpotap.com/static/song.mat" ] }, { "cell_type": "code", "execution_count": 4, "id": "8671e6aa-c98f-4083-9dd1-c486ee0ee29c", "metadata": {}, "outputs": [], "source": [ "import os\n", "from math import floor\n", "import numpy as np\n", "from scipy.io import loadmat\n", "\n", "def load_uci_data(dataset, data_dir=\"./\", train_p=0.75, test_p=0.15):\n", " file_path = os.path.join(data_dir, dataset + '.mat')\n", " data = np.array(loadmat(file_path)['data'])\n", " X = data[:, :-1]\n", " y = data[:, -1]\n", " X = X - X.min(0)[0]\n", " X = 2.0 * (X / X.max(0)[0]) - 1.0\n", " y -= y.mean()\n", " y /= y.std()\n", "\n", " train_n = int(floor(train_p * X.shape[0]))\n", " valid_n = int(floor((1. - train_p - test_p) * X.shape[0]))\n", "\n", " split = split_dataset(X, y, train_n, valid_n)\n", " train_x, train_y, valid_x, valid_y, test_x, test_y = split\n", "\n", " return train_x, train_y, test_x, test_y, valid_x, valid_y\n", "\n", "\n", "def split_dataset(x, y, train_n, valid_n):\n", " train_x = x[:train_n, :]\n", " train_y = y[:train_n]\n", "\n", " valid_x = x[train_n:train_n + valid_n, :]\n", " valid_y = y[train_n:train_n + valid_n]\n", "\n", " test_x = x[train_n + valid_n:, :]\n", " test_y = y[train_n + valid_n:]\n", " return train_x, train_y, valid_x, valid_y, test_x, test_y\n", "\n", "\n", "def get_test_rmse(coeffs, test_x, test_y):\n", " err = (test_y - test_x @ coeffs)**2.\n", " return (err.mean())**0.5" ] }, { "cell_type": "code", "execution_count": 5, "id": "fd565821-f1e5-4305-a9f3-fa05d1e54120", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset N=386,508 | D=90\n" ] } ], "source": [ "out = load_uci_data(dataset=\"song\", data_dir=os.path.join(os.environ['HOME'], \"data\"))\n", "train_x, train_y, *_, test_x, test_y = out\n", "print(f\"Dataset N={train_x.shape[0]:,d} | D={train_x.shape[1]:,d}\")" ] }, { "cell_type": "markdown", "id": "b2fe527b-f088-4cae-8dde-99c2d73b5290", "metadata": {}, "source": [ "This UCI dataset is fairly large, consisting of 386.5K observations and 90 features. We will now run a linear regression on this dataset using CPU." ] }, { "cell_type": "code", "execution_count": 6, "id": "e69ea2eb-7f7b-446c-9c03-76de7d614aa0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(91, 91)\n", "cpu\n", "cpu\n" ] } ], "source": [ "train_x = torch.tensor(train_x)\n", "ones = torch.ones((train_x.shape[0], 1))\n", "train_x = torch.cat((ones, train_x), dim=1)\n", "train_y = torch.tensor(train_y)\n", "\n", "test_x = torch.tensor(test_x)\n", "ones = torch.ones((test_x.shape[0], 1))\n", "test_x = torch.cat((ones, test_x), dim=1)\n", "test_y = torch.tensor(test_y)\n", "\n", "rhs = train_x.T @ train_y\n", "XTX = cola.ops.Dense(train_x.T) @ cola.ops.Dense(train_x)\n", "mu = 1e-2\n", "XTX += mu * cola.ops.I_like(XTX)\n", "XTX = cola.PSD(XTX)\n", "\n", "print(XTX.shape)\n", "print(XTX.device)\n", "print(rhs.device)" ] }, { "cell_type": "markdown", "id": "3d4e29e9-293d-448e-b8f6-ad21328a82cb", "metadata": {}, "source": [ "We lazily defined $X^T X$ and added some regularization via $\\mu$. Moreover, everything is placed on the CPU. We can therefore get the coefficients of the regression and evaluate the model by running" ] }, { "cell_type": "code", "execution_count": 7, "id": "1d0c9198-0df1-4ade-bc10-6c2271c228f7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test RMSE: 8.628e-01\n" ] } ], "source": [ "coeffs = cola.solve(XTX, rhs, method=\"iterative\")\n", "test_rmse = get_test_rmse(coeffs, test_x, test_y)\n", "print(f\"Test RMSE: {test_rmse:1.3e}\")" ] }, { "cell_type": "markdown", "id": "437eae88-9600-418a-88db-94e4cbcd70bc", "metadata": {}, "source": [ "We selected `method=iterative` which uses CG as solver and therefore ensures that we are running the same algorithm but on different hardware." ] }, { "cell_type": "code", "execution_count": 8, "id": "6b79538f-1fca-4d81-b2b7-cac857cb1091", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.6 s ± 121 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%timeit cola.solve(XTX, rhs, method=\"iterative\")" ] }, { "cell_type": "markdown", "id": "3e9d1b71-b482-45b8-b4e5-c35d7614bf45", "metadata": {}, "source": [ "To move to a GPU, we again use the `to()` method and pass both the operator $X^T X$ to the GPU as well as the RHS." ] }, { "cell_type": "code", "execution_count": 9, "id": "aa4c895e-3074-4168-bcaa-8d47d85fe981", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test RMSE: 8.628e-01\n" ] } ], "source": [ "XTX = XTX.to(device_gpu)\n", "rhs = rhs.to(device_gpu)\n", "\n", "coeffs = cola.solve(XTX, rhs, method=\"iterative\")\n", "test_rmse = get_test_rmse(coeffs, test_x.to(device_gpu), test_y.to(device_gpu))\n", "print(f\"Test RMSE: {test_rmse:1.3e}\")" ] }, { "cell_type": "code", "execution_count": 10, "id": "22aa1ebd-cc58-4f79-8e4a-ce8a00252f81", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "645 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%timeit cola.solve(XTX, rhs, method=\"iterative\")" ] }, { "cell_type": "markdown", "id": "21469dd6-3962-429a-a796-f9f6ea2ba044", "metadata": {}, "source": [ "We thus see the significant gains that we get from running on a GPU.\n", "\n", "In summary, we saw that using GPU acceleration is as simple as using the `to()` method on any `LinearOperator` that we want!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }