{ "cells": [ { "cell_type": "markdown", "id": "f8ffbea7", "metadata": {}, "source": [ "# Working with missing data" ] }, { "cell_type": "markdown", "id": "7e3ab093", "metadata": {}, "source": [ "In this section, we will discuss missing (also referred to as `NA`) values in cudf. cudf supports having missing values in all dtypes. These missing values are represented by ``. These values are also referenced as \"null values\"." ] }, { "cell_type": "markdown", "id": "8d657a82", "metadata": {}, "source": [ "## How to Detect missing values" ] }, { "cell_type": "markdown", "id": "9ea9f672", "metadata": {}, "source": [ "To detect missing values, you can use `isna()` and `notna()` functions." ] }, { "cell_type": "code", "execution_count": 1, "id": "58050adb", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "import cudf" ] }, { "cell_type": "code", "execution_count": 2, "id": "416d73da", "metadata": {}, "outputs": [], "source": [ "df = cudf.DataFrame({\"a\": [1, 2, None, 4], \"b\": [0.1, None, 2.3, 17.17]})" ] }, { "cell_type": "code", "execution_count": 3, "id": "5dfc6bc3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
010.1
12<NA>
2<NA>2.3
3417.17
\n", "
" ], "text/plain": [ " a b\n", "0 1 0.1\n", "1 2 \n", "2 2.3\n", "3 4 17.17" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 4, "id": "4d7f7a6d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
0FalseFalse
1FalseTrue
2TrueFalse
3FalseFalse
\n", "
" ], "text/plain": [ " a b\n", "0 False False\n", "1 False True\n", "2 True False\n", "3 False False" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isna()" ] }, { "cell_type": "code", "execution_count": 5, "id": "40edca67", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 True\n", "1 True\n", "2 False\n", "3 True\n", "Name: a, dtype: bool" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"a\"].notna()" ] }, { "cell_type": "markdown", "id": "acdf29d7", "metadata": {}, "source": [ "One has to be mindful that in Python (and NumPy), the nan's don't compare equal, but None's do. Note that cudf/NumPy uses the fact that `np.nan != np.nan`, and treats `None` like `np.nan`." ] }, { "cell_type": "code", "execution_count": 6, "id": "c269c1f5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "None == None" ] }, { "cell_type": "code", "execution_count": 7, "id": "99fb083a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.nan == np.nan" ] }, { "cell_type": "markdown", "id": "4fdb8bc7", "metadata": {}, "source": [ "So as compared to above, a scalar equality comparison versus a None/np.nan doesn't provide useful information." 
] }, { "cell_type": "code", "execution_count": 8, "id": "630ef6bb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 <NA>\n", "2 False\n", "3 False\n", "Name: b, dtype: bool" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"b\"] == np.nan" ] }, { "cell_type": "code", "execution_count": 9, "id": "8162e383", "metadata": {}, "outputs": [], "source": [ "s = cudf.Series([None, 1, 2])" ] }, { "cell_type": "code", "execution_count": 10, "id": "199775b3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 <NA>\n", "1 1\n", "2 2\n", "dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 11, "id": "cd09d80c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 <NA>\n", "1 <NA>\n", "2 <NA>\n", "dtype: bool" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s == None" ] }, { "cell_type": "code", "execution_count": 12, "id": "6b23bb0c", "metadata": {}, "outputs": [], "source": [ "s = cudf.Series([1, 2, np.nan], nan_as_null=False)" ] }, { "cell_type": "code", "execution_count": 13, "id": "cafb79ee", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 2.0\n", "2 NaN\n", "dtype: float64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 14, "id": "13363897", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 False\n", "dtype: bool" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s == np.nan" ] }, { "cell_type": "markdown", "id": "208a3776", "metadata": {}, "source": [ "## Float dtypes and missing data" ] }, { "cell_type": "markdown", "id": "2c174b88", "metadata": {}, "source": [ "In pandas, because ``NaN`` is a float, a column of integers with even one missing value is cast to a
floating-point dtype. In cudf this doesn't happen by default.\n", "\n", "By default, a ``NaN`` value passed to the `Series` constructor is treated as a `<NA>` value." ] }, { "cell_type": "code", "execution_count": 15, "id": "c59c3c54", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 <NA>\n", "dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cudf.Series([1, 2, np.nan])" ] }, { "cell_type": "markdown", "id": "a9eb2d9c", "metadata": {}, "source": [ "To keep a ``NaN`` as ``NaN``, pass `nan_as_null=False` to the `Series` constructor." ] }, { "cell_type": "code", "execution_count": 16, "id": "ecc5ae92", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 2.0\n", "2 NaN\n", "dtype: float64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cudf.Series([1, 2, np.nan], nan_as_null=False)" ] }, { "cell_type": "markdown", "id": "d1db7b08", "metadata": {}, "source": [ "## Datetimes" ] }, { "cell_type": "markdown", "id": "548d3734", "metadata": {}, "source": [ "For `datetime64` types, cudf doesn't support `NaT` values. Instead, these values, which are specific to NumPy and pandas, are treated as null values (`<NA>`) in cudf. The underlying value of `NaT` is `min(int64)`, and cudf retains it when converting a cudf object to a pandas object."
] }, { "cell_type": "code", "execution_count": 17, "id": "de70f244", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2012-01-01 00:00:00.000000\n", "1 <NA>\n", "2 2012-01-01 00:00:00.000000\n", "dtype: datetime64[us]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "datetime_series = cudf.Series(\n", " [pd.Timestamp(\"20120101\"), pd.NaT, pd.Timestamp(\"20120101\")]\n", ")\n", "datetime_series" ] }, { "cell_type": "code", "execution_count": 18, "id": "8411a914", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2012-01-01\n", "1 NaT\n", "2 2012-01-01\n", "dtype: datetime64[ns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datetime_series.to_pandas()" ] }, { "cell_type": "markdown", "id": "df664145", "metadata": {}, "source": [ "Any operation on rows that have `<NA>` values in a `datetime` column results in a `<NA>` value at the same location in the resulting column:" ] }, { "cell_type": "code", "execution_count": 19, "id": "829c32d0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0 days 00:00:00\n", "1 <NA>\n", "2 0 days 00:00:00\n", "dtype: timedelta64[us]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datetime_series - datetime_series" ] }, { "cell_type": "markdown", "id": "aa8031ef", "metadata": {}, "source": [ "## Calculations with missing data" ] }, { "cell_type": "markdown", "id": "c587fae2", "metadata": {}, "source": [ "Null values propagate naturally through arithmetic operations between cudf objects."
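, "\n", "\n", "As a rough mental model of this propagation, the sketch below adds two plain Python lists element by element, with `None` standing in for `<NA>`. The helper `add_with_nulls` is hypothetical and written only for illustration; cudf performs the equivalent work on the GPU:

```python
def add_with_nulls(left, right):
    # Hypothetical helper: the result is missing wherever either input is missing.
    return [
        None if a is None or b is None else a + b
        for a, b in zip(left, right)
    ]


left = [1, None, 2, 3, None]
right = [1, 11, 2, 34, 10]

total = add_with_nulls(left, right)
print(total)
```

Every position where either operand is missing stays missing in the result."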
] }, { "cell_type": "code", "execution_count": 20, "id": "f8f2aec7", "metadata": {}, "outputs": [], "source": [ "df1 = cudf.DataFrame(\n", " {\n", " \"a\": [1, None, 2, 3, None],\n", " \"b\": cudf.Series([np.nan, 2, 3.2, 0.1, 1], nan_as_null=False),\n", " }\n", ")" ] }, { "cell_type": "code", "execution_count": 21, "id": "0c8a3011", "metadata": {}, "outputs": [], "source": [ "df2 = cudf.DataFrame(\n", " {\"a\": [1, 11, 2, 34, 10], \"b\": cudf.Series([0.23, 22, 3.2, None, 1])}\n", ")" ] }, { "cell_type": "code", "execution_count": 22, "id": "052f6c2b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
01NaN
1<NA>2.0
223.2
330.1
4<NA>1.0
\n", "
" ], "text/plain": [ " a b\n", "0 1 NaN\n", "1 2.0\n", "2 2 3.2\n", "3 3 0.1\n", "4 1.0" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1" ] }, { "cell_type": "code", "execution_count": 23, "id": "0fb0a083", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
010.23
11122.0
223.2
334<NA>
4101.0
\n", "
" ], "text/plain": [ " a b\n", "0 1 0.23\n", "1 11 22.0\n", "2 2 3.2\n", "3 34 \n", "4 10 1.0" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2" ] }, { "cell_type": "code", "execution_count": 24, "id": "6f8152c0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
02NaN
1<NA>24.0
246.4
337<NA>
4<NA>2.0
\n", "
" ], "text/plain": [ " a b\n", "0 2 NaN\n", "1 24.0\n", "2 4 6.4\n", "3 37 \n", "4 2.0" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 + df2" ] }, { "cell_type": "markdown", "id": "11170d49", "metadata": {}, "source": [ "While summing the data along a series, `NA` values will be treated as `0`." ] }, { "cell_type": "code", "execution_count": 25, "id": "45081790", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 \n", "2 2\n", "3 3\n", "4 \n", "Name: a, dtype: int64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[\"a\"]" ] }, { "cell_type": "code", "execution_count": 26, "id": "39922658", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[\"a\"].sum()" ] }, { "cell_type": "markdown", "id": "6e99afe0", "metadata": {}, "source": [ "Since `NA` values are treated as `0`, the mean would result to 2 in this case `(1 + 0 + 2 + 3 + 0)/5 = 2`" ] }, { "cell_type": "code", "execution_count": 27, "id": "b2f16ddb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[\"a\"].mean()" ] }, { "cell_type": "markdown", "id": "07f2ec5a", "metadata": {}, "source": [ "To preserve `NA` values in the above calculations, `sum` & `mean` support `skipna` parameter.\n", "By default it's value is\n", "set to `True`, we can change it to `False` to preserve `NA` values." 
] }, { "cell_type": "code", "execution_count": 28, "id": "d4a463a0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "nan" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[\"a\"].sum(skipna=False)" ] }, { "cell_type": "code", "execution_count": 29, "id": "a944c42e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "nan" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[\"a\"].mean(skipna=False)" ] }, { "cell_type": "markdown", "id": "fb8c8f18", "metadata": {}, "source": [ "Cumulative methods like `cumsum` and `cumprod` ignore `NA` values by default." ] }, { "cell_type": "code", "execution_count": 30, "id": "4f2a7306", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 <NA>\n", "2 3\n", "3 6\n", "4 <NA>\n", "Name: a, dtype: int64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[\"a\"].cumsum()" ] }, { "cell_type": "markdown", "id": "c8f6054b", "metadata": {}, "source": [ "To preserve `NA` values in cumulative methods, provide `skipna=False`." ] }, { "cell_type": "code", "execution_count": 31, "id": "d4c46776", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 <NA>\n", "2 <NA>\n", "3 <NA>\n", "4 <NA>\n", "Name: a, dtype: int64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[\"a\"].cumsum(skipna=False)" ] }, { "cell_type": "markdown", "id": "67077d65", "metadata": {}, "source": [ "## Sum/product of nulls/NaNs" ] }, { "cell_type": "markdown", "id": "ffbb9ca1", "metadata": {}, "source": [ "The sum of an empty or all-NA Series or column of a DataFrame is 0."
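, "\n", "\n", "These defaults match the identity elements of addition and multiplication, which is also what plain Python returns for empty inputs. The following standard-library sketch assumes `None` stands in for `<NA>` and that missing entries are skipped before reducing; it does not call cudf:

```python
import math

empty = []
all_missing = [None, None]

# Skipping the missing entries of an all-missing list leaves nothing to reduce.
present = [v for v in all_missing if v is not None]

empty_sum = sum(empty)
empty_product = math.prod(empty)
skipped_sum = sum(present)
skipped_product = math.prod(present)

print(empty_sum, empty_product, skipped_sum, skipped_product)
```

An empty reduction therefore leaves later sums and products unchanged."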
] }, { "cell_type": "code", "execution_count": 32, "id": "f430c9ce", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cudf.Series([np.nan], nan_as_null=False).sum()" ] }, { "cell_type": "code", "execution_count": 33, "id": "7fde514b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "nan" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cudf.Series([np.nan], nan_as_null=False).sum(skipna=False)" ] }, { "cell_type": "code", "execution_count": 34, "id": "56cedd17", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cudf.Series([], dtype=\"float64\").sum()" ] }, { "cell_type": "markdown", "id": "cb188adb", "metadata": {}, "source": [ "The product of an empty or all-NA Series or column of a DataFrame is 1." ] }, { "cell_type": "code", "execution_count": 35, "id": "d20bbbef", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cudf.Series([np.nan], nan_as_null=False).prod()" ] }, { "cell_type": "code", "execution_count": 36, "id": "75abbcfa", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "nan" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cudf.Series([np.nan], nan_as_null=False).prod(skipna=False)" ] }, { "cell_type": "code", "execution_count": 37, "id": "becce0cc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cudf.Series([], dtype=\"float64\").prod()" ] }, { "cell_type": "markdown", "id": "0e899e03", "metadata": {}, "source": [ "## NA values in GroupBy" ] }, { "cell_type": "markdown", "id": "7fb20874", "metadata": {}, "source": [ "`NA` groups in 
GroupBy are automatically excluded. For example:" ] }, { "cell_type": "code", "execution_count": 38, "id": "1379037c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
01NaN
1<NA>2.0
223.2
330.1
4<NA>1.0
\n", "
" ], "text/plain": [ " a b\n", "0 1 NaN\n", "1 2.0\n", "2 2 3.2\n", "3 3 0.1\n", "4 1.0" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1" ] }, { "cell_type": "code", "execution_count": 39, "id": "d6b91e6f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
b
a
23.2
1NaN
30.1
\n", "
" ], "text/plain": [ " b\n", "a \n", "2 3.2\n", "1 NaN\n", "3 0.1" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.groupby(\"a\").mean()" ] }, { "cell_type": "markdown", "id": "cb83fb11", "metadata": {}, "source": [ "It is also possible to include `NA` in groups by passing `dropna=False`" ] }, { "cell_type": "code", "execution_count": 40, "id": "768c3e50", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
b
a
23.2
1NaN
30.1
<NA>1.5
\n", "
" ], "text/plain": [ " b\n", "a \n", "2 3.2\n", "1 NaN\n", "3 0.1\n", " 1.5" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.groupby(\"a\", dropna=False).mean()" ] }, { "cell_type": "markdown", "id": "133816b4", "metadata": {}, "source": [ "## Inserting missing data" ] }, { "cell_type": "markdown", "id": "306082ad", "metadata": {}, "source": [ "All dtypes support insertion of missing value by assignment. Any specific location in series can made null by assigning it to `None`." ] }, { "cell_type": "code", "execution_count": 41, "id": "7ddde1fe", "metadata": {}, "outputs": [], "source": [ "series = cudf.Series([1, 2, 3, 4])" ] }, { "cell_type": "code", "execution_count": 42, "id": "16e54597", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 3\n", "3 4\n", "dtype: int64" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "series" ] }, { "cell_type": "code", "execution_count": 43, "id": "f628f94d", "metadata": {}, "outputs": [], "source": [ "series[2] = None" ] }, { "cell_type": "code", "execution_count": 44, "id": "b30590b7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 \n", "3 4\n", "dtype: int64" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "series" ] }, { "cell_type": "markdown", "id": "a1b123d0", "metadata": {}, "source": [ "## Filling missing values: fillna" ] }, { "cell_type": "markdown", "id": "114aa23a", "metadata": {}, "source": [ "`fillna()` can fill in `NA` & `NaN` values with non-NA data." ] }, { "cell_type": "code", "execution_count": 45, "id": "59e22668", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
01NaN
1<NA>2.0
223.2
330.1
4<NA>1.0
\n", "
" ], "text/plain": [ " a b\n", "0 1 NaN\n", "1 2.0\n", "2 2 3.2\n", "3 3 0.1\n", "4 1.0" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1" ] }, { "cell_type": "code", "execution_count": 46, "id": "05c221ee", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 10.0\n", "1 2.0\n", "2 3.2\n", "3 0.1\n", "4 1.0\n", "Name: b, dtype: float64" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[\"b\"].fillna(10)" ] }, { "cell_type": "markdown", "id": "401f91b2", "metadata": {}, "source": [ "## Filling with cudf Object" ] }, { "cell_type": "markdown", "id": "e79346d6", "metadata": {}, "source": [ "You can also fillna using a dict or Series that is alignable. The labels of the dict or index of the Series must match the columns of the frame you wish to fill. The use case of this is to fill a DataFrame with the mean of that column." ] }, { "cell_type": "code", "execution_count": 47, "id": "f52c5d8f", "metadata": {}, "outputs": [], "source": [ "import cupy as cp\n", "\n", "dff = cudf.DataFrame(cp.random.randn(10, 3), columns=list(\"ABC\"))" ] }, { "cell_type": "code", "execution_count": 48, "id": "6affebe9", "metadata": {}, "outputs": [], "source": [ "dff.iloc[3:5, 0] = np.nan" ] }, { "cell_type": "code", "execution_count": 49, "id": "1ce1b96f", "metadata": {}, "outputs": [], "source": [ "dff.iloc[4:6, 1] = np.nan" ] }, { "cell_type": "code", "execution_count": 50, "id": "90829195", "metadata": {}, "outputs": [], "source": [ "dff.iloc[5:8, 2] = np.nan" ] }, { "cell_type": "code", "execution_count": 51, "id": "c0feac14", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
0-0.408268-0.676643-1.274743
1-0.029322-0.873593-1.214105
2-0.8663711.081735-0.226840
3NaN0.8122781.074973
4NaNNaN-0.366725
5-1.016239NaNNaN
60.6751231.067536NaN
70.2215682.025961NaN
8-0.3172411.0112750.674891
9-0.877041-1.919394-1.029201
\n", "
" ], "text/plain": [ " A B C\n", "0 -0.408268 -0.676643 -1.274743\n", "1 -0.029322 -0.873593 -1.214105\n", "2 -0.866371 1.081735 -0.226840\n", "3 NaN 0.812278 1.074973\n", "4 NaN NaN -0.366725\n", "5 -1.016239 NaN NaN\n", "6 0.675123 1.067536 NaN\n", "7 0.221568 2.025961 NaN\n", "8 -0.317241 1.011275 0.674891\n", "9 -0.877041 -1.919394 -1.029201" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dff" ] }, { "cell_type": "code", "execution_count": 52, "id": "a07c1260", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
0-0.408268-0.676643-1.274743
1-0.029322-0.873593-1.214105
2-0.8663711.081735-0.226840
3-0.3272240.8122781.074973
4-0.3272240.316145-0.366725
5-1.0162390.316145-0.337393
60.6751231.067536-0.337393
70.2215682.025961-0.337393
8-0.3172411.0112750.674891
9-0.877041-1.919394-1.029201
\n", "
" ], "text/plain": [ " A B C\n", "0 -0.408268 -0.676643 -1.274743\n", "1 -0.029322 -0.873593 -1.214105\n", "2 -0.866371 1.081735 -0.226840\n", "3 -0.327224 0.812278 1.074973\n", "4 -0.327224 0.316145 -0.366725\n", "5 -1.016239 0.316145 -0.337393\n", "6 0.675123 1.067536 -0.337393\n", "7 0.221568 2.025961 -0.337393\n", "8 -0.317241 1.011275 0.674891\n", "9 -0.877041 -1.919394 -1.029201" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dff.fillna(dff.mean())" ] }, { "cell_type": "code", "execution_count": 53, "id": "9e70d61a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
0-0.408268-0.676643-1.274743
1-0.029322-0.873593-1.214105
2-0.8663711.081735-0.226840
3NaN0.8122781.074973
4NaN0.316145-0.366725
5-1.0162390.316145-0.337393
60.6751231.067536-0.337393
70.2215682.025961-0.337393
8-0.3172411.0112750.674891
9-0.877041-1.919394-1.029201
\n", "
" ], "text/plain": [ " A B C\n", "0 -0.408268 -0.676643 -1.274743\n", "1 -0.029322 -0.873593 -1.214105\n", "2 -0.866371 1.081735 -0.226840\n", "3 NaN 0.812278 1.074973\n", "4 NaN 0.316145 -0.366725\n", "5 -1.016239 0.316145 -0.337393\n", "6 0.675123 1.067536 -0.337393\n", "7 0.221568 2.025961 -0.337393\n", "8 -0.317241 1.011275 0.674891\n", "9 -0.877041 -1.919394 -1.029201" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dff.fillna(dff.mean()[1:3])" ] }, { "cell_type": "markdown", "id": "0ace728d", "metadata": {}, "source": [ "## Dropping axis labels with missing data: dropna" ] }, { "cell_type": "markdown", "id": "2ccd7115", "metadata": {}, "source": [ "Missing data can be excluded using `dropna()`:" ] }, { "cell_type": "code", "execution_count": 54, "id": "98c57be7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
01NaN
1<NA>2.0
223.2
330.1
4<NA>1.0
\n", "
" ], "text/plain": [ " a b\n", "0 1 NaN\n", "1 2.0\n", "2 2 3.2\n", "3 3 0.1\n", "4 1.0" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1" ] }, { "cell_type": "code", "execution_count": 55, "id": "bc3f273a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
223.2
330.1
\n", "
" ], "text/plain": [ " a b\n", "2 2 3.2\n", "3 3 0.1" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.dropna(axis=0)" ] }, { "cell_type": "code", "execution_count": 56, "id": "a48d4de0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0
1
2
3
4
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: []\n", "Index: [0, 1, 2, 3, 4]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.dropna(axis=1)" ] }, { "cell_type": "markdown", "id": "0b1954f9", "metadata": {}, "source": [ "An equivalent `dropna()` is available for Series." ] }, { "cell_type": "code", "execution_count": 57, "id": "2dd8f660", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "2 2\n", "3 3\n", "Name: a, dtype: int64" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[\"a\"].dropna()" ] }, { "cell_type": "markdown", "id": "121eb6d7", "metadata": {}, "source": [ "## Replacing generic values" ] }, { "cell_type": "markdown", "id": "3cc4c5f1", "metadata": {}, "source": [ "Often times we want to replace arbitrary values with other values.\n", "\n", "`replace()` in Series and `replace()` in DataFrame provides an efficient yet flexible way to perform such replacements." ] }, { "cell_type": "code", "execution_count": 58, "id": "e6c14e8a", "metadata": {}, "outputs": [], "source": [ "series = cudf.Series([0.0, 1.0, 2.0, 3.0, 4.0])" ] }, { "cell_type": "code", "execution_count": 59, "id": "a852f0cb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.0\n", "1 1.0\n", "2 2.0\n", "3 3.0\n", "4 4.0\n", "dtype: float64" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "series" ] }, { "cell_type": "code", "execution_count": 60, "id": "f6ac12eb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 5.0\n", "1 1.0\n", "2 2.0\n", "3 3.0\n", "4 4.0\n", "dtype: float64" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "series.replace(0, 5)" ] }, { "cell_type": "markdown", "id": "a6e1b6d7", "metadata": {}, "source": [ "We can also replace any value with a `` value." 
] }, { "cell_type": "code", "execution_count": 61, "id": "f0156bff", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 <NA>\n", "1 1.0\n", "2 2.0\n", "3 3.0\n", "4 4.0\n", "dtype: float64" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "series.replace(0, None)" ] }, { "cell_type": "markdown", "id": "6673eefb", "metadata": {}, "source": [ "You can replace a list of values by a list of other values:" ] }, { "cell_type": "code", "execution_count": 62, "id": "f3110f5b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 4.0\n", "1 3.0\n", "2 2.0\n", "3 1.0\n", "4 0.0\n", "dtype: float64" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "series.replace([0, 1, 2, 3, 4], [4, 3, 2, 1, 0])" ] }, { "cell_type": "markdown", "id": "61521e8b", "metadata": {}, "source": [ "You can also specify a mapping dict:" ] }, { "cell_type": "code", "execution_count": 63, "id": "45862d05", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 10.0\n", "1 100.0\n", "2 2.0\n", "3 3.0\n", "4 4.0\n", "dtype: float64" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "series.replace({0: 10, 1: 100})" ] }, { "cell_type": "markdown", "id": "04a34549", "metadata": {}, "source": [ "For a DataFrame, you can specify individual values by column:" ] }, { "cell_type": "code", "execution_count": 64, "id": "348caa64", "metadata": {}, "outputs": [], "source": [ "df = cudf.DataFrame({\"a\": [0, 1, 2, 3, 4], \"b\": [5, 6, 7, 8, 9]})" ] }, { "cell_type": "code", "execution_count": 65, "id": "cca41ec4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
005
116
227
338
449
\n", "
" ], "text/plain": [ " a b\n", "0 0 5\n", "1 1 6\n", "2 2 7\n", "3 3 8\n", "4 4 9" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 66, "id": "64334693", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
0100100
116
227
338
449
\n", "
" ], "text/plain": [ " a b\n", "0 100 100\n", "1 1 6\n", "2 2 7\n", "3 3 8\n", "4 4 9" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.replace({\"a\": 0, \"b\": 5}, 100)" ] }, { "cell_type": "markdown", "id": "2f0ceec7", "metadata": {}, "source": [ "## String/regular expression replacement" ] }, { "cell_type": "markdown", "id": "c6f44740", "metadata": {}, "source": [ "cudf supports replacing string values using `replace` API:" ] }, { "cell_type": "code", "execution_count": 67, "id": "031d3533", "metadata": {}, "outputs": [], "source": [ "d = {\"a\": list(range(4)), \"b\": list(\"ab..\"), \"c\": [\"a\", \"b\", None, \"d\"]}" ] }, { "cell_type": "code", "execution_count": 68, "id": "12b41efb", "metadata": {}, "outputs": [], "source": [ "df = cudf.DataFrame(d)" ] }, { "cell_type": "code", "execution_count": 69, "id": "d450df49", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>a</th><th>b</th><th>c</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>a</td><td>a</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>b</td><td>b</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>.</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>.</td><td>d</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " a b c\n", "0 0 a a\n", "1 1 b b\n", "2 2 . <NA>\n", "3 3 . d" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 70, "id": "f823bc46", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>a</th><th>b</th><th>c</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>a</td><td>a</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>b</td><td>b</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>A Dot</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>A Dot</td><td>d</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " a b c\n", "0 0 a a\n", "1 1 b b\n", "2 2 A Dot <NA>\n", "3 3 A Dot d" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.replace(\".\", \"A Dot\")" ] }, { "cell_type": "code", "execution_count": 71, "id": "bc52f6e9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>a</th><th>b</th><th>c</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>a</td><td>a</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>A Dot</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>A Dot</td><td>d</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " a b c\n", "0 0 a a\n", "1 1 <NA> <NA>\n", "2 2 A Dot <NA>\n", "3 3 A Dot d" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.replace([\".\", \"b\"], [\"A Dot\", None])" ] }, { "cell_type": "markdown", "id": "7c1087be", "metadata": {}, "source": [ "Replace a few different values (list -> list):" ] }, { "cell_type": "code", "execution_count": 72, "id": "7e23eba9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>a</th><th>b</th><th>c</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>b</td><td>b</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>b</td><td>b</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>--</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>--</td><td>d</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " a b c\n", "0 0 b b\n", "1 1 b b\n", "2 2 -- <NA>\n", "3 3 -- d" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.replace([\"a\", \".\"], [\"b\", \"--\"])" ] }, { "cell_type": "markdown", "id": "42845a9c", "metadata": {}, "source": [ "Only search in column 'b' (dict -> dict):" ] }, { "cell_type": "code", "execution_count": 73, "id": "d2e79805", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>a</th><th>b</th><th>c</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>a</td><td>a</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>b</td><td>b</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>replacement value</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>replacement value</td><td>d</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " a b c\n", "0 0 a a\n", "1 1 b b\n", "2 2 replacement value <NA>\n", "3 3 replacement value d" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.replace({\"b\": \".\"}, {\"b\": \"replacement value\"})" ] }, { "cell_type": "markdown", "id": "774b42a6", "metadata": {}, "source": [ "## Numeric replacement" ] }, { "cell_type": "markdown", "id": "1c1926ac", "metadata": {}, "source": [ "`replace()` can also be used similarly to `fillna()`." ] }, { "cell_type": "code", "execution_count": 74, "id": "355a2f0d", "metadata": {}, "outputs": [], "source": [ "df = cudf.DataFrame(cp.random.randn(10, 2))" ] }, { "cell_type": "code", "execution_count": 75, "id": "d9eed372", "metadata": {}, "outputs": [], "source": [ "df[np.random.rand(df.shape[0]) > 0.5] = 1.5" ] }, { "cell_type": "code", "execution_count": 76, "id": "ae944244", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>-0.089358787</td><td>-0.728419386</td></tr>\n",
"    <tr><th>1</th><td>-2.141612003</td><td>-0.574415182</td></tr>\n",
"    <tr><th>2</th><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>3</th><td>0.774643462</td><td>2.07287721</td></tr>\n",
"    <tr><th>4</th><td>0.93799853</td><td>-1.054129436</td></tr>\n",
"    <tr><th>5</th><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>6</th><td>-0.435293012</td><td>1.163009584</td></tr>\n",
"    <tr><th>7</th><td>1.346623287</td><td>0.31961371</td></tr>\n",
"    <tr><th>8</th><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>9</th><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " 0 1\n", "0 -0.089358787 -0.728419386\n", "1 -2.141612003 -0.574415182\n", "2 <NA> <NA>\n", "3 0.774643462 2.07287721\n", "4 0.93799853 -1.054129436\n", "5 <NA> <NA>\n", "6 -0.435293012 1.163009584\n", "7 1.346623287 0.31961371\n", "8 <NA> <NA>\n", "9 <NA> <NA>" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.replace(1.5, None)" ] }, { "cell_type": "markdown", "id": "0f32607c", "metadata": {}, "source": [ "Replacing more than one value is possible by passing a list." ] }, { "cell_type": "code", "execution_count": 77, "id": "59b81c60", "metadata": {}, "outputs": [], "source": [ "df00 = df.iloc[0, 0]" ] }, { "cell_type": "code", "execution_count": 78, "id": "01a71d4c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>10.000000</td><td>-0.728419</td></tr>\n",
"    <tr><th>1</th><td>-2.141612</td><td>-0.574415</td></tr>\n",
"    <tr><th>2</th><td>5.000000</td><td>5.000000</td></tr>\n",
"    <tr><th>3</th><td>0.774643</td><td>2.072877</td></tr>\n",
"    <tr><th>4</th><td>0.937999</td><td>-1.054129</td></tr>\n",
"    <tr><th>5</th><td>5.000000</td><td>5.000000</td></tr>\n",
"    <tr><th>6</th><td>-0.435293</td><td>1.163010</td></tr>\n",
"    <tr><th>7</th><td>1.346623</td><td>0.319614</td></tr>\n",
"    <tr><th>8</th><td>5.000000</td><td>5.000000</td></tr>\n",
"    <tr><th>9</th><td>5.000000</td><td>5.000000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " 0 1\n", "0 10.000000 -0.728419\n", "1 -2.141612 -0.574415\n", "2 5.000000 5.000000\n", "3 0.774643 2.072877\n", "4 0.937999 -1.054129\n", "5 5.000000 5.000000\n", "6 -0.435293 1.163010\n", "7 1.346623 0.319614\n", "8 5.000000 5.000000\n", "9 5.000000 5.000000" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.replace([1.5, df00], [5, 10])" ] }, { "cell_type": "markdown", "id": "1080e97b", "metadata": {}, "source": [ "You can also operate on the DataFrame in place:" ] }, { "cell_type": "code", "execution_count": 79, "id": "5f0859d7", "metadata": {}, "outputs": [], "source": [ "df.replace(1.5, None, inplace=True)" ] }, { "cell_type": "code", "execution_count": 80, "id": "5cf28369", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>-0.089358787</td><td>-0.728419386</td></tr>\n",
"    <tr><th>1</th><td>-2.141612003</td><td>-0.574415182</td></tr>\n",
"    <tr><th>2</th><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>3</th><td>0.774643462</td><td>2.07287721</td></tr>\n",
"    <tr><th>4</th><td>0.93799853</td><td>-1.054129436</td></tr>\n",
"    <tr><th>5</th><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>6</th><td>-0.435293012</td><td>1.163009584</td></tr>\n",
"    <tr><th>7</th><td>1.346623287</td><td>0.31961371</td></tr>\n",
"    <tr><th>8</th><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td></tr>\n",
"    <tr><th>9</th><td>&lt;NA&gt;</td><td>&lt;NA&gt;</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " 0 1\n", "0 -0.089358787 -0.728419386\n", "1 -2.141612003 -0.574415182\n", "2 <NA> <NA>\n", "3 0.774643462 2.07287721\n", "4 0.93799853 -1.054129436\n", "5 <NA> <NA>\n", "6 -0.435293012 1.163009584\n", "7 1.346623287 0.31961371\n", "8 <NA> <NA>\n", "9 <NA> <NA>" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 5 }