An introduction to xarray#

Xarray is a Python library for N-D labeled arrays and datasets. It provides two data structures: xarray.DataArray and xarray.Dataset. Both support named dimensions, coordinates and, more recently, explicit indexes. A DataArray can be seen as an extension of an n-dimensional array such as a numpy.ndarray. A Dataset can be seen as an extension of a flat dictionary of arrays.

In this post we will cover dimensions, coordinates and indexes, mostly for DataArray objects but also for Dataset objects. Xarray also supports attaching arbitrary metadata (attributes) to DataArrays, Datasets and even coordinates, but we won’t cover attributes in this post. The arrays we’ll generate will have random values, so we won’t care much about the values themselves either. We will therefore configure the HTML view of xarray objects to collapse the data, show indexes by default and hide attributes:

import numpy as np
import xarray as xr

xr.set_options(display_expand_data=False, display_expand_indexes=True, display_expand_attrs=False);

DataArray#

Dimensions#

For me, the most important feature is the ability to label dimensions. Given an array, we can assign a label to each of its dimensions (axes in NumPy terminology):

ary = np.random.default_rng().normal(size=(4, 100, 7, 24))
da_dims = xr.DataArray(ary, dims=["chain", "draw", "person", "time"])
da_dims
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
1.164 0.8265 -0.3381 0.9449 1.98 ... -0.8126 0.5171 -1.051 -0.5774 -0.4817
Dimensions without coordinates: chain, draw, person, time

We now have the same data as in the array ary with its dimensions labeled. That means we can indicate operations over specific dimensions using their names instead of positional indicators. For example:

da_dims.mean(dim=("chain", "draw"))
# instead of ary.mean(axis=(0, 1))
<xarray.DataArray (person: 7, time: 24)>
-0.04793 -0.01097 0.03508 -0.07406 ... 0.08875 0.02561 -0.01881 -0.03173
Dimensions without coordinates: person, time
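As a quick sanity check, the sketch below compares the named reduction with its positional counterpart. It uses a seeded and smaller array than the one above (an assumption made only so the check is reproducible), and it also shows that the named version keeps working after the dimensions are reordered, which is where positional axes tend to break.

```python
import numpy as np
import xarray as xr

# Seeded, smaller array than in the post so the check is reproducible.
ary = np.random.default_rng(0).normal(size=(4, 10, 7, 24))
da_dims = xr.DataArray(ary, dims=["chain", "draw", "person", "time"])

# Reduce by name, and by position on the underlying array.
by_name = da_dims.mean(dim=("chain", "draw"))
by_axis = ary.mean(axis=(0, 1))
same_values = np.allclose(by_name.values, by_axis)

# get_axis_num translates dimension names into positional axes.
axes = da_dims.get_axis_num(("chain", "draw"))

# After reordering the dimensions, the named reduction is unchanged,
# whereas axis=(0, 1) would now average over "time" and "chain".
shuffled = da_dims.transpose("time", "chain", "person", "draw")
by_name_shuffled = shuffled.mean(dim=("chain", "draw"))
still_same = np.allclose(
    by_name_shuffled.transpose("person", "time").values, by_axis
)

print(by_name.dims, axes, same_values, still_same)
```

The dimension order of the result follows the order of the remaining dimensions in the input, so the transposed input yields a (time, person) result holding the same values.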

What is more, we can sort along a specific dimension without specifying any dimension name or axis at all. For example, to sort along the time dimension by the means of the DataArray we can do:

time_means = da_dims.mean(("chain", "draw", "person"))
da_dims.sortby(time_means)
# instead of
# order = np.argsort(ary.mean(axis=(0, 1, 2)))
# ary[..., order] 
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
0.6738 0.9679 0.9449 -0.3381 0.5381 1.98 ... 0.3212 0.7235 0.2429 0.3788 -0.4113
Dimensions without coordinates: chain, draw, person, time

As the time_means DataArray has a single dimension and it is named time, .sortby sorts along that dimension.
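To make the equivalence with the NumPy version concrete, here is a small sketch under the same assumptions as before (a seeded, smaller array used only for reproducibility). It checks that .sortby and the explicit argsort give the same result, and that the per-time means end up in ascending order.

```python
import numpy as np
import xarray as xr

ary = np.random.default_rng(0).normal(size=(4, 10, 7, 24))
da_dims = xr.DataArray(ary, dims=["chain", "draw", "person", "time"])

# xarray: the sorting key is a 1-D DataArray along "time".
time_means = da_dims.mean(("chain", "draw", "person"))
sorted_da = da_dims.sortby(time_means)

# NumPy: compute the order explicitly and apply it to the last axis.
order = np.argsort(ary.mean(axis=(0, 1, 2)))
sorted_ary = ary[..., order]

matches_numpy = np.array_equal(sorted_da.values, sorted_ary)

# After sorting, the per-time means are in ascending order.
sorted_means = sorted_da.mean(("chain", "draw", "person")).values
is_ascending = bool(np.all(np.diff(sorted_means) >= 0))

print(time_means.dims, matches_numpy, is_ascending)
```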

Having this information also allows xarray to perform automatic broadcasting:

da_persons = xr.DataArray(np.arange(7), dims="person")
da_dims + da_persons
# instead of ary + np.arange(7)[:, None]
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
1.164 0.8265 -0.3381 0.9449 1.98 -1.832 ... 6.141 5.187 6.517 4.949 5.423 5.518
Dimensions without coordinates: chain, draw, person, time
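The following sketch spells out what broadcasting by name does, again on a seeded, smaller array chosen for illustration. The da_group array and its "group" dimension are hypothetical, added only to show what happens when the second operand has a dimension the first one lacks.

```python
import numpy as np
import xarray as xr

ary = np.random.default_rng(0).normal(size=(4, 10, 7, 24))
da_dims = xr.DataArray(ary, dims=["chain", "draw", "person", "time"])

# Matching happens on the dimension name, not on the position.
da_persons = xr.DataArray(np.arange(7), dims="person")
by_name = da_dims + da_persons

# NumPy needs the 7 values aligned manually with the third axis.
by_axis = ary + np.arange(7)[:, None]
same_values = np.array_equal(by_name.values, by_axis)

# A dimension missing from the first operand is appended to the result.
da_group = xr.DataArray(np.arange(3), dims="group")  # hypothetical extra dimension
with_new_dim = da_dims + da_group

print(by_name.dims, same_values)
print(with_new_dim.dims, with_new_dim.shape)
```

With plain NumPy the second operation would need an explicit reshape of both operands; here the names carry enough information for xarray to do it.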

Still, for the most part, DataArrays are interchangeable with ndarrays, even for functions that don’t support DataArrays explicitly:

np.fft.rfft(da_dims, n=8, axis=2)
array([[[[-7.05103938e-01+0.j        ,  3.20998619e+00+0.j        ,
          -3.89301708e-01+0.j        , ...,
          -3.28223564e+00+0.j        , -3.93905370e+00+0.j        ,
          -1.46815152e+00+0.j        ],
         ...,
         [ 3.40498508e+00+0.j        ,  5.74938514e-01+0.j        ,
           2.37627852e+00+0.j        , ...,
           1.92361056e+00+0.j        ,  3.44082375e-01+0.j        ,
           2.10772976e-01+0.j        ]]]])

It is possible to convert a DataArray to an ndarray using da.values but, as we have just seen, very often it is not necessary.
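Here is a minimal sketch of the round trip, assuming the same kind of seeded, smaller array as in the earlier checks. A NumPy function that doesn’t know about DataArrays returns a plain ndarray, so the labels have to be attached again afterwards. The "freq" dimension name is a hypothetical choice for the transformed axis.

```python
import numpy as np
import xarray as xr

ary = np.random.default_rng(0).normal(size=(4, 10, 7, 24))
da_dims = xr.DataArray(ary, dims=["chain", "draw", "person", "time"])

# Look up the positional axis from the name instead of hard-coding 2.
person_axis = da_dims.get_axis_num("person")

# NumPy accepts the DataArray directly but returns a plain ndarray.
spectrum = np.fft.rfft(da_dims, n=8, axis=person_axis)

# Explicit conversions, all giving the same ndarray contents.
from_values = da_dims.values
from_asarray = np.asarray(da_dims)
from_to_numpy = da_dims.to_numpy()

# Re-attach labels; "freq" is a hypothetical name for the new axis.
da_spec = xr.DataArray(spectrum, dims=["chain", "draw", "freq", "time"])

print(type(spectrum).__name__, spectrum.shape, da_spec.dims)
```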

Coordinates and indexes#

In addition, it is also possible to use coordinates. For DataArrays, coordinates sit somewhere between data and attributes. A DataArray wraps a single array, but it can have as many coordinates as desired. When creating a DataArray, the coords argument can be used to initialize the coordinates. It is a dictionary whose keys are coordinate names and whose values are DataArrays, arrays or (dims, values) tuples:

ary = np.random.default_rng().normal(size=(4, 100, 7, 24))
da = xr.DataArray(
    ary,
    dims=["chain", "draw", "person", "time"],
    coords={
        "chain": [1, 2, 3, 4],
        "gender": ("person", ["male", "neutrois", "female", "male", "NB", "female", "female"]),
        "age": ("person", [25, 30, 73, 47, 51, 20, 64]),
    }
)
da
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
-2.333 -0.9839 0.8063 1.505 -0.0909 1.33 ... -0.1766 0.484 0.1027 -0.1825 1.198
Coordinates:
  * chain    (chain) int64 1 2 3 4
    gender   (person) <U8 'male' 'neutrois' 'female' ... 'NB' 'female' 'female'
    age      (person) int64 25 30 73 47 51 20 64
Dimensions without coordinates: draw, person, time

Alternatively, xarray.DataArray.assign_coords can be used to add coordinates to an existing DataArray:

da = da.assign_coords(city=("person", ["Lleida", "Leida", "Sau", "Lleida", "Reus", "Sau", "Reus"]))
da
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
-2.333 -0.9839 0.8063 1.505 -0.0909 1.33 ... -0.1766 0.484 0.1027 -0.1825 1.198
Coordinates:
  * chain    (chain) int64 1 2 3 4
    gender   (person) <U8 'male' 'neutrois' 'female' ... 'NB' 'female' 'female'
    age      (person) int64 25 30 73 47 51 20 64
    city     (person) <U6 'Lleida' 'Leida' 'Sau' 'Lleida' 'Reus' 'Sau' 'Reus'
Dimensions without coordinates: draw, person, time
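One detail worth checking is that assign_coords returns a new object instead of modifying the original. The sketch below uses a tiny, made-up DataArray (2 chains, 3 persons) to show this, along with the fact that only the dimension coordinate chain gets an index by default.

```python
import numpy as np
import xarray as xr

# Tiny illustrative array; sizes and values are made up for the example.
da_small = xr.DataArray(
    np.zeros((2, 3)),
    dims=["chain", "person"],
    coords={
        "chain": [1, 2],                    # plain list: a dimension coordinate
        "age": ("person", [25, 30, 73]),    # (dims, values) tuple
    },
)

da_with_city = da_small.assign_coords(city=("person", ["Lleida", "Sau", "Reus"]))

original_has_city = "city" in da_small.coords
new_has_city = "city" in da_with_city.coords

# Only "chain" is indexed; "age" and "city" are plain coordinates.
indexed_names = list(da_with_city.indexes)

print(original_has_city, new_has_city, indexed_names)
```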

So far we have added all these coordinates, but we can’t yet use that information directly for, say, slicing. Only the values of chain can be used for that, which is why chain is shown in bold in the “Coordinates” section of the HTML view (marked with * in the text representation) and appears in the “Indexes” section. To use the rest of the coordinates for slicing too, we need to tell xarray to index them:

da = da.set_xindex("gender").set_xindex("city")
da
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
-2.333 -0.9839 0.8063 1.505 -0.0909 1.33 ... -0.1766 0.484 0.1027 -0.1825 1.198
Coordinates:
  * chain    (chain) int64 1 2 3 4
  * gender   (person) <U8 'male' 'neutrois' 'female' ... 'NB' 'female' 'female'
    age      (person) int64 25 30 73 47 51 20 64
  * city     (person) <U6 'Lleida' 'Leida' 'Sau' 'Lleida' 'Reus' 'Sau' 'Reus'
Dimensions without coordinates: draw, person, time

We can slice along the person dimension using the gender coordinate:

da.sel(gender="female")
<xarray.DataArray (chain: 4, draw: 100, person: 3, time: 24)>
-0.2618 -0.6076 -0.4886 -0.6244 -0.1986 ... -0.1766 0.484 0.1027 -0.1825 1.198
Coordinates:
  * chain    (chain) int64 1 2 3 4
  * gender   (person) <U8 'female' 'female' 'female'
    age      (person) int64 73 20 64
  * city     (person) <U6 'Sau' 'Sau' 'Reus'
Dimensions without coordinates: draw, person, time
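
The age coordinate was left without an index, so it cannot be passed to sel. One possible workaround, sketched here on a small made-up array (the names people and over_40 are hypothetical), is to build a condition from the coordinate values and select positionally with isel, or to use where:

```python
import numpy as np
import xarray as xr

people = xr.DataArray(
    np.arange(7.0),
    dims=["person"],
    coords={"age": ("person", [25, 30, 73, 47, 51, 20, 64])},
)

# Label-based selection needs an index; which exception is raised here
# depends on the xarray version, so both are caught in this sketch
try:
    people.sel(age=73)
    sel_worked = True
except (KeyError, ValueError):
    sel_worked = False

# Positional selection driven by a condition on the coordinate values
over_40 = people.isel(person=np.flatnonzero(people["age"].values > 40))
# Same selection with where, dropping the positions where the condition is False
over_40_where = people.where(people["age"] > 40, drop=True)

print(sel_worked)
print(over_40["age"].values)
```

Unlike set_xindex, this approach does not add anything to the “Indexes” section; the condition is simply evaluated each time.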

Vectorized and outer indexing#

NumPy indexing is extremely powerful, especially what is often called “fancy” indexing. However, it can be confusing and inconsistent (see this NEP for more background). With dimensions and coordinates, xarray allows being explicit and consistent when choosing between vectorized and outer indexing.

Diagram showing the two indexing modes visually. With outer indexing, we select the intersection between the slices along each dimension. With vectorized indexing, we select only the specific positions indicated by the pointwise combination of indexes.

Outer indexing selects rectangular or hyper-rectangular slices. To use outer indexing with xarray, we can use lists, arrays or DataArrays whose dimension is the one being indexed. The output of outer indexing therefore has the same dimensions as the input, except for the dimensions indexed with a scalar, which are dropped.

da.sel(chain=[1, 3], time=slice(0, 5))
<xarray.DataArray (chain: 2, draw: 100, person: 7, time: 5)>
-2.333 -0.9839 0.8063 1.505 -0.0909 ... -0.2397 0.4177 0.9724 1.131 0.3541
Coordinates:
  * chain    (chain) int64 1 3
  * gender   (person) <U8 'male' 'neutrois' 'female' ... 'NB' 'female' 'female'
    age      (person) int64 25 30 73 47 51 20 64
  * city     (person) <U6 'Lleida' 'Leida' 'Sau' 'Lleida' 'Reus' 'Sau' 'Reus'
Dimensions without coordinates: draw, person, time
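
The difference between a scalar and a length-one list can be checked directly. This is a minimal sketch on a small made-up array (the names arr, kept and squeezed are hypothetical):

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.zeros((4, 10, 24)),
    dims=["chain", "draw", "time"],
    coords={"chain": [1, 2, 3, 4]},
)

kept = arr.sel(chain=[2])     # list indexer: chain stays as a length-1 dimension
squeezed = arr.sel(chain=2)   # scalar indexer: chain is dropped from the dimensions

print(kept.dims)
print(squeezed.dims)
print(squeezed["chain"].item())   # chain survives as a scalar coordinate
```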

Vectorized indexing selects only the exact positions indicated by the indexing arrays. To do that, we have to use DataArrays whose dimensions are different from those being indexed. As long as all index DataArrays have the same dimensions (and thus shape), they can also be multidimensional. The output of vectorized indexing therefore has as dimensions all the dimensions in the indexes plus all the dimensions that were not indexed:

chain_idx = xr.DataArray([1, 1, 3, 3, 1, 3, 1, 1], dims="vectorized_sel")
time_idx = xr.DataArray([0, 1, 0, 1, 2, 2, 3, 4], dims="vectorized_sel")
da.sel(chain=chain_idx, time=time_idx)
<xarray.DataArray (vectorized_sel: 8, draw: 100, person: 7)>
-2.333 -0.9922 -0.2618 0.7559 1.352 ... 0.4422 1.155 0.4988 -0.4919 -0.3735
Coordinates:
    chain    (vectorized_sel) int64 1 1 3 3 1 3 1 1
  * gender   (person) <U8 'male' 'neutrois' 'female' ... 'NB' 'female' 'female'
    age      (person) int64 25 30 73 47 51 20 64
  * city     (person) <U6 'Lleida' 'Leida' 'Sau' 'Lleida' 'Reus' 'Sau' 'Reus'
Dimensions without coordinates: vectorized_sel, draw, person
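
To connect this with plain NumPy, here is a sketch using the same indexers as the diagram, [0, 1] and [1, 3], on a made-up 5x5 grid (the names grid, da_grid and the "points" dimension are illustrative assumptions). It uses isel because the grid has no coordinates:

```python
import numpy as np
import xarray as xr

grid = np.arange(25).reshape(5, 5)
rows, cols = [0, 1], [1, 3]

# Plain NumPy: lists in several axes are combined pointwise (vectorized)...
np_vectorized = grid[rows, cols]        # elements (0, 1) and (1, 3)
# ...and outer indexing needs a helper such as np.ix_
np_outer = grid[np.ix_(rows, cols)]     # 2x2 block: rows 0 and 1, columns 1 and 3

da_grid = xr.DataArray(grid, dims=["row", "col"])

# xarray: lists (or arrays) mean outer indexing
xr_outer = da_grid.isel(row=rows, col=cols)

# xarray: DataArray indexers sharing a new dimension mean vectorized indexing
xr_vectorized = da_grid.isel(
    row=xr.DataArray(rows, dims="points"),
    col=xr.DataArray(cols, dims="points"),
)

print(xr_outer.dims, xr_vectorized.dims)
```

In xarray the choice between the two modes depends only on the type and dimensions of the indexers, not on how many axes happen to be indexed at once.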

Plotting#

Xarray also has some plotting support, which is very convenient for visualizing multidimensional data. For example, we can facet over a 1st dimension (here chain), use a 2nd dimension (here time) as the x values of the line plot and map the color of the lines to a 3rd dimension (here person). The y values are the actual values in the DataArray.

da.mean("draw").plot.line(x="time", hue="person", col="chain", col_wrap=2);
../../../_images/dbb73757b9fdb32d39d07063e801bede51115d5fc81636b9d1e9ec06322e46fe.png

Or we can visualize a 2D slice of the data with imshow, mapping the values to colors via a colormap:

# make the plot more interesting
da_plot = da_dims + da_persons - xr.DataArray(2*np.sin(np.linspace(0, np.pi, 24)), dims="time")
da_plot.mean(("chain", "draw")).plot.imshow(x="time", y="person", center=False);
../../../_images/ec58d602581c1ecdfa9a2e45f084bf62a8d5caddef27b4238b0681155cb90aa0.png

Dataset#

Datasets are collections of DataArrays, each DataArray being a data variable with a name/key. Thus, they are similar to a dictionary of arrays, with the important difference that dimensions with the same name are shared between variables:

ds = xr.Dataset(
    {
        "a": (["chain", "draw", "person", "time"], ary),
        "b": (["chain", "draw"], np.random.default_rng().normal(size=(4, 100)))
    },
    coords={"chain": [1, 2, 3, 4]}
)
ds
<xarray.Dataset>
Dimensions:  (chain: 4, draw: 100, person: 7, time: 24)
Coordinates:
  * chain    (chain) int64 1 2 3 4
Dimensions without coordinates: draw, person, time
Data variables:
    a        (chain, draw, person, time) float64 -2.333 -0.9839 ... 1.198
    b        (chain, draw) float64 -0.9367 0.5736 -0.3545 ... -0.2987 -0.5123
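
One consequence of sharing dimensions is that variables using the same dimension name must agree on its length. A minimal sketch with made-up arrays (the variable error_message is a hypothetical name used only to capture the failure):

```python
import numpy as np
import xarray as xr

# Variables that use the same dimension name must agree on its length
try:
    xr.Dataset(
        {
            "a": (["chain", "draw"], np.zeros((4, 100))),
            "b": (["chain"], np.zeros(3)),   # chain has length 3 here, 4 above
        }
    )
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```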

They work very similarly to DataArrays, with operations being applied to all data variables in the Dataset (if reasonable). For example, if we index along the chain dimension, the operation applies to both variables. However, taking the mean over the person dimension only affects the data variable a, because b has no person dimension:

ds.sel(chain=2).mean("person")
<xarray.Dataset>
Dimensions:  (draw: 100, time: 24)
Coordinates:
    chain    int64 2
Dimensions without coordinates: draw, time
Data variables:
    a        (draw, time) float64 -0.1994 -0.6486 -0.1148 ... 0.4029 0.2966
    b        (draw) float64 -0.4077 2.132 2.622 -2.868 ... -1.39 0.2419 1.659

To apply arbitrary functions to all the data variables in a Dataset, we can use xarray.Dataset.map for functions that return a DataArray or xarray.apply_ufunc for functions that return arrays. apply_ufunc can be challenging to use, but there is a good chance that a library exists that already wraps and extends numpy/scipy/… functions to have an xarray-like API. Take a look at Xarray related projects to browse packages that extend xarray.
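
As a rough illustration of the two approaches, here is a sketch on a small made-up Dataset (the names toy, centered and medians are hypothetical). It centers every variable with Dataset.map and takes the median over draw with apply_ufunc:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
toy = xr.Dataset(
    {
        "a": (["chain", "draw", "person"], rng.normal(size=(4, 100, 7))),
        "b": (["chain", "draw"], rng.normal(size=(4, 100))),
    }
)

# Dataset.map: the function receives each data variable as a DataArray
centered = toy.map(lambda var: var - var.mean("draw"))

# apply_ufunc: the function receives plain NumPy arrays; the core dimension
# "draw" is moved to the last axis, so we reduce over axis=-1
medians = xr.apply_ufunc(
    np.median,
    toy,
    input_core_dims=[["draw"]],
    kwargs={"axis": -1},
)

print(centered["a"].dims)
print(medians["a"].dims, medians["b"].dims)
```

With apply_ufunc, the dimension bookkeeping has to be spelled out through input_core_dims, which is part of what makes it harder to use than map.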

And if this still doesn’t work, remember that Datasets are quite like a dictionary, so much so that they even have a dict-like interface with .items(), .keys() and .get().
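
A short sketch of that dict-like interface, again on a made-up Dataset (the names toy, names, shapes and missing are hypothetical):

```python
import numpy as np
import xarray as xr

toy = xr.Dataset(
    {
        "a": (["chain", "draw"], np.zeros((4, 100))),
        "b": (["chain"], np.ones(4)),
    }
)

names = list(toy.keys())                                  # data variable names
shapes = {name: var.shape for name, var in toy.items()}   # loop as over a dict
missing = toy.get("c")                                    # None instead of a KeyError

print(names)
print(shapes)
print(missing)
```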

I hope this introduction has been useful. I would like to continue exploring features of xarray as a series, also covering potential new features and extensions for ArviZ. You can leave your comments on this blog post below; only a GitHub account is needed for commenting.


Package versions used to generate this post:

%load_ext watermark
%watermark -n -u -v -iv -w
Last updated: Thu Jul 13 2023

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 8.14.0

xarray: 2023.6.0
numpy : 1.24.4

Watermark: 2.4.3

Lastly, as an appendix, here is the code used to generate the vectorized vs outer indexing diagram.

import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 2, figsize=(8, 4))

ax = axes[0]
ax.fill_betweenx([3, 5], x1=5, x2=0, color="xkcd:jade", alpha=.3)
ax.fill_between([1, 2], y1=5, y2=0, color="violet", alpha=.3)
ax.fill_between([3, 4], y1=5, y2=0, color="violet", alpha=.3)
ax.fill_between([1, 2], y1=5, y2=3, color="r")
ax.fill_between([3, 4], y1=5, y2=3, color="r")
ax.hlines(np.arange(6), 0, 5, lw=1, color="k")
ax.vlines(np.arange(6), 0, 5, lw=1, color="k")
ax.set_ylim(-0.05, 5.05)
ax.set_xlim(-0.05, 5.05)
ax.set_title("Outer indexing: [[0, 1], [1, 3]]")
ax.set_axis_off()
ax.set_aspect("equal")

ax = axes[1]
ax.fill_between([1, 2], y1=5, y2=4, color="r")
ax.fill_between([3, 4], y1=4, y2=3, color="r")
ax.hlines(np.arange(6), 0, 5, lw=1, color="k")
ax.vlines(np.arange(6), 0, 5, lw=1, color="k")
ax.set_ylim(-0.05, 5.05)
ax.set_xlim(-0.05, 5.05)
ax.set_title("Vectorized indexing: [[0, 1], [1, 3]]")
ax.set_axis_off()
ax.set_aspect("equal")

#fig.savefig("indexing_modes.png", dpi=300)
../../../_images/2cdb90fceecf84f477fc4b9176862d59819e7c11b799c06e998721cd74ea0217.png