An introduction to xarray
Xarray is a Python library for N-D labeled arrays and datasets. It provides two data structures: xarray.DataArray
and xarray.Dataset
. Both support named dimensions, coordinates and, more recently, also explicit indexes.
A DataArray
can be seen as an extension of an n-dimensional array like a numpy.ndarray
. A Dataset
can be seen as an extension of a flat dictionary of arrays.
In this post we will cover dimensions, coordinates and indexes, especially for DataArray
objects but also for Dataset
objects. Xarray also supports attaching arbitrary metadata to DataArray
s, Dataset
s and even coordinates. The arrays we’ll generate hold random values, so the values themselves don’t matter much, and we won’t cover attributes in this post either. We will therefore configure the HTML view of xarray objects to collapse the data, show indexes by default and hide attributes:
import numpy as np
import xarray as xr
xr.set_options(display_expand_data=False, display_expand_indexes=True, display_expand_attrs=False);
DataArray
Dimensions
For me, the most important feature is the ability to label dimensions. Given an array, we can assign a label to each of its dimensions (axes in NumPy terminology):
ary = np.random.default_rng().normal(size=(4, 100, 7, 24))
da_dims = xr.DataArray(ary, dims=["chain", "draw", "person", "time"])
da_dims
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
1.164 0.8265 -0.3381 0.9449 1.98 ... -0.8126 0.5171 -1.051 -0.5774 -0.4817
Dimensions without coordinates: chain, draw, person, time
We now have the same data as in the array ary
with its dimensions labeled. That means we can indicate operations over specific dimensions using their names instead of positional indicators. For example:
da_dims.mean(dim=("chain", "draw"))
# instead of ary.mean(axis=(0, 1))
<xarray.DataArray (person: 7, time: 24)>
-0.04793 -0.01097 0.03508 -0.07406 ... 0.08875 0.02561 -0.01881 -0.03173
Dimensions without coordinates: person, time
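As a quick sanity check, the sketch below compares reducing by dimension name with reducing by axis position. It uses a seeded generator and a fresh array named `ary_small`. Both are assumptions made here so the check is reproducible; the post itself uses an unseeded generator.

```python
import numpy as np
import xarray as xr

# Seeded generator so the check is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(0)
ary_small = rng.normal(size=(4, 100, 7, 24))
da_small = xr.DataArray(ary_small, dims=["chain", "draw", "person", "time"])

by_name = da_small.mean(dim=("chain", "draw"))
by_axis = ary_small.mean(axis=(0, 1))

# Same numbers, but the xarray result still knows which dimensions remain
print(by_name.dims)
print(np.allclose(by_name.values, by_axis))
```

The named version keeps working if the array is later transposed, whereas the positional version would silently reduce over the wrong axes.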
What’s more, we can sort along a specific dimension without specifying any dimension or axis at all. For example, to sort along the time dimension using the means of the DataArray
we can do:
time_means = da_dims.mean(("chain", "draw", "person"))
da_dims.sortby(time_means)
# instead of
# order = np.argsort(ary.mean(axis=(0, 1, 2)))
# ary[..., order]
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
0.6738 0.9679 0.9449 -0.3381 0.5381 1.98 ... 0.3212 0.7235 0.2429 0.3788 -0.4113
Dimensions without coordinates: chain, draw, person, time
As the time_means
DataArray
has a single dimension and it is named time
, .sortby
sorts along that dimension.
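To see that `.sortby` and the commented NumPy recipe agree, here is a small self-contained comparison. The array shape and the seed are arbitrary choices made for this sketch, not values from the post.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
ary_small = rng.normal(size=(2, 3, 4, 5))
da_small = xr.DataArray(ary_small, dims=["chain", "draw", "person", "time"])

# 1-D DataArray whose only dimension is "time"
time_means = da_small.mean(("chain", "draw", "person"))
sorted_da = da_small.sortby(time_means)

# NumPy equivalent: compute the order and apply it to the last axis by hand
order = np.argsort(ary_small.mean(axis=(0, 1, 2)))
sorted_ary = ary_small[..., order]

print(np.array_equal(sorted_da.values, sorted_ary))
```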
Having this information also allows xarray to perform automatic broadcasting:
da_persons = xr.DataArray(np.arange(7), dims="person")
da_dims + da_persons
# instead of ary + np.arange(7)[:, None]
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
1.164 0.8265 -0.3381 0.9449 1.98 -1.832 ... 6.141 5.187 6.517 4.949 5.423 5.518
Dimensions without coordinates: chain, draw, person, time
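The following sketch checks name-based broadcasting against the manual NumPy version. The names `ary_small`, `da_small` and `persons` and the seed are illustrative assumptions; the array is smaller than the one in the post to keep the check fast.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
ary_small = rng.normal(size=(4, 10, 7, 24))
da_small = xr.DataArray(ary_small, dims=["chain", "draw", "person", "time"])
persons = xr.DataArray(np.arange(7), dims="person")

# xarray matches the "person" dimension by name, wherever it sits
by_name = da_small + persons

# NumPy needs a trailing axis so the length-7 array lines up with axis 2
by_position = ary_small + np.arange(7)[:, None]

print(by_name.dims)
print(np.allclose(by_name.values, by_position))
```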
Still, for the most part, DataArray
s are interchangeable with ndarrays, even for functions that don’t support DataArray
s explicitly:
np.fft.rfft(da_dims, n=8, axis=2)
array([[[[-7.05103938e-01+0.j , 3.20998619e+00+0.j ,
-3.89301708e-01+0.j , ...,
-3.28223564e+00+0.j , -3.93905370e+00+0.j ,
-1.46815152e+00+0.j ],
[ 2.63681065e+00-0.63377221j, -7.24334467e-01-1.10928501j,
7.38184144e-01-1.1649908j , ...,
-4.85216041e-01+2.50127064j, -5.36173630e-01-3.22612384j,
-1.80993821e+00+2.778141j ],
[ 8.05211921e-01-1.01508558j, -8.76457636e-01+3.17940903j,
2.70102889e+00-2.01384215j, ...,
1.41752605e+00-0.38687799j, 8.02449687e-01+1.61608384j,
1.53501722e+00-0.39785814j],
[ 3.24115984e+00+0.80562246j, 1.73609377e+00-0.98200847j,
-1.95460231e+00-2.14114662j, ...,
5.10215533e-01+1.0456462j , -2.93548902e+00+1.40644197j,
2.56769326e-01+0.70846213j],
[-3.34692097e+00+0.j , 3.13169766e+00+0.j ,
-5.28439527e+00+0.j , ...,
-4.47646548e-01+0.j , 1.12839572e-01+0.j ,
-2.55832185e-01+0.j ]],
[[ 5.66573148e+00+0.j , -4.48005722e+00+0.j ,
-8.21879415e-01+0.j , ...,
-2.05134761e+00+0.j , 5.17183494e-02+0.j ,
1.39261826e+00+0.j ],
[-7.16916338e-01-1.20434428j, 1.80672069e+00-3.6726229j ,
1.51296495e+00-0.06577871j, ...,
-5.66719875e-01+1.94383976j, 8.46068138e-01-1.78544408j,
-1.36209121e+00-0.02484927j],
[ 2.17454028e+00-0.61967507j, -5.39870954e+00+1.1943909j ,
-1.29473302e+00+0.96514436j, ...,
-3.89581683e+00-0.14737827j, -1.02544549e+00-2.40059281j,
-8.63131064e-01+2.5933031j ],
[-1.01860371e+00-2.42659377j, 1.21909286e-01+2.61289013j,
2.46808073e+00-0.75024111j, ...,
3.15667028e+00+1.99695857j, -3.77012991e-02+1.48175172j,
2.18597104e+00-2.50189672j],
[-2.81890129e-01+0.j , -1.20728074e+00+0.j ,
-1.39635469e+00+0.j , ...,
-2.03864256e+00+0.j , -3.92911440e+00+0.j ,
-2.01185175e+00+0.j ]],
[[ 2.51399507e+00+0.j , 1.15639939e-01+0.j ,
3.25064110e+00+0.j , ...,
-1.61948418e+00+0.j , 3.99291630e+00+0.j ,
3.56264240e+00+0.j ],
[-3.49013063e+00+1.98882299j, -3.75468264e+00+1.80937208j,
8.44602457e-01-2.85380207j, ...,
3.19968173e-02-1.88853035j, -9.02894012e-01+3.67643011j,
3.97043413e-01-2.06830469j],
[ 1.98275445e+00-1.35923036j, -1.78874493e+00-0.89932934j,
-6.43004380e-01+2.94572737j, ...,
-4.88973711e+00+2.37730413j, -1.75016393e+00-1.47954398j,
-1.50567621e+00+1.99763831j],
[ 2.92598474e+00+3.16373188j, -1.61770013e+00+2.48450334j,
4.31787489e+00+0.708441j , ...,
-9.91833427e-01+0.37827845j, 1.31039676e-01+0.5803137j ,
3.41003829e+00-2.61481766j],
[-1.99141005e+00+0.j , 1.36728847e+00+0.j ,
1.19614814e+00+0.j , ...,
2.44722000e+00+0.j , 5.59157012e+00+0.j ,
-3.32603435e+00+0.j ]],
...,
[[ 2.01348494e+00+0.j , -2.20622442e+00+0.j ,
5.29791638e-01+0.j , ...,
2.60188735e+00+0.j , 3.94036915e+00+0.j ,
3.67976569e+00+0.j ],
[-2.47940752e+00-4.45026843j, 1.97395451e+00+1.3991405j ,
1.45276533e+00-1.94389097j, ...,
-5.63472438e-01+2.70736955j, -4.85553272e-01-0.28991876j,
4.66758980e-01-1.21600193j],
[ 5.42325255e+00+1.79414182j, -8.73129047e-01+1.11639694j,
-3.34631372e+00+2.43770924j, ...,
-2.37260338e+00-4.16621109j, -1.17376381e+00+2.6643399j ,
-1.54241215e+00+1.49163253j],
[-1.98113039e+00+0.89670968j, -8.67653768e-01-1.98162734j,
-3.57729650e-01-1.25520421j, ...,
2.73397054e+00+4.6821489j , 2.49075364e+00-0.92810848j,
2.59790646e+00-1.89278806j],
[-1.28392685e-01+0.j , 3.22144964e+00+0.j ,
3.44143288e+00+0.j , ...,
5.65338055e-02+0.j , 3.33736471e+00+0.j ,
-7.47359895e-01+0.j ]],
[[ 4.23665579e+00+0.j , -2.65236758e+00+0.j ,
9.37613460e-01+0.j , ...,
1.32134137e+00+0.j , 6.35751579e-01+0.j ,
-2.67178268e+00+0.j ],
[ 2.09173672e-01-1.37108788j, -8.98548792e-01+1.97633976j,
-1.58035448e+00-0.12683061j, ...,
1.09220481e-02-1.04875162j, 4.97480488e-01-2.65297667j,
-1.43899410e+00-1.01898976j],
[-6.02131956e+00-0.4567748j , 3.26789540e+00+4.0548669j ,
-3.04819563e+00-0.75516924j, ...,
-1.52691048e+00-1.61243248j, 1.47473688e+00-0.63533477j,
-3.27159535e+00+1.21886529j],
[ 1.46130473e+00-1.5653558j , 2.57174837e+00-2.14743628j,
1.47837807e+00+3.44013995j, ...,
1.55066765e+00+3.80429288j, -2.28165101e+00-0.67872384j,
2.83146034e-01+0.36474936j],
[-2.60043301e+00+0.j , 3.07657886e-01+0.j ,
-2.12699062e-01+0.j , ...,
-1.84128879e-01+0.j , -1.41636856e+00+0.j ,
-3.59476168e+00+0.j ]],
[[-7.54465639e-01+0.j , 2.44800792e+00+0.j ,
1.23824871e+00+0.j , ...,
6.45572839e-01+0.j , 2.78595440e+00+0.j ,
-3.90408983e+00+0.j ],
[-3.99517572e+00-0.41487174j, -1.53081442e+00+0.97270356j,
-8.57349969e-01+3.058407j , ...,
-1.79751413e-01-0.18103217j, 4.03335725e+00+1.54764782j,
-4.94662107e+00+0.59763042j],
[ 1.30046332e+00+1.17357156j, 1.43680918e+00-0.86856531j,
-3.37811567e+00-2.80375313j, ...,
-2.90297016e+00+0.3492766j , 1.33835308e+00-4.89516219j,
-8.57280437e-01-1.76495998j],
[-4.72420211e-01+1.62592437j, -5.28192950e-01-3.7311135j ,
2.04752708e+00-0.52719071j, ...,
-3.74159597e-02+1.89539726j, 8.34527562e-01-1.75039062j,
-1.23226832e+00+5.09443148j],
[-1.67656407e+00+0.j , -4.60802433e+00+0.j ,
-4.89769366e+00+0.j , ...,
3.56731167e+00+0.j , -2.76726326e+00+0.j ,
-4.63710666e+00+0.j ]]],
[[[ 2.28204678e-01+0.j , 1.11674938e+00+0.j ,
2.46588699e+00+0.j , ...,
-1.30904880e+00+0.j , -3.29323454e-01+0.j ,
-2.52330358e-01+0.j ],
[-5.72940044e-01-2.15715897j, 2.33859353e-02+0.53923833j,
2.65262459e+00-2.34812157j, ...,
1.90622130e+00+2.31217401j, -8.89856682e-01+1.57348599j,
2.38970194e+00-1.44698256j],
[ 2.69509902e+00+0.79867375j, -2.26251044e-01-1.74049658j,
-6.75353458e-01+0.9305965j , ...,
-1.29540422e+00-0.31814278j, 7.04175238e-01-0.08804204j,
-2.52655723e+00+0.23264895j],
[-1.08805671e+00+2.09746512j, 2.07432040e+00+1.55859216j,
1.94490257e+00-1.90247076j, ...,
1.08594704e+00-1.45900698j, -1.03026229e+00-2.72010171j,
3.66568153e-01-1.18368025j],
[ 2.46961728e+00+0.j , -2.29792278e+00+0.j ,
-6.83295234e-01+0.j , ...,
4.21268994e-01+0.j , -1.92840351e+00+0.j ,
-6.46208236e-01+0.j ]],
[[-3.66431072e+00+0.j , -2.02987973e+00+0.j ,
2.29165853e+00+0.j , ...,
-6.62086608e-01+0.j , -3.84130261e+00+0.j ,
3.15990141e+00+0.j ],
[ 5.00362127e-01+0.1742231j , 1.33881522e+00+0.8292724j ,
1.53086653e+00-0.24502918j, ...,
-2.28635230e+00+0.17978627j, 8.82075167e-01+0.02299693j,
-9.60243012e-01+0.1350562j ],
[ 1.14406051e+00-1.87259856j, 3.33013886e+00-1.8253407j ,
-6.82416852e-01-2.40881928j, ...,
-2.04979299e+00-0.96685655j, 1.31523746e-01+1.60348357j,
-1.39725362e+00-0.1318368j ],
[-2.05478914e+00+0.0551367j , 7.52057736e-01+1.09564391j,
-8.35920915e-01+1.21113105j, ...,
-7.32866449e-01+2.43901478j, -1.49761313e+00+0.0209441j ,
9.68768943e-01+0.21837365j],
[-3.47161453e+00+0.j , -2.12851799e+00+0.j ,
2.18743725e+00+0.j , ...,
-1.08921650e+00+0.j , 2.79319398e+00+0.j ,
6.68018231e-01+0.j ]],
[[-3.77309323e+00+0.j , 3.06218702e-02+0.j ,
-1.44692329e+00+0.j , ...,
-1.01926447e+00+0.j , 2.51550089e+00+0.j ,
1.41240979e-02+0.j ],
[-1.69586563e+00+0.12540939j, -5.42622538e-01+1.64883381j,
1.83277609e+00-1.37020427j, ...,
2.24863679e+00+2.6472236j , 2.54635851e+00-1.36193883j,
1.44173463e+00+3.21908311j],
[ 3.39224620e+00+2.14088255j, 1.91198812e+00+0.03542532j,
-2.09758404e+00+1.24206381j, ...,
-1.84267150e+00+0.18175414j, 1.16640875e+00+0.3001499j ,
9.42867744e-01-0.59925236j],
[-2.67023750e+00+1.39841573j, 2.31197623e+00+2.15312372j,
1.53180279e+00+1.488797j , ...,
2.19828410e+00+0.21382725j, 2.16197251e+00-0.43588995j,
8.73849005e-02-3.17337336j],
[ 4.04165594e+00+0.j , 1.44124013e+00+0.j ,
1.63056029e+00+0.j , ...,
3.46159019e+00+0.j , 1.11689073e+00+0.j ,
7.95602473e-01+0.j ]],
...,
[[-3.25219924e-01+0.j , 1.15281704e+00+0.j ,
-1.00005696e+00+0.j , ...,
-2.45602557e+00+0.j , -2.45589128e+00+0.j ,
-5.95186197e+00+0.j ],
[ 2.75650417e+00+4.34844891j, -2.84410871e-01-2.21934021j,
2.16114412e-01+0.12821621j, ...,
-5.33387280e-01+1.13500373j, -1.11994819e-01-1.6038346j ,
-2.69257462e-01-2.58303659j],
[ 3.02516178e+00-2.00809865j, -7.97019496e-01+0.40081434j,
4.06916135e+00+0.75749161j, ...,
4.30699984e+00-2.39786145j, -4.50702108e+00+2.344143j ,
-4.33149648e-01-0.22856827j],
[ 2.85962199e+00+1.26857576j, -1.15534364e+00-1.97130098j,
3.57679225e-01+1.5715332j , ...,
-2.18161245e+00-0.41854799j, 4.67748685e-01-0.36095919j,
1.48726992e+00+4.2001731j ],
[ 3.45642462e+00+0.j , -2.74033093e+00+0.j ,
2.71853205e+00+0.j , ...,
-3.90758467e+00+0.j , -1.36612443e+00+0.j ,
-6.60612698e+00+0.j ]],
[[-1.53959136e+00+0.j , -6.83530523e-01+0.j ,
8.74051832e-01+0.j , ...,
6.02012121e-01+0.j , -4.40372516e-01+0.j ,
1.88884500e+00+0.j ],
[-2.82370484e+00-1.49481788j, 5.24174032e-01-0.28651448j,
3.23653095e-01+0.80643245j, ...,
-2.43698033e+00+2.68367868j, 9.90181614e-01-0.80332514j,
2.84138994e-01+0.48295765j],
[ 2.83801537e+00+3.33246593j, 1.34997375e+00+0.91753727j,
3.85316653e-01-2.18354201j, ...,
-2.80674904e-01-2.02370397j, -2.22210014e+00+1.84978599j,
-3.48682794e+00-2.54347343j],
[ 1.19953928e+00+2.0229844j , -1.46138842e-01+1.6628312j ,
-3.12328811e+00-1.71518363j, ...,
-7.26152456e-02+3.16914158j, 1.48744561e+00-0.68485705j,
7.69205595e-01+1.14549334j],
[ 1.82554354e-01+0.j , 4.04591533e+00+0.j ,
9.65105585e-02+0.j , ...,
1.48802480e+00+0.j , 4.51354547e-01+0.j ,
-1.58111216e+00+0.j ]],
[[-6.24508989e+00+0.j , 9.95642445e-01+0.j ,
8.46532785e-01+0.j , ...,
-3.53547249e+00+0.j , -6.09947233e+00+0.j ,
1.22691621e+00+0.j ],
[ 2.25526660e+00-3.04622232j, 1.28048067e+00+1.3135754j ,
1.16760518e+00-1.86728659j, ...,
-2.16805978e+00+1.38167277j, 1.02223561e+00+0.81796474j,
1.73649888e+00-1.53478504j],
[-3.14632837e+00+1.08072736j, 3.36498665e+00-1.47388002j,
4.25904924e-01-1.5966948j , ...,
1.10470177e+00+2.3891663j , -8.00473121e-02+3.3053393j ,
2.34240572e+00-2.77235127j],
[-5.09773731e+00-4.59737366j, 6.12069778e-01-0.55945174j,
-1.30911569e+00+3.24875392j, ...,
-3.47870357e-01+1.83146367j, -6.72106462e-01-1.81544252j,
-2.33728038e-01+0.67504838j],
[-4.49459440e+00+0.j , 5.96500644e-02+0.j ,
3.10944526e+00+0.j , ...,
3.21278467e+00+0.j , 1.49672121e+00+0.j ,
-2.74731593e+00+0.j ]]],
[[[-1.11417247e+00+0.j , -4.10612132e+00+0.j ,
7.38182147e-01+0.j , ...,
-1.96584242e+00+0.j , -1.41373442e-01+0.j ,
3.66325857e+00+0.j ],
[ 1.62504825e+00+0.52325899j, 1.57417191e+00-1.30372055j,
1.81857793e-01+3.09071349j, ...,
-3.43721054e-01-0.2250892j , -6.49076023e-02+3.31699315j,
-2.67694347e+00-2.57490306j],
[-7.44116790e-01-0.66342903j, -3.57481697e+00-1.09506426j,
1.63946566e+00-1.6762944j , ...,
-2.41747128e+00+1.65949022j, -8.73687787e-01+0.44551125j,
-1.56307291e+00+0.33081337j],
[ 4.04396350e+00+1.16979538j, -2.62078494e+00+0.09506434j,
1.32607048e+00-0.21776989j, ...,
8.40240719e-01-0.66112353j, -1.25218810e+00-1.8812777j ,
9.00493681e-01+1.59250907j],
[-3.46755282e+00+0.j , -2.07298298e+00+0.j ,
-1.69611943e-01+0.j , ...,
-1.57453084e+00+0.j , 4.45912550e+00+0.j ,
-2.12368973e+00+0.j ]],
[[-5.38236848e-01+0.j , 1.23907586e+00+0.j ,
-4.94175185e+00+0.j , ...,
3.26300793e+00+0.j , -4.99721025e+00+0.j ,
-1.92499891e+00+0.j ],
[-1.71698438e+00-0.9832957j , 5.05572657e-01-0.97231324j,
2.45246437e+00+0.43630134j, ...,
2.69097737e-03-0.81763107j, -1.82600630e-01-1.00196128j,
1.11875960e-01-0.06010646j],
[-1.58664540e+00+2.87822373j, 1.60346044e-01+1.49673479j,
6.47665054e-01-0.87150724j, ...,
-9.94638295e-01+1.91404525j, -1.07760710e+00+0.81179684j,
-1.97279911e+00-0.12028356j],
[ 2.02493036e+00+1.47195548j, 1.40155198e+00-2.77025095j,
-1.13136804e+00+1.36239382j, ...,
3.36889687e+00+0.77830432j, -1.94161924e+00+1.29163109j,
-3.47540648e-01+1.44734244j],
[ 6.17413331e-01+0.j , -2.32734568e+00+0.j ,
9.27277138e-01+0.j , ...,
2.27494798e+00+0.j , -4.76333624e-01+0.j ,
4.45995101e-01+0.j ]],
[[-1.93536171e+00+0.j , -1.12048053e+00+0.j ,
-1.40307940e+00+0.j , ...,
2.91319667e+00+0.j , -3.81467648e+00+0.j ,
-2.99359077e+00+0.j ],
[-9.84439659e-01-0.86548504j, 4.31521904e-01-0.29210012j,
2.72245934e+00-0.48066832j, ...,
-7.83255712e-01+1.66557625j, 3.63592118e-01-0.19929632j,
-1.98389394e-01-1.87473382j],
[-3.35809533e+00+3.59322445j, -1.04247352e+00-1.65201738j,
1.60060192e+00-0.07651537j, ...,
-5.70593776e-01-3.49224714j, -1.12104769e+00+0.52028717j,
-4.24799549e-01+0.43163151j],
[-1.35159016e-01-0.84014089j, 6.53208973e-01+1.28570034j,
1.19427577e+00-1.46061826j, ...,
-5.61820923e-01+1.40509803j, -4.12291550e-02+2.76109374j,
8.98196832e-02+1.74563406j],
[ 1.63790366e+00+0.j , -3.33286523e+00+0.j ,
-2.14032601e+00+0.j , ...,
-4.18644782e-02+0.j , 1.42133023e+00+0.j ,
-2.72049156e+00+0.j ]],
...,
[[-1.73599543e+00+0.j , 2.60960690e+00+0.j ,
-2.99696199e+00+0.j , ...,
-1.32571694e+00+0.j , 2.73676566e+00+0.j ,
-2.78811210e+00+0.j ],
[ 2.62743163e-01+0.02288145j, -1.51735414e+00-1.49657654j,
4.00961369e-01+0.7357825j , ...,
2.75936875e+00-3.17751879j, -2.19226395e+00-0.82740782j,
2.28718986e+00-2.16921752j],
[ 4.01222409e-01+2.69050905j, -2.64290413e-01+0.09438691j,
2.94184606e+00+0.48935307j, ...,
4.26631715e+00+0.67557341j, 2.20586826e+00+2.15985577j,
8.71695404e-02-1.44261576j],
[-1.08166223e+00-1.25446266j, -1.85059581e+00-0.39306674j,
4.39484727e-01-1.38423787j, ...,
7.01226329e-01-0.47204366j, 1.18750279e+00-1.68228536j,
-9.25776617e-02+0.80480425j],
[ 3.80458016e+00+0.j , 5.97296443e-01+0.j ,
-2.98979049e+00+0.j , ...,
-2.22517791e+00+0.j , -1.27247691e+00+0.j ,
-4.23741588e+00+0.j ]],
[[-1.30273703e+00+0.j , 6.70224324e+00+0.j ,
-1.22420027e+00+0.j , ...,
-3.11208674e+00+0.j , -1.61035094e-01+0.j ,
-3.18058220e+00+0.j ],
[-1.65419280e+00+3.10743869j, -2.65163773e+00-1.74649361j,
-1.39604591e-01+1.2750009j , ...,
3.33452959e+00+1.84637499j, -9.03588717e-01+1.59929988j,
-3.76342958e+00-0.01789431j],
[-2.32463716e-01+0.85640203j, -1.94565096e+00-0.30572879j,
-1.93658225e+00-0.04892631j, ...,
6.66449919e-01-3.02348439j, 1.18429538e+00+0.78614788j,
1.39083415e+00+2.76625762j],
[-1.11907938e+00-2.77668762j, -2.46165315e+00-1.88939767j,
-1.37926178e-01+2.48066586j, ...,
3.74135435e+00+3.95512835j, 1.40886357e+00-1.15931943j,
-6.23807099e-01-0.140642j ],
[ 1.21055022e-01+0.j , 6.80180126e-01+0.j ,
3.98688833e+00+0.j , ...,
-1.52982789e+00+0.j , -1.23681439e+00+0.j ,
-2.31236785e+00+0.j ]],
[[-1.94847959e+00+0.j , 7.23953570e-01+0.j ,
1.05056797e+00+0.j , ...,
2.12592562e-01+0.j , 2.08718056e+00+0.j ,
4.40433742e+00+0.j ],
[-2.83386464e+00-1.69815119j, 1.41287128e+00-3.21385207j,
1.23039940e+00-0.45492841j, ...,
-1.05171241e+00+2.75493644j, -1.52513516e+00-0.1358442j ,
-3.70131651e+00-1.55291823j],
[ 1.66612659e+00+3.38642669j, -6.92990613e-01+0.50371798j,
5.44211650e-02-1.73528884j, ...,
2.86559312e-01+0.22955612j, -1.65756488e+00+2.06391072j,
1.57312921e+00+2.27230483j],
[-2.22332194e+00-0.93856846j, -2.44687507e-01+1.58500899j,
-9.05686298e-01-2.10015023j, ...,
-1.44726370e+00+0.92337773j, 1.74274870e+00+1.13492451j,
2.00849323e+00+1.42641698j],
[ 2.32051325e-01+0.j , 1.77199968e+00+0.j ,
-3.01255522e+00+0.j , ...,
6.43302060e+00+0.j , 3.00642925e+00+0.j ,
6.95156922e-01+0.j ]]],
[[[-4.97411372e+00+0.j , -2.45689218e+00+0.j ,
-2.71221773e+00+0.j , ...,
2.78051456e-01+0.j , -2.77032881e+00+0.j ,
2.69523067e+00+0.j ],
[-1.22977586e+00-0.39449618j, -1.83996004e+00-0.1701448j ,
-3.96717423e-01+0.95994278j, ...,
2.62786479e-01+0.80223022j, -1.02980411e+00+2.79534239j,
-2.40817225e+00+0.64815626j],
[ 1.71996214e+00+2.7154812j , -3.60359846e+00+1.65773781j,
-2.41821712e+00-0.366324j , ...,
2.34884171e+00+1.52841101j, 1.05284416e+00-1.03362442j,
-4.55198125e+00-1.03820941j],
[ 2.22689194e+00-0.31893238j, 7.76251550e-02+0.27939908j,
-2.09334344e+00-1.0036685j , ...,
1.60623310e+00-2.5271102j , -1.43625083e+00+1.24032444j,
-5.62741373e-01-0.7022142j ],
[-5.44055815e+00+0.j , -1.69878267e+00+0.j ,
-1.10731172e+00+0.j , ...,
-1.00439562e+00+0.j , 1.44451957e+00+0.j ,
-2.06747100e+00+0.j ]],
[[ 3.45698998e-01+0.j , 2.40189763e+00+0.j ,
-2.21156545e+00+0.j , ...,
-3.30523152e+00+0.j , 1.44421942e+00+0.j ,
4.05611898e+00+0.j ],
[-3.85471510e-01+0.55571113j, -2.85815768e+00+2.53059094j,
-1.32347937e+00-2.59035663j, ...,
-5.20936104e-01-1.85907293j, -8.15403328e-01-2.2607936j ,
-1.12697655e+00-1.76888605j],
[ 1.82357371e+00-1.42565086j, -1.56054803e+00-0.38127237j,
1.39999184e+00+1.65089901j, ...,
1.60140159e+00-1.09456774j, -1.28417831e+00+1.37766151j,
3.31051666e-01+0.14520203j],
[ 1.26403797e+00+1.19016218j, -3.08338166e-01+1.06201866j,
-2.21802257e+00+2.11314669j, ...,
-4.37435521e+00+1.36040178j, 7.89651107e-02-0.38215782j,
1.34949835e+00+1.59469772j],
[-2.36932369e+00+0.j , 3.11408079e+00+0.j ,
1.68043091e+00+0.j , ...,
-7.50036929e-01+0.j , -8.02983286e-01+0.j ,
5.97919147e-01+0.j ]],
[[-1.58565994e-01+0.j , 2.06701989e+00+0.j ,
4.15021169e+00+0.j , ...,
5.10810031e+00+0.j , 1.20266573e+00+0.j ,
-5.06000100e-01+0.j ],
[ 3.59540639e+00+1.21483259j, -1.97478228e+00-1.41956875j,
-5.57255872e-01-2.85257107j, ...,
-6.84972827e-01-5.26662712j, 1.58481294e-01+2.71976012j,
-1.13996254e+00-0.88116269j],
[ 2.88872997e-01-1.05996498j, -3.15539495e+00-1.58379064j,
2.61255643e+00+1.40399215j, ...,
-5.77796331e-01+0.46807648j, -2.46941062e+00-3.32257291j,
-4.06327061e+00-0.56798851j],
[ 1.23531644e+00-0.20400421j, 1.29267789e+00+5.06633517j,
2.66445894e+00+2.60583775j, ...,
6.01827505e-01+0.82161735j, 1.69262386e+00+0.84822551j,
7.83155706e-01+1.9052187j ],
[ 2.48870245e+00+0.j , -5.64141357e-01+0.j ,
2.05306959e+00+0.j , ...,
-2.06175030e+00+0.j , -2.56619164e+00+0.j ,
-2.91344313e+00+0.j ]],
...,
[[ 3.66808345e+00+0.j , 1.50013163e+00+0.j ,
2.15654619e+00+0.j , ...,
4.71359064e+00+0.j , 9.46198012e-01+0.j ,
-4.96746125e+00+0.j ],
[ 2.87165468e+00+1.57105162j, 1.78585087e+00-2.36411764j,
3.29910105e+00+0.06054646j, ...,
-6.21946553e-01+0.4218751j , -2.58283559e+00+4.12393829j,
5.17177615e+00-0.47683331j],
[-1.50306833e+00-2.62495158j, -2.19356040e+00-0.60309544j,
3.05915451e+00-1.30647229j, ...,
-1.10668187e+00-0.90273913j, 8.68310835e-02+0.0109907j ,
-8.85340954e-01+0.19800307j],
[-2.14617805e-01-1.667229j , -3.78855413e+00-2.60103718j,
1.82286700e+00-1.96583757j, ...,
1.75138340e+00-0.78184286j, -5.55299480e-01-3.26215835j,
-3.68038256e-01+0.36670947j],
[ 2.64681333e+00+0.j , 1.15555061e+00+0.j ,
-1.06317672e+00+0.j , ...,
-9.57354330e-01+0.j , -6.80448789e-01+0.j ,
3.10728692e+00+0.j ]],
[[ 1.05869105e+00+0.j , 1.00261817e+00+0.j ,
2.83253499e-01+0.j , ...,
1.72161960e-01+0.j , 2.16012149e+00+0.j ,
1.05455221e+00+0.j ],
[ 2.95126771e+00-0.74092866j, 1.50929326e+00+2.25452063j,
1.42345407e+00-0.0864217j , ...,
3.27615525e+00-0.41041367j, 1.34735988e+00-2.29946142j,
-2.54519584e+00-2.27726697j],
[ 1.16440057e-01-1.83852523j, 6.20757627e-01-0.57468291j,
-4.92333915e-01-0.82867988j, ...,
-1.56526838e+00+0.54223215j, 9.00448297e-01-3.47864987j,
-5.81515953e-01+1.64778296j],
[-3.21908416e+00-0.4892671j , 1.71946058e+00-1.690601j ,
1.28156773e+00-0.35825546j, ...,
1.67224787e+00-3.84558561j, 1.58113353e+00+3.32091674j,
-6.82781771e-01+0.21030321j],
[ 4.36807636e+00+0.j , 3.53533657e-01+0.j ,
-1.80231714e+00+0.j , ...,
-2.49399808e+00+0.j , -3.68322814e+00+0.j ,
-1.20686128e+00+0.j ]],
[[-1.95700755e+00+0.j , -1.94054845e+00+0.j ,
2.54031514e+00+0.j , ...,
-1.23040300e+00+0.j , -3.11036597e+00+0.j ,
1.71163266e+00+0.j ],
[ 1.89089364e+00-0.47609814j, 1.07970380e+00-1.12651079j,
1.29649962e+00+2.72149424j, ...,
1.17714599e+00-2.91054927j, -2.19269040e+00-2.22198698j,
-4.77125185e-01+1.94370265j],
[ 5.81698472e-01+0.92267727j, 2.99656846e+00+1.6949603j ,
3.82216161e+00-2.07870969j, ...,
1.06969938e+00+1.40548078j, -1.40231693e+00+2.81088398j,
3.70059829e+00-3.13629137j],
[-2.69063952e+00-1.61879639j, -2.58897970e+00-3.16049067j,
1.99527445e+00+0.80102862j, ...,
-1.40490164e+00+0.57107601j, -2.77516591e+00+0.10698507j,
-7.76277083e-01+1.13126318j],
[ 3.40498508e+00+0.j , 5.74938514e-01+0.j ,
2.37627852e+00+0.j , ...,
1.92361056e+00+0.j , 3.44082375e-01+0.j ,
2.10772976e-01+0.j ]]]])
It is possible to convert a DataArray
to an ndarray using da.values
but as we have just seen, very often it is not necessary.
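One caveat worth illustrating: a NumPy function that does not know about xarray returns a plain ndarray, so the labels are lost and have to be reattached by hand if they are needed. The sketch below is one possible way to handle that. It looks up the axis by name with `get_axis_num` instead of hard-coding `axis=2`; the `"freq"` dimension name is a hypothetical label chosen for this example.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
da_small = xr.DataArray(
    rng.normal(size=(2, 3, 7, 4)), dims=["chain", "draw", "person", "time"]
)

# Look up the positional axis from the dimension name
axis = da_small.get_axis_num("person")
spectrum = np.fft.rfft(da_small, n=8, axis=axis)

# The result is a plain ndarray: dimension names are gone
print(type(spectrum).__name__, spectrum.shape)

# Re-wrap it if labels are needed again ("freq" is a made-up name)
da_spectrum = xr.DataArray(spectrum, dims=["chain", "draw", "freq", "time"])
print(da_spectrum.dims)
```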
Coordinates and indexes
In addition, it is also possible to use coordinates. For DataArray
s, coordinates are kind of in between data and attributes.
A DataArray
is equivalent to a single array; however, it can have as many coordinates as desired. When creating a DataArray
,
the coords
argument can be used to initialize the coordinates. It is a dictionary whose keys are coordinate names
and whose values are DataArray
s, arrays or (dimension, values) tuples:
ary = np.random.default_rng().normal(size=(4, 100, 7, 24))
da = xr.DataArray(
    ary,
    dims=["chain", "draw", "person", "time"],
    coords={
        "chain": [1, 2, 3, 4],
        "gender": ("person", ["male", "neutrois", "female", "male", "NB", "female", "female"]),
        "age": ("person", [25, 30, 73, 47, 51, 20, 64]),
    }
)
da
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
-2.333 -0.9839 0.8063 1.505 -0.0909 1.33 ... -0.1766 0.484 0.1027 -0.1825 1.198
Coordinates:
  * chain    (chain) int64 1 2 3 4
    gender   (person) <U8 'male' 'neutrois' 'female' ... 'NB' 'female' 'female'
    age      (person) int64 25 30 73 47 51 20 64
Dimensions without coordinates: draw, person, time
Alternatively, xarray.DataArray.assign_coords
can be used to add coordinates to an existing DataArray
:
da = da.assign_coords(city=("person", ["Lleida", "Leida", "Sau", "Lleida", "Reus", "Sau", "Reus"]))
da
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
-2.333 -0.9839 0.8063 1.505 -0.0909 1.33 ... -0.1766 0.484 0.1027 -0.1825 1.198
Coordinates:
  * chain    (chain) int64 1 2 3 4
    gender   (person) <U8 'male' 'neutrois' 'female' ... 'NB' 'female' 'female'
    age      (person) int64 25 30 73 47 51 20 64
    city     (person) <U6 'Lleida' 'Leida' 'Sau' 'Lleida' 'Reus' 'Sau' 'Reus'
Dimensions without coordinates: draw, person, time
So far we have added all these coordinates, but we can’t yet use that information directly for, say, slicing. Only the values of chain
can be used for that,
hence it being bold in the “Coordinates” section and appearing in the “Indexes” section. To use the rest of the coordinates for slicing too, we need to
indicate that to xarray:
da = da.set_xindex("gender").set_xindex("city")
da
<xarray.DataArray (chain: 4, draw: 100, person: 7, time: 24)>
-2.333 -0.9839 0.8063 1.505 -0.0909 1.33 ... -0.1766 0.484 0.1027 -0.1825 1.198
Coordinates:
  * chain    (chain) int64 1 2 3 4
  * gender   (person) <U8 'male' 'neutrois' 'female' ... 'NB' 'female' 'female'
    age      (person) int64 25 30 73 47 51 20 64
  * city     (person) <U6 'Lleida' 'Leida' 'Sau' 'Lleida' 'Reus' 'Sau' 'Reus'
Dimensions without coordinates: draw, person, time
We can slice along the person
dimension using the gender
coordinate:
da.sel(gender="female")
<xarray.DataArray (chain: 4, draw: 100, person: 3, time: 24)>
-0.2618 -0.6076 -0.4886 -0.6244 -0.1986 ... -0.1766 0.484 0.1027 -0.1825 1.198
Coordinates:
  * chain    (chain) int64 1 2 3 4
  * gender   (person) <U8 'female' 'female' 'female'
    age      (person) int64 73 20 64
  * city     (person) <U6 'Sau' 'Sau' 'Reus'
Dimensions without coordinates: draw, person, time
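Coordinates without an index, such as `age` here, can still drive a selection, just not through `.sel`. The sketch below shows one possible approach on a small stand-in array: label-based selection on the indexed `gender` coordinate, and a boolean mask turned into positions for the non-indexed `age` coordinate. The name `people` and the integer data are assumptions made for illustration.

```python
import numpy as np
import xarray as xr

people = xr.DataArray(
    np.arange(14).reshape(2, 7),
    dims=["chain", "person"],
    coords={
        "gender": ("person", ["male", "neutrois", "female", "male", "NB", "female", "female"]),
        "age": ("person", [25, 30, 73, 47, 51, 20, 64]),
    },
)

# Label-based selection with .sel needs an index on the coordinate
females = people.set_xindex("gender").sel(gender="female")
print(females.age.values)

# "age" has no index, so build positional indices from a boolean mask instead
over_40 = people.isel(person=np.flatnonzero(people.age.values > 40))
print(over_40.age.values)
```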
Vectorized and outer indexing
NumPy indexing is extremely powerful, especially what is often called “fancy” indexing. However, it can be confusing and inconsistent (see this NEP proposal for more background). With dimensions and coordinates, xarray allows being explicit and consistent when choosing between vectorized and outer indexing.
Outer indexing selects rectangular or hyper-rectangular slices. To use outer indexing with xarray, we can use lists, arrays or DataArray
s that have as a dimension the one that is being indexed. Therefore, the output of outer indexing will have the same dimensions as the input minus those dimensions whose slice is a scalar, which are squeezed.
da.sel(chain=[1, 3], time=slice(0, 5))
<xarray.DataArray (chain: 2, draw: 100, person: 7, time: 5)>
-2.333 -0.9839 0.8063 1.505 -0.0909 ... -0.2397 0.4177 0.9724 1.131 0.3541
Coordinates:
  * chain    (chain) int64 1 3
  * gender   (person) <U8 'male' 'neutrois' 'female' ... 'NB' 'female' 'female'
    age      (person) int64 25 30 73 47 51 20 64
  * city     (person) <U6 'Lleida' 'Leida' 'Sau' 'Lleida' 'Reus' 'Sau' 'Reus'
Dimensions without coordinates: draw, person, time
Vectorized indexing selects only the exact positions being indicated by the indexing arrays. To do that we have to use DataArray
s with dimensions different than those being indexed. As long as all index DataArray
s have the same dimensions (and thus shape) they can also be multidimensional. Therefore, the dimensions of the output of vectorized indexing are all the dimensions of the indexers plus all the dimensions that were not indexed:
chain_idx = xr.DataArray([1, 1, 3, 3, 1, 3, 1, 1], dims="vectorized_sel")
time_idx = xr.DataArray([0, 1, 0, 1, 2, 2, 3, 4], dims="vectorized_sel")
da.sel(chain=chain_idx, time=time_idx)
<xarray.DataArray (vectorized_sel: 8, draw: 100, person: 7)>
-2.333 -0.9922 -0.2618 0.7559 1.352 ... 0.4422 1.155 0.4988 -0.4919 -0.3735
Coordinates:
    chain    (vectorized_sel) int64 1 1 3 3 1 3 1 1
  * gender   (person) <U8 'male' 'neutrois' 'female' ... 'NB' 'female' 'female'
    age      (person) int64 25 30 73 47 51 20 64
  * city     (person) <U6 'Lleida' 'Leida' 'Sau' 'Lleida' 'Reus' 'Sau' 'Reus'
Dimensions without coordinates: vectorized_sel, draw, person
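The difference between the two modes is easiest to see on a tiny 2-D array. The sketch below uses positional `.isel` on a 5x5 grid with the same indexers, `[0, 1]` and `[1, 3]`, in both modes, and compares each result with its NumPy counterpart. The names `grid`, `row`, `col` and `points` are made up for this example.

```python
import numpy as np
import xarray as xr

grid = xr.DataArray(np.arange(25).reshape(5, 5), dims=["row", "col"])

# Outer indexing: plain lists select every combination of rows and columns
outer = grid.isel(row=[0, 1], col=[1, 3])
print(outer.dims, outer.values.tolist())  # ('row', 'col') [[1, 3], [6, 8]]

# Vectorized indexing: indexers share a new dimension, so they are paired up
row_idx = xr.DataArray([0, 1], dims="points")
col_idx = xr.DataArray([1, 3], dims="points")
pointwise = grid.isel(row=row_idx, col=col_idx)
print(pointwise.dims, pointwise.values.tolist())  # ('points',) [1, 8]

# NumPy equivalents: np.ix_ for outer, paired integer arrays for vectorized
np_outer = grid.values[np.ix_([0, 1], [1, 3])]
np_pointwise = grid.values[[0, 1], [1, 3]]
```

In xarray the mode follows from the dimensions of the indexers, so the same method call covers both cases without helpers like `np.ix_`.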
Plotting
Xarray also has some plotting support, which is very convenient for visualizing the multidimensional data. For example, we can facet over a 1st dimension (here chain
),
use a 2nd dimension (here time
) as x values of the line plot and map the color of the lines to a 3rd dimension (here person
).
The y values are the actual values in the DataArray
.
da.mean("draw").plot.line(x="time", hue="person", col="chain", col_wrap=2);
Or we can visualize a 2D slice of the data, with the values mapped to colors through a colormap, using imshow
.
# make the plot more interesting
da_plot = da_dims + da_persons - xr.DataArray(2*np.sin(np.linspace(0, np.pi, 24)), dims="time")
da_plot.mean(("chain", "draw")).plot.imshow(x="time", y="person", center=False);
Dataset
Dataset
s are a collection of DataArray
s, each DataArray
being a data variable and having a name/key. Thus, they are similar to a dictionary of arrays but with one important difference. Dimensions with the same name are shared between variables:
ds = xr.Dataset(
    {
        "a": (["chain", "draw", "person", "time"], ary),
        "b": (["chain", "draw"], np.random.default_rng().normal(size=(4, 100)))
    },
    coords={"chain": [1, 2, 3, 4]}
)
ds
<xarray.Dataset>
Dimensions:  (chain: 4, draw: 100, person: 7, time: 24)
Coordinates:
  * chain    (chain) int64 1 2 3 4
Dimensions without coordinates: draw, person, time
Data variables:
    a        (chain, draw, person, time) float64 -2.333 -0.9839 ... 1.198
    b        (chain, draw) float64 -0.9367 0.5736 -0.3545 ... -0.2987 -0.5123
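One practical consequence of shared dimensions is that every variable using a given dimension name must agree on its length. The sketch below illustrates this with two deliberately mismatched variables; the variable names and shapes are made up for the example.

```python
import numpy as np
import xarray as xr

conflict_raised = False
try:
    # "chain" has length 4 in "a" but length 3 in "b"
    xr.Dataset(
        {
            "a": (["chain", "draw"], np.zeros((4, 10))),
            "b": (["chain"], np.zeros(3)),
        }
    )
except ValueError as err:
    conflict_raised = True
    print(f"Could not build the Dataset: {err}")

print(conflict_raised)
```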
They work very similarly to DataArray
s, with operations being applied to all data variables in the Dataset
(where that makes sense). For example, if we index along the chain dimension, the operation applies to both variables. However, taking the mean over the person dimension only affects the data variable a
:
ds.sel(chain=2).mean("person")
<xarray.Dataset>
Dimensions:  (draw: 100, time: 24)
Coordinates:
    chain    int64 2
Dimensions without coordinates: draw, time
Data variables:
    a        (draw, time) float64 -0.1994 -0.6486 -0.1148 ... 0.4029 0.2966
    b        (draw) float64 -0.4077 2.132 2.622 -2.868 ... -1.39 0.2419 1.659
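A small check of this behavior, using a reduced stand-in for the Dataset above (the name `ds_small`, the shapes and the seed are assumptions made for this sketch): after selecting a chain and averaging over `person`, variable `a` loses both dimensions while `b` only loses `chain` and keeps its values.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
ds_small = xr.Dataset(
    {
        "a": (["chain", "draw", "person"], rng.normal(size=(4, 10, 7))),
        "b": (["chain", "draw"], rng.normal(size=(4, 10))),
    },
    coords={"chain": [1, 2, 3, 4]},
)

reduced = ds_small.sel(chain=2).mean("person")

# Dimensions left on each data variable after both operations
print({name: var.dims for name, var in reduced.data_vars.items()})

# "b" has no "person" dimension, so the mean leaves its values as they were
b_unchanged = np.allclose(reduced["b"].values, ds_small["b"].sel(chain=2).values)
print(b_unchanged)
```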
To apply arbitrary functions to all the data variables in a Dataset
, we can use xarray.Dataset.map
for functions that return a DataArray
or xarray.apply_ufunc
for functions that return arrays. apply_ufunc
can be challenging to use, but there is a good chance that a library exists that already wraps and extends numpy/scipy/… functions to have an xarray-like API. Take a look at Xarray related projects to browse packages that extend xarray.
If none of that works, remember that Dataset
s behave much like a dictionary, to the point of having a dict-like interface with .items()
, .keys()
, .get()
…
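A brief sketch of that dict-like interface on a toy Dataset (the variable names and shapes are illustrative): iteration goes over the data variables, and `.get()` returns `None` for a missing key, as it would for a dictionary.

```python
import numpy as np
import xarray as xr

ds_small = xr.Dataset(
    {
        "a": (["chain", "draw"], np.zeros((4, 10))),
        "b": (["chain"], np.ones(4)),
    }
)

# Keys are the names of the data variables
print(list(ds_small.keys()))

# Items are (name, DataArray) pairs, so a manual loop is always an option
shapes = {name: data_array.shape for name, data_array in ds_small.items()}
print(shapes)

# .get() falls back to None (or a given default) for unknown names
print(ds_small.get("missing") is None)
```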
I hope this introduction has been useful. I would like to continue exploring features of xarray as a series, also covering potential new features and extensions for ArviZ. You can leave comments on this blog post below; only a GitHub account is needed.
Package versions used to generate this post:
%load_ext watermark
%watermark -n -u -v -iv -w
Last updated: Thu Jul 13 2023
Python implementation: CPython
Python version : 3.10.12
IPython version : 8.14.0
xarray: 2023.6.0
numpy : 1.24.4
Watermark: 2.4.3
Lastly, as an appendix, here is the code used to generate the vectorized vs outer indexing diagram:
import matplotlib.pyplot as plt
import numpy as np
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
ax = axes[0]
ax.fill_betweenx([3, 5], x1=5, x2=0, color="xkcd:jade", alpha=.3)
ax.fill_between([1, 2], y1=5, y2=0, color="violet", alpha=.3)
ax.fill_between([3, 4], y1=5, y2=0, color="violet", alpha=.3)
ax.fill_between([1, 2], y1=5, y2=3, color="r")
ax.fill_between([3, 4], y1=5, y2=3, color="r")
ax.hlines(np.arange(6), 0, 5, lw=1, color="k")
ax.vlines(np.arange(6), 0, 5, lw=1, color="k")
ax.set_ylim(-0.05, 5.05)
ax.set_xlim(-0.05, 5.05)
ax.set_title("Outer indexing: [[0, 1], [1, 3]]")
ax.set_axis_off()
ax.set_aspect("equal")
ax = axes[1]
ax.fill_between([1, 2], y1=5, y2=4, color="r")
ax.fill_between([3, 4], y1=4, y2=3, color="r")
ax.hlines(np.arange(6), 0, 5, lw=1, color="k")
ax.vlines(np.arange(6), 0, 5, lw=1, color="k")
ax.set_ylim(-0.05, 5.05)
ax.set_xlim(-0.05, 5.05)
ax.set_title("Vectorized indexing: [[0, 1], [1, 3]]")
ax.set_axis_off()
ax.set_aspect("equal")
#fig.savefig("indexing_modes.png", dpi=300)