Tensors

Last time I got into labelling the coordinates of vectors and covectors with indices in a particular pattern:

  • for a vector coordinate, the index is superscript, such as $a^n$, whereas
  • for a covector coordinate, the index is subscript, such as $a_n$.

These of course describe ordinary numbers (scalars). By saying $a^n$, we mean there is some fixed number of dimensions to our vector space (let’s say $3$) and therefore $n$ can take on the values $1$, $2$ or $3$. So $a^2$ is the ordinary number that is the second coordinate of the vector $\vec{a}$, while $b_3$ is the third coordinate of the covector $\vec{b}$.

But this presupposes that we’ve chosen a set of basis vectors to be scaled by these coordinates. Coordinates are not fundamental. That is, we don’t necessarily want to say that a vector is its coordinates. So we also need a corresponding notation for basis (co)vectors, and we swap the placement of the indices like this:

  • for a basis vector, the index is subscript, such as $\vec{e}_n$, whereas
  • for a basis covector, the index is superscript, such as $\vec{e}^n$.

Why swap them? Because we’re going to follow a simple rule of multiplying things with indices in opposite positions, and this configuration allows us to follow that rule even when scaling basis vectors by coordinates:

\[\vec{a} = \sum_n a^n \vec{e}_n\]
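As a quick numerical sketch (the coordinate values here are made up, and I'm using the standard basis purely for illustration), the sum above looks like this:

```python
import numpy as np

a_up = np.array([2.0, -1.0, 3.0])   # the coordinates a^1, a^2, a^3
e = np.eye(3)                       # e[n] is the basis vector e_n (standard basis)

# a = sum_n a^n e_n
a_vec = sum(a_up[n] * e[n] for n in range(3))
```

With the standard basis the reconstructed vector is, unsurprisingly, just the coordinates again, but the same loop works for any basis you put in `e`.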

I also talked about a bit of machinery for linearly mapping a vector to a corresponding covector (or the reverse), and how this is something else we have to configure for our vector space. The basis vectors and covectors are already paired up by how they are defined, but for anything else there is no natural pairing. We use a matrix to provide the missing information, and there is a significance to how we position the indices used to label its numeric elements:

  • for a matrix that maps from vector to covector coordinates, the indices are subscript: $g_{ij}$, whereas
  • for a matrix that maps from covector to vector coordinates, the indices are superscript: $g^{ij}$.

The arithmetic of doing this kind of mapping, e.g. to get coordinates of the covector corresponding to some vector coordinates, is matrix multiplication:

\[a_m = \sum_n g_{mn} a^n\]

The input vector has its index “up” and the result has its index “down”, from which $g_{mn}$ gets its nickname, the lowering operator, though more officially it’s called a metric. The inverse metric $g^{mn}$ is likewise called the raising operator.
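To make this concrete, here's a small sketch with an invented symmetric metric (the numbers mean nothing physically; they just have to form an invertible symmetric matrix):

```python
import numpy as np

g = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, 0.0],
              [0.0, 0.0, 1.5]])    # g_{mn}, the lowering operator
a_up = np.array([1.0, 2.0, 3.0])   # a^n

# a_m = sum_n g_{mn} a^n -- ordinary matrix multiplication
a_down = g @ a_up

# The inverse metric g^{mn} is the raising operator: it recovers a^n.
g_inv = np.linalg.inv(g)
a_up_again = g_inv @ a_down
```

Lowering and then raising gets you back where you started, which is exactly what "inverse metric" promises.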

We can also invent linear mappings from vectors to vectors: things like rotation, reflection, skewing, but within the same vector space. They will also be described by a matrix with two indices, but we need to be careful how we place those indices, to be clear about the types of the input and output:

\[a^m = \sum_n M^m_{\ \ \ \ n} a^n\]

See how the two $n$ indices always seem to “cancel out”, being in opposite positions. Using the same summation index variable for two objects whose coordinates are being multiplied is called contraction. It doesn’t matter how many other factors go into the same term. It’s like we’re doing a scaled dot product between one vector and one covector. The matrix is being used like a collection of covectors, each of which produces (via the dot product) one of the coordinates of the resultant vector. The output only has whatever indices remain after the contraction has done its elimination, and those remaining indices will not change position, so we can easily check that our equation makes sense. In the above example, $m$ is the only remaining index, and it stays “up”.

This whole pattern is extremely general and powerful. We can define calculating machines that work on a particular number of vector (or covector) inputs. The matrix $M^m_{\ \ \ \ n}$ is an example that takes a vector input and produces a vector output. The metric $g_{mn}$ turns a vector into a covector. But also, a covector acting on a vector to produce a scalar is an example of the same pattern: contracting by tying two indices to the same summation variable.
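Both contraction patterns can be sketched with `np.einsum`, whose subscript strings are essentially this index notation (the particular numbers in $M$, $a$ and $b$ below are made up for illustration):

```python
import numpy as np

M = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])   # M^m_n, a rotation about the third axis
a = np.array([1.0, 2.0, 3.0])      # a vector a^n
b = np.array([0.5, 0.0, 2.0])      # a covector b_n

vec_out = np.einsum('mn,n->m', M, a)   # M^m_n a^n : vector in, vector out
scalar  = np.einsum('n,n->', b, a)     # b_n a^n   : contracting down to a scalar
```

In the first call the `m` survives the contraction and indexes the output; in the second, no index survives, so the result is a single number.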

The general name for these calculation machines is tensors. They include vectors and covectors, appropriately configured matrices representing operators, but also in some contexts, much larger collections of numbers that have to be labelled with three or four indices. For example, the Riemann curvature tensor $R^a_{\ \ bcd}$, where all indices have 4 possible values in the case of classical spacetime, consists of $4^4 = 256$ numbers. At the other end of the scale of complexity, it’s perfectly possible for a tensor to have no indices at all; that means it must be simply a scalar.

The ultimate constraint on tensors is that they can be contracted with other tensors (be they vectors, covectors or something more complex) until all we have left is a scalar. The key point is that if we change the orientation of our basis vectors, the coordinates in all our tensors will change accordingly, but they will still represent the same thing. We can be sure this is the case if we contract down to a scalar, because we’ll always get the same scalar from the same system of tensors regardless of the basis we choose. Indeed, the traditional way tensors are defined is as systems of coordinates that transform in a particular way under a change of basis (one often used but unhelpfully glib definition is “a tensor is something that transforms like a tensor.”)

But, just as we saw with vectors, it pays to think about these concepts in more ways than one. A tensor can be thought of as a geometrical object. Okay, it’s not as easy to visualise as an arrow. But just as a vector is an arrow, a distinct kind of geometric object rather than a mere collection of numbers (even though it can be described with numbers once you choose a basis), any tensor is likewise a geometrical object of its own distinct kind, not just a collection of numbers.

Given this, the index notation, which up to now we’ve been interpreting as a way to label collections of numbers, can instead be interpreted as a way to describe abstract machines that operate on geometrical objects - ultimately vectors and covectors - to produce scalars whose values are completely independent of any choice of basis. Some authors call it abstract index notation, others call it slot-naming index notation. (One convention I’ve seen in textbooks uses the Greek alphabet for indices when the notation is to be interpreted as numerical realisations, and the Latin alphabet for abstract notation.)

Another notational point: Einstein noticed that his hand got quite tired from endlessly writing $\sum$ symbols for all the summations. He realised that they were completely unnecessary in this area, because a summation symbol only ever introduced a summation variable that was used in exactly two indices, one up and one down. Therefore the mere appearance of such a pair of linked indices tells you that there’s a summation. For example:

\[\sum_{\mu} \sum_{\nu} \sum_{\beta} \sum_{\lambda} g_{\mu\nu} Z^\mu_{\ \ \beta} a^{\beta} Z^\nu_{\ \ \lambda} b^{\lambda}\]

Four different tensors: the metric $g$, and an operator $Z$ being applied to two vectors ($a$ and $b$). This results in a total of 8 indices. But there are four contractions happening: every index is a summation variable, such that up-down pairs share the same variable. As a result, it’s entirely unnecessary to write the summation symbols. This says it all:

\[g_{\mu\nu} Z^\mu_{\ \ \beta} a^{\beta} Z^\nu_{\ \ \lambda} b^{\lambda}\]
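And indeed `np.einsum` performs all four contractions from one string that is almost a transliteration of the expression above (the values here are random stand-ins, with $g$ symmetrised to look metric-like):

```python
import numpy as np

rng = np.random.default_rng(42)
g = rng.normal(size=(3, 3)); g = g + g.T   # a symmetric stand-in for g_{mu nu}
Z = rng.normal(size=(3, 3))                # Z^mu_beta
a = rng.normal(size=3)                     # a^beta
b = rng.normal(size=3)                     # b^lambda

# g_{mu nu} Z^mu_beta a^beta Z^nu_lambda b^lambda, all four sums at once:
s = np.einsum('mn,mb,b,nl,l->', g, Z, a, Z, b)

# The same scalar in matrix language: (Z a)^T g (Z b)
s_check = (Z @ a) @ g @ (Z @ b)
```

Both lines give the same number, which is a nice sanity check that the index bookkeeping really is just nested summation.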

In examples where we want the output to be something more than a scalar, i.e. a tensor with at least one index, the equation will still be perfectly unambiguous without any explicit summation symbols, because there will be some left-over indices that have no oppositely-positioned partner and therefore are not being summed over. They survive in the output. With this abbreviation, a square matrix multiplying with a column matrix (or, interpreted abstractly, an operator acting on a vector) is simply written:

\[M^m_{\ \ \ \ n} a^n\]

Abstract index notation takes care of many notational duties that are sometimes performed in other ways. We say of $g_{mn}$ that it’s a type $(0, 2)$-tensor, the two numbers counting the up and down indices, but the abstract index notation already conveys this fact. Likewise the tensor product symbol $\otimes$ can be used to convey the notion of joining the covector space $V^*$ to itself:

\[V^* \otimes V^*\]

the result of which is a space of $(0, 2)$-tensors, but again, this conveys nothing that hasn’t already been made clear by saying $g_{mn}$.

The tensor product notation is still used, however. In the above example, it forms a tensor space from two “copies” of the same covector space, and is therefore abstractly describing a space of possible tensor objects, without saying how we might define those objects in more detail. To go beyond that, we can take the tensor product of two specific vectors. For example, we could make a tensor for every way of choosing a pair of basis vectors:

\[\vec{e}_n \otimes \vec{e}_m\]

Supposing our vector space is 3-dimensional, this would produce 9 different tensors.

To get even more practical, assume the vectors are just columns of three numbers, $\mathbb{R}^3$, and our chosen basis vectors are the usual one-hot vectors (the “standard basis”). There are 9 possible pairs of basis vectors, and in each of those combinations there are 9 pairings of their coordinates. We perform ordinary multiplication between all of these pairings. On the face of it this produces 81 numbers, but it’s better to think of it as a sort of $3 \times 3$ matrix of 9 “cells”, each cell containing a different $3 \times 3$ matrix of 9 ordinary numbers.

\[\begin{bmatrix}1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{bmatrix} \, \begin{bmatrix}0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{bmatrix} \, \begin{bmatrix}0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{bmatrix}\] \[\begin{bmatrix}0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0\end{bmatrix} \, \begin{bmatrix}0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0\end{bmatrix} \, \begin{bmatrix}0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0\end{bmatrix}\] \[\begin{bmatrix}0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0\end{bmatrix} \, \begin{bmatrix}0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0\end{bmatrix} \, \begin{bmatrix}0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1\end{bmatrix}\]

These are all the one-hot matrices, just as our basis vectors are the one-hot vectors. Each little matrix provides us with an independent “ingredient”, and by mixing these ingredients (that is, by scaling and adding them), we can make any possible matrix. The top-left matrix supplies the ingredient of the top-left matrix element, and so on. So this is really just another vector space, in which the vectors are matrices (or if you prefer, just 9-dimensional vectors), and the above 9 matrices are an orthogonal basis for that vector space. But just like any other vector space, we’re not limited to one possible basis. Any set of basis tensors will do, as long as the set is linearly independent.
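The mixing idea is easy to check numerically: each $\vec{e}_n \otimes \vec{e}_m$ is an outer product, and scaling each one by the corresponding element of an arbitrary matrix (here just `np.arange` reshaped, for illustration) rebuilds that matrix exactly.

```python
import numpy as np

e = np.eye(3)   # the standard basis vectors

# The 9 one-hot matrices e_n (x) e_m, as outer products:
basis_tensors = [np.outer(e[n], e[m]) for n in range(3) for m in range(3)]

# Mix the ingredients, scaling each by one element of an arbitrary matrix A:
A = np.arange(9.0).reshape(3, 3)
mixed = sum(A[n, m] * np.outer(e[n], e[m]) for n in range(3) for m in range(3))
```

The reconstruction `mixed` equals `A` element for element, confirming the 9 one-hot matrices span the whole space of $3 \times 3$ matrices.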

The reason for labouring this point is that the notation for a tensor space formed from a vector space joined to itself, $V \otimes V$, could be mistaken for meaning the space of tensors that can be formed by choosing two vectors from the vector space $V$ and making their tensor product. But that space is only a rather limited subspace of $V \otimes V$. Again, take the 3-vectors in $\mathbb{R}^3$, and choose any two vectors $\vec{v}$ and $\vec{w}$. The matrix obtained from $\vec{v} \otimes \vec{w}$ conforms to a simple rule: every row is the coordinates of one of the vectors scaled by one coordinate of the other vector. The same is true for every column. This means the rows are the coordinates of collinear vectors, as are the columns. If you changed one of the matrix elements a tiny amount, you’d find a matrix that cannot be produced by $\vec{v} \otimes \vec{w}$, whatever pair of vectors you choose. Put simply, there isn’t enough information in 6 coordinates to fill out 9 elements completely independently.
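The collinear-rows rule is the statement that a pure tensor product always has matrix rank 1, which makes the limitation easy to demonstrate (with made-up vectors):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
T = np.outer(v, w)          # each row of T is w scaled by one coordinate of v

nudged = T.copy()
nudged[0, 0] += 0.1         # a tiny change to one element

rank_T = np.linalg.matrix_rank(T)            # 1: rows (and columns) collinear
rank_nudged = np.linalg.matrix_rank(nudged)  # 2: no v' (x) w' can produce this
```

The nudged matrix has rank 2, so it lies in $V \otimes V$ but outside the subspace of pure tensor products.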



