Tensors

Last time I got into labelling the coordinates of vectors and covectors with indices in a particular pattern:

  • for a vector coordinate, the index is superscript, such as $a^n$, whereas
  • for a covector coordinate, the index is subscript, such as $a_n$.

These of course describe ordinary numbers (scalars). By saying $a^n$, we mean there is some fixed number of dimensions to our vector space (let’s say $3$) and therefore $n$ can take on the values $1$, $2$ or $3$. So $a^2$ is the ordinary number that is the second coordinate of the vector $\vec{a}$, while $b_3$ is the third coordinate of the covector $\vec{b}$.

But this presupposes that we’ve chosen a set of basis vectors to be scaled by these coordinates. Coordinates are not fundamental. That is, we don’t necessarily want to say that a vector is its coordinates. So we also need a corresponding notation for basis (co)vectors, and we swap the placement of the indices like this:

  • for a basis vector, the index is subscript, such as $\vec{e}_n$, whereas
  • for a basis covector, the index is superscript, such as $\vec{e}^n$.

Why swap them? Because we’re going to follow a simple rule of multiplying things with indices in opposite positions, and this configuration allows us to follow that rule even when scaling basis vectors by coordinates:

\[\vec{a} = \sum_n a^n \vec{e}_n\]
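
(If you like seeing this sort of thing numerically, here is a quick sketch in Python with numpy; the basis and the coordinates are made up purely for illustration.)

    import numpy as np

    # A made-up basis for a 3-dimensional space: row n holds the basis vector e_n,
    # written out in some underlying reference coordinates.
    e = np.array([[1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 2.0]])

    # Made-up coordinates a^n with respect to that basis.
    a_up = np.array([2.0, -1.0, 0.5])

    # a = sum_n a^n e_n: scale each basis vector by its coordinate and add them up.
    a_vec = sum(a_up[n] * e[n] for n in range(3))

    # Equivalently, as a matrix product of the transposed basis with the coordinates.
    assert np.allclose(a_vec, e.T @ a_up)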

I also talked about a bit of machinery for linearly mapping a vector to a corresponding covector (or the reverse), and how this is something else we have to configure for our vector space. The basis vectors and covectors are already paired up by how they are defined, but for anything else there is no natural pairing. We use a matrix to provide the missing information, and there is a significance to how we position the indices used to label its numeric elements:

  • for a matrix that maps from vector to covector coordinates, the indices are subscript: $g_{ij}$, whereas
  • for a matrix that maps from covector to vector coordinates, the indices are superscript: $g^{ij}$.

The arithmetic of doing this kind of mapping, e.g. to get coordinates of the covector corresponding to some vector coordinates, is matrix multiplication:

\[a_m = \sum_n g_{mn} a^n\]

The input vector has its index “up” and the result has its index “down”, from which $g_{mn}$ gets its nickname, the lowering operator, though more officially it’s called a metric. The inverse metric $g^{mn}$ is likewise called the raising operator.
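
To see the lowering in action, here is a minimal numpy sketch; the metric below is just an arbitrary symmetric, invertible matrix I picked for the example, nothing special.

    import numpy as np

    # An arbitrary symmetric, invertible metric g_{mn} for a 3-dimensional space.
    g_lower = np.array([[2.0, 0.5, 0.0],
                        [0.5, 1.0, 0.0],
                        [0.0, 0.0, 3.0]])
    g_upper = np.linalg.inv(g_lower)    # the inverse metric g^{mn}

    a_up = np.array([1.0, 2.0, -1.0])   # vector coordinates a^n

    # a_m = sum_n g_{mn} a^n -- the lowering operator at work...
    a_down = np.array([sum(g_lower[m, n] * a_up[n] for n in range(3)) for m in range(3)])

    # ...which is just matrix multiplication,
    assert np.allclose(a_down, g_lower @ a_up)

    # and the raising operator takes us back where we started.
    assert np.allclose(g_upper @ a_down, a_up)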

We can also invent linear mappings from vectors to vectors within the same vector space: things like rotation, reflection and skewing. They will also be described by a matrix with two indices, but we need to be careful how we place those indices, to be clear about the types of the input and output:

\[a^m = \sum_n M^m_{\ \ \ \ n} a^n\]

See how the two $n$ indices always seem to “cancel out”, being in opposite positions. Using the same summation index variable for two objects whose coordinates are being multiplied is called contraction. It doesn’t matter how many other factors go into the same term. It’s like we’re doing a scaled dot product between one vector and one covector. The matrix is being used like a collection of covectors, each of which produces (via the dot product) one of the coordinates of the resultant vector. The output only has whatever indices remain after the contraction has done its elimination, and those remaining indices will not change position, so we can easily check that our equation makes sense. In the above example, $m$ is the only remaining index, and it stays “up”.
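
Here is the same pattern for an operator, again just a sketch with made-up numbers: each row of the matrix behaves like a covector that eats the input vector and produces one coordinate of the output.

    import numpy as np

    # A made-up operator M^m_n: a rotation by 90 degrees in the x-y plane.
    M = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    a_up = np.array([1.0, 2.0, 3.0])

    # out^m = sum_n M^m_n a^n: the n indices are contracted away, m survives.
    out = np.array([sum(M[m, n] * a_up[n] for n in range(3)) for m in range(3)])

    # Row m of M, dotted with the vector, gives coordinate m of the result.
    assert np.allclose(out, np.array([M[m] @ a_up for m in range(3)]))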

This whole pattern is extremely general and powerful. We can define calculating machines that work on a particular number of vector (or covector) inputs. The matrix $M^m_{\ \ \ \ n}$ is an example that takes a vector input and produces a vector output. The metric $g_{mn}$ turns a vector into a covector. But also, a covector acting on a vector to produce a scalar is an example of the same pattern: contracting by tying two indices to the same summation variable.

The general name for these calculating machines is tensors. They include vectors and covectors, appropriately configured matrices representing operators, but also, in some contexts, much larger collections of numbers that have to be labelled with three or four indices. For example, the Riemann curvature tensor $R^a_{\ \ bcd}$, where each index has 4 possible values in the case of classical spacetime, consists of $4^4 = 256$ numbers. At the other end of the scale of complexity, it’s perfectly possible for a tensor to have no indices at all; that means it must be simply a scalar.

The ultimate constraint on tensors is that they can be contracted with other tensors (be they vectors, covectors or something more complex) until all we have left is a scalar. The key point is that if we change the orientation of our basis vectors, the coordinates in all our tensors will change accordingly, but they will still represent the same thing. We can be sure of this because if we contract down to a scalar, we’ll always get the same scalar from the same system of tensors regardless of the basis we choose. Indeed, the traditional way tensors are defined is as systems of coordinates that transform in a particular way under a change of basis (one often used but unhelpfully glib definition is “a tensor is something that transforms like a tensor.”)
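
That basis-independence is easy to demonstrate numerically. In the sketch below I contract a covector with a vector, apply a random change of basis (vector coordinates transform with the inverse of the matrix whose columns are the new basis vectors, while covector coordinates transform with the matrix itself), and contract again; the scalar comes out the same. The particular numbers are, of course, entirely made up.

    import numpy as np

    rng = np.random.default_rng(0)

    a_up = rng.normal(size=3)     # vector coordinates a^n in some basis
    b_down = rng.normal(size=3)   # covector coordinates b_n in the same basis

    s = b_down @ a_up             # the scalar b_n a^n

    # A random (almost surely invertible) change of basis: the columns of P are
    # the new basis vectors written in terms of the old ones.
    P = rng.normal(size=(3, 3))

    # Vector coordinates transform with the inverse of P; covector coordinates with P itself.
    a_up_new = np.linalg.inv(P) @ a_up
    b_down_new = P.T @ b_down

    # Contracting in the new basis gives back exactly the same scalar.
    assert np.isclose(b_down_new @ a_up_new, s)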

But, just as we saw with vectors, it pays to think about these concepts in more ways than one. A tensor can be thought of as a geometrical object. Okay, it’s not as easy to visualise as an arrow. But just as a vector can be thought of as an arrow, which is a distinct kind of geometric object, not just a collection of numbers (even though it can be described with numbers if you choose a basis for doing so), any tensor is also a geometrical object of a distinct kind, not just a collection of numbers.

Given this, the index notation, which up to now we’ve been interpreting as a way to label collections of numbers, can instead be interpreted as a way to describe abstract machines that operate on geometrical objects - ultimately vectors and covectors - to produce scalars whose values are completely independent of any choice of basis. Some authors call it abstract index notation, others call it slot-naming index notation. (One convention I’ve seen in textbooks uses the Greek alphabet for indices when the notation is to be interpreted as numerical realisations, and the Latin alphabet for abstract notation.)

Another notational point: Einstein noticed that his hand got quite tired from endlessly writing $\sum$ symbols for all the summations. He realised that they were completely unnecessary in this area, because a summation symbol only ever introduced a summation variable that was used in exactly two indices, one up and one down. Therefore the mere appearance of such a pair of linked indices tells you that there’s a summation. For example:

\[\sum_{\mu} \sum_{\nu} \sum_{\beta} \sum_{\lambda} g_{\mu\nu} Z^\mu_{\ \ \beta} a^{\beta} Z^\nu_{\ \ \lambda} b^{\lambda}\]

Four different tensors: the metric $g$, and an operator $Z$ being applied to two vectors ($a$ and $b$). But there are four contractions happening: every index is a summation variable, such that up-down pairs share the same variable. As a result, it’s entirely unnecessary to write the summation symbols. This says it all:

\[g_{\mu\nu} Z^\mu_{\ \ \beta} a^{\beta} Z^\nu_{\ \ \lambda} b^{\lambda}\]
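
As it happens, numpy’s einsum function is modelled on exactly this convention: you write the index pattern and it infers the sums. Here is a quick sketch with random components (the shapes and numbers are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    g = rng.normal(size=(4, 4))   # metric components g_{mu nu}
    Z = rng.normal(size=(4, 4))   # operator components Z^mu_beta
    a = rng.normal(size=4)        # vector components a^beta
    b = rng.normal(size=4)        # vector components b^lambda

    # g_{mu nu} Z^mu_beta a^beta Z^nu_lambda b^lambda: every index is paired
    # up-down, so nothing survives and the result is a scalar.
    s = np.einsum('mn,mb,b,nl,l->', g, Z, a, Z, b)

    # The same thing spelled out as matrix algebra: (Za)^T g (Zb).
    assert np.isclose(s, (Z @ a) @ g @ (Z @ b))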

In examples where we want the output to be something more than a scalar, i.e. a tensor with at least one index, the equation will still be perfectly unambiguous without any explicit summation symbols, because there will be some left-over indices that have no oppositely-positioned partner and therefore are not being summed over. They survive in the output. With this abbreviation, a square matrix multiplying with a column matrix (or, interpreted abstractly, an operator acting on a vector) is simply written:

\[M^m_{\ \ \ \ n} a^n\]
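
In einsum terms, the left-over index simply appears on the output side of the arrow; a one-line sketch (with arbitrary numbers):

    import numpy as np

    M = np.arange(9.0).reshape(3, 3)   # some operator M^m_n (made-up values)
    a = np.array([1.0, 0.0, -2.0])     # vector coordinates a^n

    # n is summed away; the free index m labels the output coordinates.
    out = np.einsum('mn,n->m', M, a)
    assert np.allclose(out, M @ a)     # i.e. ordinary matrix-times-column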

Abstract index notation takes care of many notational duties that are sometimes performed in other ways. We say of $g_{mn}$ that it’s a type $(0, 2)$-tensor, the two numbers counting the up and down indices, but the abstract index notation already conveys this fact. Likewise the tensor product symbol $\otimes$ can be used to convey the notion of joining the covector space $V^*$ to itself:

\[V^* \otimes V^*\]

the result of which is a space of $(0, 2)$-tensors, but again, this conveys nothing that hasn’t already been made clear by saying $g_{mn}$.
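
To make that concrete: the simplest elements of $V^* \otimes V^*$ are outer products of two covectors, and feeding such a $(0, 2)$-tensor two vectors contracts both slots down to a scalar. (A general $(0, 2)$-tensor like $g_{mn}$ is a sum of such products; the sketch below, with made-up numbers, sticks to a single term.)

    import numpy as np

    u = np.array([1.0, 0.0, 2.0])   # covector components u_m
    v = np.array([0.0, 3.0, 1.0])   # covector components v_n

    # The tensor product u ⊗ v is a (0, 2)-tensor with components T_{mn} = u_m v_n.
    T = np.einsum('m,n->mn', u, v)  # same as np.outer(u, v)

    a = np.array([2.0, 1.0, 0.0])   # vector a^m
    b = np.array([1.0, 1.0, 1.0])   # vector b^n

    # Feeding it two vectors: T_{mn} a^m b^n = (u_m a^m)(v_n b^n).
    assert np.isclose(np.einsum('mn,m,n->', T, a, b), (u @ a) * (v @ b))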



