Introduction

Concepts in spaces

 

What does it mean to "learn vowels"?

 

Peterson and Barney (1952, p. 182)

petersonbarney52-vowelspace

 

Spaces

What are vectors?

3Blue1Brown on vectors:

In truth, it doesn’t matter whether you think of vectors as fundamentally being arrows in space that happen to have a nice numerical representation, or fundamentally as lists of numbers that happen to have a nice geometric interpretation. The usefulness of linear algebra has less to do with either one of these views than it does with the ability to translate back and forth between them. It gives the data-analyst a nice way to conceptualize many lists of numbers in a visual way, which can seriously clarify patterns in the data and give a global view of what certain operations do. On the flip side, it gives people like physicists and computer graphics programmers a language to describe space, and the manipulation of space, using numbers that can be crunched and run through a computer.

Let's take this still, showing the geometric and numerical perspectives on vectors, from 3Blue1Brown's lesson on vector spaces. What does this remind you of, from thinking about phonetic spaces?

3bl1br_vectors_visualize_patterns

 

What's that "graph paper" representing? It's $\mathbf{R}^2$! Slightly more formally, from Axler (2015, Example 1.7):

The set $\mathbf{R}^2$, which you can think of as a plane, is the set of all ordered pairs of real numbers:

$$\mathbf{R}^2 = \{(x, y) : x, y \in \mathbf{R}\} \tag{1}$$

(What's the definition of $\mathbf{R}$? Of $\mathbf{R}^3$?)

How do we generalize to higher dimensions? From Axler (2015, Definition 1.8):

Suppose $n$ is a nonnegative integer. A list of length $n$ is an ordered collection of $n$ elements (which might be numbers, other lists, or more abstract entities) separated by commas and surrounded by parentheses. A list of length $n$ looks like this:

$$(x_1, \ldots, x_n) \tag{2}$$

Two lists are equal if and only if they have the same length and the same elements in the same order.

So to generalize to $\mathbf{R}^n$, see Axler (2015, Definition 1.10):

$\mathbf{R}^n$ is the set of all lists of length $n$ of elements of $\mathbf{R}$:

$$\mathbf{R}^n = \{(x_1, \ldots, x_n) : x_j \in \mathbf{R} \text{ for } j = 1, \ldots, n\} \tag{3}$$
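To connect this back to phonetic spaces, here's a minimal sketch (the formant-like numbers are made up for illustration): a vowel token summarized by two formants is a list of length 2, i.e., a point in $\mathbf{R}^2$, and adding more measurements just gives a longer list, i.e., a point in $\mathbf{R}^n$.

```python
import numpy as np

vowel_r2 = np.array([300.0, 2300.0])                        # (F1, F2): a point in R^2
vowel_r5 = np.array([300.0, 2300.0, 3000.0, 120.0, 0.15])   # F1, F2, F3, f0, duration: a point in R^5

print(vowel_r2.shape)   # (2,)  -- a list of length 2
print(vowel_r5.shape)   # (5,)  -- a list of length 5
```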

Addition and scalar multiplication of vectors

Relate these to phonetic spaces!

3Blue1Brown vector addition animation:

 

3Blue1Brown vector scaling animation:

 

That's all you need to define a vector space!
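Here's a minimal numerical sketch of those two operations in the "lists of numbers" view (the (F1, F2)-style values are made up, just to keep the phonetic-space analogy going):

```python
import numpy as np

v = np.array([300.0, 2300.0])   # e.g., an (F1, F2)-like point, in Hz
w = np.array([500.0, 1000.0])

print(v + w)     # vector addition is componentwise: [ 800. 3300.]
print(2.0 * v)   # scalar multiplication stretches the arrow: [ 600. 4600.]
```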

More formally, from Axler (2015, p. 12) (note: wherever Axler writes $\mathbf{F}$, which stands for a field, you can read $\mathbf{R}$; Axler generalizes to both real and complex vector spaces. Complex spaces are important for Fourier bases, but not important for understanding the definition below):

axler2015_vector_space_1

axler2015_vector_space_2

 

And once you have a vector space, you can introduce the idea of basis vectors.

A still of basis vectors from 3Blue1Brown's lesson on linear combinations, span, and basis vectors.

 

3blue_1brown_basis_vectors
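As a minimal numerical sketch of what the still is depicting: every vector in $\mathbf{R}^2$ is a linear combination of the standard basis vectors, and picking a different basis changes the coordinates but not the point. (The second basis B below is an arbitrary choice for illustration.)

```python
import numpy as np

i_hat = np.array([1.0, 0.0])
j_hat = np.array([0.0, 1.0])

v = 3.0 * i_hat + (-2.0) * j_hat        # scale the basis vectors, then add
print(v)                                # -> [ 3. -2.]

# With a different basis B (columns are the new basis vectors), the same
# point gets different coordinates: solve B @ coords = v.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
coords = np.linalg.solve(B, v)
print(coords)                           # coordinates of v in the new basis -> [ 5. -2.]
```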

 

Vector-ish spaces: basis functions

The same simple, geometric intuitions you have about vectors apply to "vector-ish" things like functions. These intuitions form the essence of approaches to a scientific understanding of the spaces that phonological concepts live in!

Here's a still from 3Blue1Brown's lesson on abstract vector spaces:

3blue1brown_abstract_vector_spaces

 

Moving from a waveform to a spectrum (animation from Lucas Vieira) is the decomposition of the waveform into sines and cosines (Fourier basis)!

Fourier_transform_time_and_frequency_domains

Interactive Fourier series demo
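Here's a minimal sketch of that change of basis using numpy's FFT (the sampling rate and the two component frequencies are made up for the demo): the spectrum is just the waveform's coefficients on the sine/cosine (Fourier) basis.

```python
import numpy as np

fs = 1000                                  # sampling rate (Hz), chosen for the demo
t = np.arange(0, 1, 1 / fs)                # 1 second of "signal"
waveform = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 310 * t)

spectrum = np.fft.rfft(waveform)           # coefficients on the sine/cosine basis
freqs = np.fft.rfftfreq(len(waveform), d=1 / fs)

# The two largest-magnitude coefficients sit at the frequencies we put in.
top = freqs[np.argsort(np.abs(spectrum))[-2:]]
print(np.sort(top))                        # -> [120. 310.]
```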

 

Animations from Gavin Simpson's GAMs webinar repo.

You give me a squiggly curve over $[0, 1]$: some function $f(x)$:

spline-anim

I can build it for you from a basis set of splines, i.e., a linear combination of spline functions!

basis-fun-anim
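Here's a minimal sketch of that idea, assuming scipy is available (the target squiggle, knot placement, and spline degree are all arbitrary choices for the demo): build a cubic B-spline basis on $[0, 1]$, then find coefficients so that a linear combination of the basis functions reproduces the curve.

```python
import numpy as np
from scipy.interpolate import BSpline

x = np.linspace(0, 1, 200)
f = np.sin(2 * np.pi * x) + 0.5 * np.sin(7 * np.pi * x)   # the "squiggly curve"

# Cubic B-spline basis on [0, 1]: repeat the boundary knots k times.
k = 3
interior = np.linspace(0, 1, 8)
knots = np.concatenate([np.zeros(k), interior, np.ones(k)])
n_basis = len(knots) - k - 1

# Column j holds the j-th basis function evaluated on the grid.
basis = np.column_stack([
    BSpline(knots, np.eye(n_basis)[j], k)(x) for j in range(n_basis)
])

# Least-squares coefficients for the linear combination basis @ coeffs ~ f.
coeffs, *_ = np.linalg.lstsq(basis, f, rcond=None)
approx = basis @ coeffs
print("max abs error:", np.abs(approx - f).max())
```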

 

Learning concepts

From Osherson et al. (1985, p. 8), Systems that Learn:

Learning typically involves

  1. a learner
  2. a thing to be learned
  3. an environment in which the thing to be learned is exhibited to the learner
  4. the hypotheses that occur to the learner about the thing to be learned on the basis of the environment

 

See also Anthony & Biggs (1992, p. 1) on learning from examples, especially the distinction between the actual example and the coded example.

anthonybiggs92_learning_diagram

Niyogi (1998, pp. 3-7): informational complexity of learning from examples

"...if information is provided to the learner about the target function in some fashion, how much information is needed for the learner to learn the target well? In the task of learning from examples, (examples, as we shall see later are really often nothing more than ((x,y)=f(x)) pairs where (x,y)X×Y and f:XY ) how many examples does the learner need to see?

From Niyogi (1998, p. 10) on factors affecting the informational complexity of learning from examples:

niyogi98_fig11_factors_complexity_learning

 

But see alternative perspectives (!!) ... Mumford & Desolneux (2010, p. 4):

"To apply pattern theory properly, it is essential to identify correctly the patterns present in the signal. We often have an intuitive idea of the important patterns, but the human brain does many things unconsciously and also takes many shortcuts to get things done quickly. Thus, a careful analysis of the actual data to see what they are telling us is preferable to slapping together an off-the-shelf Gaussian or log-linear model based on our guesses. Here is a very stringent test of whether a stochastic model is a good description of the world: sample from it. This is so obvious that one would assume everyone does this, but in actuality, this is not so. The samples from many models that are used in practice are absurd oversimplifications of real signals, and, even worse, some theories do not include the signal itself as one of its random variables (using only some derived variables), so it is not even possible to sample signals from them."

Footnote 1, immediately following: This was, for instance, the way most traditional speech recognition systems worked: their approach was to throw away the raw speech signal in the preprocessing stage and replace it with codes designed to ignore speaker variation. In contrast, when all humans listen to speech, they are clearly aware of the idiosyncrasies of the individual speaker’s voice and of any departures from normal. The idea of starting by extracting some hopefully informative features from a signal and only then classifying it via some statistical algorithm is enshrined in classic texts such as [64].

 

Distances and similarity

Image from Grootendorst's post on distance metrics:

grootendorst_nine_distance_metrics
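As a minimal sketch of a few of the measures in the figure, computed between two made-up (F1, F2)-style points with plain numpy:

```python
import numpy as np

a = np.array([300.0, 2300.0])
b = np.array([600.0, 1100.0])

euclidean = np.linalg.norm(a - b)                                    # straight-line distance
manhattan = np.sum(np.abs(a - b))                                    # sum of per-dimension gaps
cosine_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # direction only, ignores length

print(euclidean, manhattan, cosine_sim)
```

Note that cosine similarity only cares about the direction of the vectors: doubling one of the points leaves it unchanged, while both distances grow.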

 

Time-courses and trajectories

 

Exploratory data analysis, visualization and geometric perspectives

Tukey (1977, p. vi):

"exploratory data analysis...looking at data to see what it seems to say"

"The greatest value of a picture is when it forces us to notice what we never expected to see"

The "curse of dimensionality" and high-dimensional spaces

 

Further reading