Introduction

Concepts in spaces

 

What does it mean to "learn vowels"?

 

Peterson and Barney (1952, p. 182)

petersonbarney52-vowelspace

 

Spaces

What are vectors?

3Blue1Brown on vectors:

In truth, it doesn’t matter whether you think of vectors as fundamentally being arrows in space that happen to have a nice numerical representation, or fundamentally as lists of numbers that happen to have a nice geometric interpretation. The usefulness of linear algebra has less to do with either one of these views than it does with the ability to translate back and forth between them. It gives the data-analyst a nice way to conceptualize many lists of numbers in a visual way, which can seriously clarify patterns in the data and give a global view of what certain operations do. On the flip side, it gives people like physicists and computer graphics programmers a language to describe space, and the manipulation of space, using numbers that can be crunched and run through a computer.

Let's take this still, showing the geometric and numerical perspectives on vectors, from 3Blue1Brown's lesson on vector spaces. What does this remind you of, from thinking about phonetic spaces?

3bl1br_vectors_visualize_patterns

 

What's that "graph paper" representing? It's $\mathbf{R}^2$! Slightly more formally, from Axler (2015, Example 1.7):

The set $\mathbf{R}^2$, which you can think of as a plane, is the set of all ordered pairs of real numbers:

$$\mathbf{R}^2 = \{(x, y) : x, y \in \mathbf{R}\} \tag{1}$$

(What's the definition of $\mathbf{R}$? Of $\mathbf{R}^3$?)

How do we generalize to higher dimensions? From Axler (2015, Definition 1.8):

Suppose $n$ is a nonnegative integer. A list of length $n$ is an ordered collection of $n$ elements (which might be numbers, other lists, or more abstract entities) separated by commas and surrounded by parentheses. A list of length $n$ looks like this:

$$(x_1, \ldots, x_n) \tag{2}$$

Two lists are equal if and only if they have the same length and the same elements in the same order.

So to generalize to $\mathbf{R}^n$, see Axler (2015, Definition 1.10):

$\mathbf{R}^n$ is the set of all lists of length $n$ of elements of $\mathbf{R}$:

$$\mathbf{R}^n = \{(x_1, \ldots, x_n) : x_j \in \mathbf{R} \text{ for } j = 1, \ldots, n\} \tag{3}$$
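To connect this back to phonetic spaces, here's a minimal sketch (the formant-like numbers are made up for illustration): a vowel token summarized by two formants is a list of length 2, i.e., a point in $\mathbf{R}^2$, and adding more measurements just gives a longer list, i.e., a point in $\mathbf{R}^n$.

```python
import numpy as np

vowel_r2 = np.array([300.0, 2300.0])                        # (F1, F2): a point in R^2
vowel_r5 = np.array([300.0, 2300.0, 3000.0, 120.0, 0.15])   # F1, F2, F3, f0, duration: a point in R^5

print(vowel_r2.shape)   # (2,)  -- a list of length 2
print(vowel_r5.shape)   # (5,)  -- a list of length 5
```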

Addition and scalar multiplication of vectors

Relate these to phonetic spaces!

3Blue1Brown vector addition animation:

 

3Blue1Brown vector scaling animation:

 

That's all you need to define a vector space!
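Here's a minimal numerical sketch of those two operations in the "lists of numbers" view (the (F1, F2)-style values are made up, just to keep the phonetic-space analogy going):

```python
import numpy as np

v = np.array([300.0, 2300.0])   # e.g., an (F1, F2)-like point, in Hz
w = np.array([500.0, 1000.0])

print(v + w)     # vector addition is componentwise: [ 800. 3300.]
print(2.0 * v)   # scalar multiplication stretches the arrow: [ 600. 4600.]
```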

More formally, from Axler (2015, p. 12) (note: wherever Axler writes $\mathbf{F}$, which stands for a field, you can read $\mathbf{R}$; Axler generalizes to both real and complex vector spaces. Complex spaces are important for Fourier bases, but not important for understanding the definition below):

axler2015_vector_space_1

axler2015_vector_space_2

 

And once you have a vector space, you can introduce the idea of basis vectors.

A still of basis vectors from 3Blue1Brown's lesson on linear combinations, span, and basis vectors.

 

3blue_1brown_basis_vectors
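As a minimal numerical sketch of what the still is depicting: every vector in $\mathbf{R}^2$ is a linear combination of the standard basis vectors, and picking a different basis changes the coordinates but not the point. (The second basis B below is an arbitrary choice for illustration.)

```python
import numpy as np

i_hat = np.array([1.0, 0.0])
j_hat = np.array([0.0, 1.0])

v = 3.0 * i_hat + (-2.0) * j_hat        # scale the basis vectors, then add
print(v)                                # -> [ 3. -2.]

# With a different basis B (columns are the new basis vectors), the same
# point gets different coordinates: solve B @ coords = v.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
coords = np.linalg.solve(B, v)
print(coords)                           # coordinates of v in the new basis -> [ 5. -2.]
```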

 

Vector-ish spaces: basis functions

The same simple, geometric intuitions you have about vectors apply to "vector-ish" things like functions. These intuitions form the essence of approaches to a scientific understanding of the spaces that phonological concepts live in!

Here's a still from 3Blue1Brown's lesson on abstract vector spaces:

3blue1brown_abstract_vector_spaces

 

Moving from a waveform to a spectrum (animation from Lucas Vieira) is the decomposition of the waveform into sines and cosines (Fourier basis)!

Fourier_transform_time_and_frequency_domains

Interactive Fourier series demo
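Here's a minimal sketch of that change of basis using numpy's FFT (the sampling rate and the two component frequencies are made up for the demo): the spectrum is just the waveform's coefficients on the sine/cosine (Fourier) basis.

```python
import numpy as np

fs = 1000                                  # sampling rate (Hz), chosen for the demo
t = np.arange(0, 1, 1 / fs)                # 1 second of "signal"
waveform = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 310 * t)

spectrum = np.fft.rfft(waveform)           # coefficients on the sine/cosine basis
freqs = np.fft.rfftfreq(len(waveform), d=1 / fs)

# The two largest-magnitude coefficients sit at the frequencies we put in.
top = freqs[np.argsort(np.abs(spectrum))[-2:]]
print(np.sort(top))                        # -> [120. 310.]
```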

 

Animations from Gavin Simpson's GAMs webinar repo.

You give me a squiggly curve over $[0, 1]$: some function $f(x)$:

spline-anim

I can build it for you from a basis set of splines, i.e., a linear combination of spline functions!

basis-fun-anim
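Here's a minimal sketch of that idea, assuming scipy is available (the target squiggle, knot placement, and spline degree are all arbitrary choices for the demo): build a cubic B-spline basis on $[0, 1]$, then find coefficients so that a linear combination of the basis functions reproduces the curve.

```python
import numpy as np
from scipy.interpolate import BSpline

x = np.linspace(0, 1, 200)
f = np.sin(2 * np.pi * x) + 0.5 * np.sin(7 * np.pi * x)   # the "squiggly curve"

# Cubic B-spline basis on [0, 1]: repeat the boundary knots k times.
k = 3
interior = np.linspace(0, 1, 8)
knots = np.concatenate([np.zeros(k), interior, np.ones(k)])
n_basis = len(knots) - k - 1

# Column j holds the j-th basis function evaluated on the grid.
basis = np.column_stack([
    BSpline(knots, np.eye(n_basis)[j], k)(x) for j in range(n_basis)
])

# Least-squares coefficients for the linear combination basis @ coeffs ~ f.
coeffs, *_ = np.linalg.lstsq(basis, f, rcond=None)
approx = basis @ coeffs
print("max abs error:", np.abs(approx - f).max())
```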

 

Learning concepts

From Osherson et al. (1985, p. 8), Systems that Learn:

Learning typically involves

  1. a learner
  2. a thing to be learned
  3. an environment in which the thing to be learned is exhibited to the learner
  4. the hypotheses that occur to the learner about the thing to be learned on the basis of the environment

 

See also Anthony & Biggs (1992, p. 1) on learning from examples, especially the distinction between the actual example and the coded example.

anthonybiggs92_learning_diagram

Niyogi (1998, pp. 3-7): informational complexity of learning from examples

"...if information is provided to the learner about the target function in some fashion, how much information is needed for the learner to learn the target well? In the task of learning from examples, (examples, as we shall see later are really often nothing more than ((x,y)=f(x)) pairs where (x,y)X×Y and f:XY ) how many examples does the learner need to see?

From Niyogi (1998, p. 10) on factors affecting the informational complexity of learning from examples:

niyogi98_fig11_factors_complexity_learning

 

But see alternative perspectives (!!) ... Mumford & Desolneux (2010, p. 4):

"To apply pattern theory properly, it is essential to identify correctly the patterns present in the signal. We often have an intuitive idea of the important patterns, but the human brain does many things unconsciously and also takes many shortcuts to get things done quickly. Thus, a careful analysis of the actual data to see what they are telling us is preferable to slapping together an off-the-shelf Gaussian or log-linear model based on our guesses. Here is a very stringent test of whether a stochastic model is a good description of the world: sample from it. This is so obvious that one would assume everyone does this, but in actuality, this is not so. The samples from many models that are used in practice are absurd oversimplifications of real signals, and, even worse, some theories do not include the signal itself as one of its random variables (using only some derived variables), so it is not even possible to sample signals from them."

Footnote 1, immediately following: This was, for instance, the way most traditional speech recognition systems worked: their approach was to throw away the raw speech signal in the preprocessing stage and replace it with codes designed to ignore speaker variation. In contrast, when all humans listen to speech, they are clearly aware of the idiosyncrasies of the individual speaker’s voice and of any departures from normal. The idea of starting by extracting some hopefully informative features from a signal and only then classifying it via some statistical algorithm is enshrined in classic texts such as [64].

 

Distances and similarity

Image from Grootendorst's post on distance metrics:

grootendorst_nine_distance_metrics
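As a minimal sketch of a few of the measures in the figure, computed between two made-up (F1, F2)-style points with plain numpy:

```python
import numpy as np

a = np.array([300.0, 2300.0])
b = np.array([600.0, 1100.0])

euclidean = np.linalg.norm(a - b)                                    # straight-line distance
manhattan = np.sum(np.abs(a - b))                                    # sum of per-dimension gaps
cosine_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # direction only, ignores length

print(euclidean, manhattan, cosine_sim)
```

Note that cosine similarity only cares about the direction of the vectors: doubling one of the points leaves it unchanged, while both distances grow.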

 

Time-courses and trajectories

 

Exploratory data analysis, visualization and geometric perspectives

Tukey (1977, p. vi):

"exploratory data analysis...looking at data to see what it seems to say"

"The greatest value of a picture is when it forces us to notice what we never expected to see"

The "curse of dimensionality" and high-dimensional spaces

 

Further reading