Wednesday, October 28, 2020

A Recipe for Overtaking the Number Two

So here's an idea I don't have good words for, but it keeps cropping up. Imagine we're in Congress and a recently drafted bill is in revision. Let's say it's about transportation. That affects everyone almost equally, in the sense that we all absolutely need it for food and other supplies.

We can probably safely predict that Republicans and Democrats agree that there are transportation problems to solve, and that they're important. There's a call for bipartisan support for a bipartisan bill. That seems rational to most.

We can probably also predict that Democrats are proposing more spending, and Republicans less spending, or even cuts to existing expenditures. There's a stereotype of fiscally responsible conservatives and fiscally wasteful liberals. If history supplies evidence for that, it's mixed at best. For example, in recent times, Democratic control seems correlated with more robust fiscal decision-making, prosperity, and even balancing the budget. No doubt Republicans can point to evidence that says the opposite. I don't claim to know for sure.

These ins and outs are not my wheelhouse. But I think there's enough evidence for someone out there to have extracted more or less the right answer, whatever it is.

My point is not about which way that goes. But for the sake of illustration, let's say Democrats want to tax gasoline more and build high-speed rail lines and add bus routes, and Republicans want to allow more tollbooths and expand existing highways and set aside HOV lanes for the environmentalists. Both sides say they are trying to use what's there, expand throughput, and reduce emissions, while keeping a close eye on the budget.

Because Democrats expect Republicans to shoot down new spending, they go big on the proposals. Compromise comes later. They'll argue for exactly what they want up front and make it sound more self-evident than the sun in the sky - if anything, not going far enough. Republicans, anticipating this, will be waiting with a stack of arguments for shooting it all down.

Maybe we could call this an arms race. That seems too vague. I don't have a good, more specific name for it. But here's a great cartoon of Microsoft when it had entrenchment problems. (Entrenchment, hedging, arms races, negotiation bids, and polarization all describe what I'm talking about, but none has the precision I would like about the what, the why, and the how: the idea of taking specific opposition as a hidden assumption and the distortion that creates.)

You've probably seen it before. Critical detail: the imagined guns inside the bubbles, to which the actual guns are a response.

My point is that Democrats end up defining themselves as not-Republicans, and Republicans as not-Democrats. Any time any of them says anything, it has to be taken in the context of what they are not. Democrats often do not speak from the origin, but in response to what was just said. Not "gun laws need improvement" but "guns are slaughtering our children by the thousand." The same with Republicans. Because they can rely on their opposition to be there, and rely on encountering resistance, they get used to pushing harder. They go X amount one way, and the other side pulls them Y amount back. So a back-of-the-napkin calculation shows that any time they want to go X far, they have to push for X + Y. If they believe anything a bit, they suddenly have to believe in it absolutely, or it counts for nothing - or, worse, less than nothing, because people are weird about transparent uncertainty.

This leads to extreme or at least entrenched attitudes. Everything they say is within this tug-o-war context. It can't be taken at face value.

Yet it often is, even by them.

And we watch it and get involved and start to mirror this.

This business in Congress is not the only reason for polarization; we don't need a legislative body for that. There is something in social psychology called Heider's balance theory - structural balance - that shows, in math, the basic reason for and mechanism of polarization. It isn't complicated. But the Congress image is one clear example of it: the common enemy. The enemy of my enemy is my friend. The friend of my enemy is my enemy. The idea is in the Bible, but it lives deeper in the mind than recorded history, in instinct. People fall in with the party line so they aren't taken for the enemy. That's why polarization happens. It's a mathematical consequence when you apply those rules to a network: you get two camps. If things are tense enough, a war breaks out between the two camps. It's happened a million times - maybe a trillion throughout human and primate evolution. Maybe more.
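Here is a toy sketch of that mathematical consequence - a sketch only, assuming the simplest balancing rule I know of: find a triangle that is out of balance (its three relationship signs multiply to a negative) and flip one of its relationships. Once no unbalanced triangles remain, a classic result (Harary's structure theorem) says the "friend" ties split everyone into at most two camps. The sizes and random choices here are made up for illustration.

```python
# Toy simulation of Heider/structural balance on a small signed network.
# Balanced triangle: the product of its three edge signs is positive.
# Dynamics (an assumption for this sketch): repeatedly pick an unbalanced
# triangle and flip one of its edges. For small networks this usually
# settles into a balanced state, which splits into at most two camps.
import random
from itertools import combinations

random.seed(0)
N = 12  # people
sign = {frozenset(pair): random.choice([-1, 1])
        for pair in combinations(range(N), 2)}

def unbalanced_triangles():
    return [(a, b, c) for a, b, c in combinations(range(N), 3)
            if sign[frozenset((a, b))] * sign[frozenset((b, c))] * sign[frozenset((a, c))] < 0]

for _ in range(100_000):          # cap the flips so the sketch always stops
    bad = unbalanced_triangles()
    if not bad:
        break
    a, b, c = random.choice(bad)
    edge = random.choice([(a, b), (b, c), (a, c)])
    sign[frozenset(edge)] *= -1   # flip one relationship in that triangle

print("balanced:", not unbalanced_triangles())

# Camps = connected components over the positive ("friend") ties.
parent = list(range(N))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i
for pair, s in sign.items():
    if s > 0:
        a, b = tuple(pair)
        parent[find(a)] = find(b)
print("camps:", len({find(i) for i in range(N)}))   # at most two once balanced
```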

But if we know that, we can alter it, maybe even adjust for it completely.

The point is we understand how it happens.

The better you understand a problem and its context, the closer you are to fixing it.

Here is my ultra-simple prescription:

1) There are MORE THAN two sides to every story.

2) Curiosity

Saturday, October 24, 2020

Is Your Brain a Matrix?

"Artificial networks are the most promising current models for understanding the brain."

Skepticism on that point has been popular for a long time. Skepticism is good. But your brain has a lot in common with a giant matrix. Your senses are like a giant vector (the data in your favorite song), and your thoughts and actions another giant vector (a synth recording of you covering that song, capturing your input to the instrument), with your brain a matrix that converts one to the other. How far does the similarity go? Unknown... a giant matrix can approximate any process as closely as you want, if it's big enough and has enough time/examples to learn.
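To see that (over)simplified picture in code - this is the plain, no-frills version, and the rest of the post walks it back - here is a minimal numpy sketch with made-up sizes:

```python
# Minimal sketch of the simplification above: senses in, one big matrix,
# thoughts/actions out. Sizes are arbitrary; this is the "lie" version
# that the rest of the post refines.
import numpy as np

rng = np.random.default_rng(0)
senses = rng.standard_normal(10_000)         # one moment of sensory input
brain = rng.standard_normal((500, 10_000))   # the matrix converting one to the other
actions = brain @ senses                     # thoughts/actions as another vector
print(actions.shape)                         # (500,)
```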

If all that - a giant matrix as a brain - seems too simple to be possible, keep in mind that this kind of matrix represents an interacting network. There's a math proof that a matrix can approximate any process, meaning any natural or computational process as far as we understand the word "process," and it's very closely related to the way you can break any sound down into a spectrum of frequencies. The proof actually depends on the same idea.
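For the curious, the usual precise form of that proof - the universal approximation theorem (Cybenko 1989; Hornik and others shortly after) - reads roughly as follows; the σ here is the "curve" the next few paragraphs describe:

```latex
% Universal approximation, stated informally: for any continuous f on a
% compact set K and any tolerance epsilon > 0, there exist a matrix W,
% vectors b and v, and a fixed non-linear "bend" sigma such that
\sup_{x \in K} \left| \, f(x) - v^{\top} \sigma(Wx + b) \, \right| < \varepsilon
```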

The "deep" in "deep learning" just means using a bigger matrix. Often that means using fancier hardware to run the learning faster, but not necessarily. This is very similar to cameras and screens with higher and higher resolutions. A newer phone should have a faster chip to keep up with a higher pixel count in camera and screen. But it doesn't technically need a faster chip. It would just slow down otherwise. Images didn't get more complicated, only bigger.

But for that ability to sculpt a matrix into any process to really work, the matrix needs to be broken up into individual vectors, and those are run against the input - the vector representing senses - one at a time. Each run is a dot product, and each result - a single number - gets put on a curve a bit like a grading curve. The curved results, lined up together, form a work-in-progress vector, which is sent on to interact with the next vectors broken off the matrix. Rinse and repeat!

Eventually that work-in-progress vector is done, at which point it represents the thoughts/actions that are the output. Think of each number in the vector as the strength of one dimension of possible response: the probability of hitting one note on a piano, how much to move one muscle, and so on. So to put the last paragraph in different words, a "deep learning" matrix, aka neural network, is no more than a bunch of multiplications in the form of dot products between pairs of vectors, with a little filter/curve after each one.
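A tiny sketch of that "dot products with a little curve after each one" picture, assuming tanh for the curve (the exact bend barely matters, as discussed below) and made-up sizes:

```python
# One stage of the picture above: each row of the matrix is dotted with
# the input, each result is put on a curve (tanh here), and the curved
# results line up into the work-in-progress vector for the next stage.
import numpy as np

rng = np.random.default_rng(1)
senses = rng.standard_normal(8)        # the input vector
stage = rng.standard_normal((5, 8))    # 5 rows = 5 vectors broken off the matrix

work_in_progress = []
for row in stage:                      # one vector at a time
    raw = np.dot(row, senses)          # dot product between two vectors
    work_in_progress.append(np.tanh(raw))   # the little grading-curve step
work_in_progress = np.array(work_in_progress)

# Same computation, written the compact way:
assert np.allclose(work_in_progress, np.tanh(stage @ senses))
```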

Incidentally, each one of those vectors broken off the matrix can be visualized as a line or edge. You can imagine that you could draw any picture, even a 3D one, even a 5005D one, with enough lines or edges. You can make it as clean and accurate as you want by adding more lines. We know that intuitively, because that's how sketching works. Deep learning is not unlike sketching very fast. Similarly, you can draw a very smooth circle, as smooth as you want, with enough little square pixels. See it? Now we can do that with concepts.
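Here is a small sketch of that "enough lines makes a clean drawing" idea, assuming the lines are shifted bent lines (ReLUs) and the sketching is an ordinary least-squares fit; the target curve and sizes are made up:

```python
# Approximate a smooth curve (sin) with sums of bent lines (shifted ReLUs),
# which is what one layer of a network effectively draws with.
# More lines -> a cleaner sketch (the max error shrinks).
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)
target = np.sin(x)

for n_lines in (3, 10, 30):
    knots = np.linspace(0, 2 * np.pi, n_lines)
    lines = np.maximum(0.0, x[:, None] - knots[None, :])   # one bent line per knot
    lines = np.hstack([lines, np.ones((x.size, 1))])        # plus a flat offset line
    weights, *_ = np.linalg.lstsq(lines, target, rcond=None)
    max_error = np.max(np.abs(lines @ weights - target))
    print(f"{n_lines:3d} lines -> max error {max_error:.4f}")
```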

But those are details. Students who think matrix math is boring will typically hear about AI from me, haha. And they do tend to find it interesting.

The curve, or conditioning, after each step is what makes this different from just multiplying a giant vector by a giant matrix to get another giant vector. That would be too simple, and it's kind of the lie I told at the start. Instead, information flows step by step through the layers of the matrix much like energy filtering up through the layers of an ecosystem, towards apex predators and decomposers. And there's that curve/filter between each level. I suppose it's a bit like a goat eating grass which is converted into goat; something changes in the middle. It isn't grass to grass, it's grass to goat, so there's a left turn in there somewhere.

That bend is critical but not complicated at all, though why it's critical is harder to pin down, and I don't fully understand it myself. The filter doesn't even have to be a curve; it can just mean putting a kink in each line - a bend in each vector, like a knee or elbow. It almost doesn't matter what the bend is, just that it's there. That's surprisingly essential to the universality of neural networks, so apparently it adds a lot for very little. I don't have a good analogy for why that's true, except that the world isn't actually made up of a bunch of straight lines. It's more like a bunch of curves and surfaces and volumes and energy and particles and static and other noise and signals between interconnected systems, and this step, putting kinks in the lines, lets the processing break out into a much larger possibility space.
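A toy illustration of that point, with made-up numbers: without the bend, stacked matrices collapse back into one matrix, but with a single kink (a ReLU here) you can already draw a V shape that no single matrix can.

```python
# Why the bend matters, in miniature.
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(6)
W1 = rng.standard_normal((4, 6))
W2 = rng.standard_normal((3, 4))

# No bend: two layers are secretly just one matrix, nothing new gained.
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# With a bend: relu(t) + relu(-t) = |t|, a V shape no single matrix can draw.
relu = lambda t: np.maximum(0.0, t)
t = np.linspace(-2, 2, 5)
print(relu(t) + relu(-t))    # [2. 1. 0. 1. 2.] - the kink at zero is the trick
```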

Theoretically, the old possibility space (without bends) was the stuff that you could accomplish with the "transformations" you learned in geometry - stretches, rotations, reflections, glides. The new space is all possibility space - or any "before/after" that can be measured and processed as a measurement. Artificially aging your neighbor's cat, painting today's sunset from weather data... If there's any logical connection between input and output, between before and after if there's time involved - even if that connection is just the laws of physics - or even if it's just a random association to memorize, like didn't you know volcanoes and lemons are connected because I said so - that connection can be represented by a big enough matrix.

So instead of pixels, it's lines, and instead of lines, it's bends. Think of bends as moments of change. Maybe this is a little like adding 3D glasses and color to a greyscale picture without altering the resolution. But... the effect of the curving/filtering/bending I've been talking about would be far more shocking than the image upgrade if you could directly experience the difference, given that we get the potential of learning and mimicking every known process. Maybe we do directly experience that difference as a key component of being alive. It's more like adding motion to that image, and an understanding of where the motion comes from and where it's going. Or to rephrase, the greyscale picture with our "kinks" update is now more like a mind than a photo - which, after all, is a simpler kind of matrix, one that is not a network.

The other simplification I made is that the big matrix is actually broken down into multiple matrices first, before those are broken down into individual vectors, each of which is roughly equivalent to a single neuron. I described the neurons one at a time, in single file, but in each layer many neurons sit side by side. Each layer of neurons in a neural network is its own matrix. Each neuron is its own vector. But I'd say the layers are the least important detail here, other than realizing you can see each row of a matrix as a brain cell, which is neat. And you can very roughly imagine each brain cell as knowing how to draw one line-with-bend through concept space and cast its vote on that basis.

We have 6 layers of neurons in the cerebral cortex, for reference, so as a gross simplification that would be 6 big matrices in a chain, with the rows of each matrix representing individual neurons.
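To close with a sketch of that layered picture - sizes made up, tanh standing in for the bend - here are six matrices in a chain, each row playing the part of one neuron:

```python
# Six layers chained together, as a (very) rough nod to the cortex analogy.
import numpy as np

rng = np.random.default_rng(3)
sizes = [1000, 500, 400, 300, 200, 100, 50]      # input size, then 6 layers
layers = [rng.standard_normal((m, n)) * 0.05     # one matrix per layer
          for n, m in zip(sizes[:-1], sizes[1:])]

signal = rng.standard_normal(sizes[0])           # the senses vector
for W in layers:                                 # each row of W: one neuron
    signal = np.tanh(W @ signal)                 # dot products, then the bend

print(signal.shape)                              # (50,) - the thoughts/actions vector
```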