Software Engineer in Tools and Infrastructure
What do I do
Automated complex tests, integrating many different projects
Pasta and risotto
I am a SETI, working on build, tooling and deployment problems. As such, I put projects together inside distributed systems and help test their integration and other non-functional requirements, with the goal of deploying new versions every hour without breaking anything.
Who is this man?
Lee Sedol, the Go world champion, who recently lost 4-1 against AlphaGo, a Google DeepMind program built in London.
Go is a very complex game - its board is 19x19 - and has a branching factor much larger than chess. After Deep Blue beat Kasparov 3.5-2.5 at chess in 1997, Go became the next challenge for game-playing AIs.
AlphaGo is not a program that could be written by a human with a keyboard; it was built with the current state of the art in machine learning (and the current buzzword): deep learning. What does that mean? We have to rewind time a bit to fully understand it.
Once upon a time, the Turing machine (and its practical implementation, the computer) was invented.
Computers were as large as rooms with a fraction of the power that your watch has today. But since they were capable of executing arbitrary programs, people started to think about complex tasks...
We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.
John McCarthy (LISP), Claude Shannon (information theory), Marvin Minsky et al, 1956
This was the seminar where the term "Artificial Intelligence" was coined.
One of the advances that came out of those years (1957) was the perceptron. It's inspired by what was known about human neurons at the time: they fire once they reach their activation potential, as non-linear systems.
The perceptron tries to reproduce a function of many variables. The inputs are real numbers, and they are linearly combined with constants called weights, one multiplying each input. `f` is called the activation function and provides the non-linearity; in the original perceptron it's a simple step function that is 1 when its input is positive and 0 when its input is negative. It was later replaced with a sigmoid (1 / (1 + e^-t)) because the sigmoid is differentiable.
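In code, the forward pass is just a weighted sum followed by an activation. A minimal sketch in Python (the weights implementing AND are hand-picked for illustration, not learned):

```python
import math

def step(t):
    # original perceptron activation: 1 if the input is positive, else 0
    return 1.0 if t > 0 else 0.0

def sigmoid(t):
    # differentiable replacement: 1 / (1 + e^-t)
    return 1.0 / (1.0 + math.exp(-t))

def perceptron(weights, inputs, f=step):
    # linear combination of the inputs, passed through the activation f
    return f(sum(w * x for w, x in zip(weights, inputs)))

# hand-picked weights implementing AND; the first input is the bias x0 = 1
print(perceptron([-1.5, 1.0, 1.0], [1, 1, 1]))  # 1.0
print(perceptron([-1.5, 1.0, 1.0], [1, 1, 0]))  # 0.0
```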
Training a perceptron
To use a perceptron in practice, we need to find what its weights should be. For this, we use a training set: lots of examples of the function we want to mimic. Given a large enough training set, we hope to learn good values for the weights that will generalize to new inputs not in the training set.
Given some values for the weights, we can calculate for each example an output y which is close (or not) to ŷ, and an error equal to the difference between the two.
This is an AND function: only 1 when both inputs are 1. Not represented here is an x₀ term for each example, called the bias, which is always 1: it gives a constant term to the linear combination.
When we start from a random set of weights, we want to understand how we should modify those weights when we are shown a new example, complete with input and expected output. If what we see here is an error function like (ŷ - y), w₀ will place the output somewhere on the curve, usually not where this error is minimal. For example, if AND(1, 1) outputs 0.4 the error is 0.6. If we compute the derivative of the error function, we find that:
- it's 0 in the minimum
- it's positive if we have overshot a minimum
- it's negative if we are before a minimum
Therefore the derivative of the error with respect to a weight is an indication of how we should update the weight:
- do nothing if we are in a minimum, anything will worsen the error
- subtract something from the weight if we are to the right of a minimum
- add something to the weight if we are to the left of a minimum
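These three cases can be checked numerically. A small sketch, assuming a squared error (ŷ - y)² for the single weight w of a one-input "network" y = w * x with x = 1 and target ŷ = 1:

```python
def error(w):
    # squared error of y = w * x against target ŷ = 1, with x = 1
    y = w * 1.0
    return (1.0 - y) ** 2

def derivative(f, w, h=1e-6):
    # numerical derivative via central differences
    return (f(w + h) - f(w - h)) / (2 * h)

print(derivative(error, 1.0))  # ~0: we are at the minimum
print(derivative(error, 1.5))  # positive: past the minimum, decrease w
print(derivative(error, 0.5))  # negative: before the minimum, increase w
```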
These are all local minima!
Perceptron learning rule
for each example x:
    y = f(w * x)          // * is the dot product
    e = ŷ - y             // + error, - derivative
    for each weight i:
        wᵢ += η * e * xᵢ  // ∝ derivative
Suppose we want to learn from one example at a time, showing them in sequence to our perceptron.
We compute the actual value the perceptron is calculating, which will probably be off. Then we calculate the error. A positive error means a shortfall, so a negative derivative.
We update the weight with a term proportional to the derivative, which is just e * xᵢ (a simplification: learning rules can be more complex depending on the activation function).
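The learning rule, written as a runnable loop and trained on the AND examples (η and the number of epochs are illustrative choices):

```python
def train_perceptron(examples, eta=0.1, epochs=50):
    # examples: list of (inputs, target); inputs include the constant bias x0 = 1
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, target in examples:
            # forward pass with a step activation
            y = 1.0 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0.0
            e = target - y                    # ŷ - y
            # update each weight proportionally to e * x_i
            weights = [w + eta * e * xi for w, xi in zip(weights, x)]
    return weights

AND = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = train_perceptron(AND)
predict = lambda x: 1.0 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0.0
print([predict(x) for x, _ in AND])  # [0.0, 0.0, 0.0, 1.0]
```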
A single perceptron can only get you so far. XOR is a common example of a function which is not linearly separable and hence cannot be learned by a perceptron. However, if you start connecting perceptrons to each other...
On to networks
..you can solve much more complex problems.
Networks and backpropagation
Fortunately, we can build an actual neural network by feeding the output of a perceptron into the input of another. These structures are commonly separated into layers (at least in non-recurrent neural networks): one input layer, one or many hidden layers, and one output layer.
To train this network we use a generalization of the perceptron learning rule, called backpropagation, in which at each iteration we first update the weights of the output layer, then those of the preceding layer, and so on; each layer uses the weights of the next layer as input. Theorems say that a network with 1 hidden layer can approximate any real function of real variables, so we stick to 1 hidden layer: why make it more complex?
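A minimal sketch of backpropagation with one hidden layer, learning XOR in NumPy. The hidden size, learning rate, and iteration count are illustrative assumptions, and how well it converges depends on the random seed:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

# one hidden layer of 4 sigmoid units (sizes are illustrative)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
eta = 1.0

for _ in range(5000):
    # forward pass, layer by layer
    H = sigmoid(X @ W1 + b1)        # hidden layer
    Y = sigmoid(H @ W2 + b2)        # output layer
    # backward pass: output layer first, then the preceding layer
    dY = (Y - T) * Y * (1 - Y)      # error * sigmoid derivative
    dH = (dY @ W2.T) * H * (1 - H)  # propagated through the output weights
    W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(axis=0)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)

# predictions should approach [0, 1, 1, 0]
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2).ravel())
```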
In from three to eight years we will have a machine with the general intelligence of an average human being. -- Marvin Minsky
At some point the predictions got really optimistic.
Throughout the 90s, neural network interests (conference attendance, number of papers, investment) declined as other methods gave more accurate results on standard problems. Support vector machines were considered state of the art until 2010.
The Hype Cycle is a classic model showing how a technology can go through a peak of inflated expectations to crash and burn later. It takes a long time for enlightenment to arrive and for the technology in question to reach trustworthy, productive usage. Unless we are in a new peak of expectations...
Deep neural networks
Here we see a convolutional neural network for the recognition of hand-written digits, one of the standard problems against which new approaches and models are measured. The pixels of a 32x32 image of the digit have a luminance value from 0 to 1. They are connected to the input layer, which is not fully connected but is a convolution, a mathematical operation that makes each neuron of the next layer the combination of a 5x5 square of pixels around its position.
The next layer is a subsampling layer, which divides the previous layer into 2x2 squares and takes the maximum (or another function) of the convolutional neuron values in each square. More layers are added on top... until we get to the output layer, consisting of 10 neurons corresponding to the different classes, the 10 digits from 0 to 9. The maximum value wins and the input image is classified into that class.
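The two operations can be sketched in a few lines of NumPy; the 6x6 image, 3x3 kernel, and 2x2 pooling window are toy choices for illustration, not the network's actual sizes:

```python
import numpy as np

def convolve_valid(image, kernel):
    # slide the kernel over the image: each output value combines a
    # k x k square of pixels around its position (no padding)
    k = kernel.shape[0]
    h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

def max_pool(feature_map, size=2):
    # subsampling: take the maximum of each size x size square
    h, w = feature_map.shape
    return feature_map[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging kernel
features = convolve_valid(image, kernel)          # 4x4 feature map
pooled = max_pool(features)                       # 2x2 after subsampling
print(features.shape, pooled.shape)  # (4, 4) (2, 2)
```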
A lot to learn
possibly millions of weights
feature extraction: which and how many layers work best for this problem?
does the brain really work like that?
Some of the problems can be solved by sheer computational brute force: learning the weights of such a complex network with enough GPUs in your datacenter or private cloud. Others need more theory, literature and experimentation: the convolutional and subsampling layers have been proven to work in the field of image recognition, but every domain can be different.
On training set: This data set contains 29.4 million positions from 160,000 games played by KGS 6 to 9 dan human players
On training time and resources: The value network was trained for 50 million mini-batches of 32 positions, using 50 GPUs, for one week.
These are the figures when it comes to brute force. You need a lot of training data *and* computing resources. That puts companies like Google and Facebook in an optimal position to fund this research and to use its results, which is why many of the deep learning gurus like Yann LeCun and Geoffrey Hinton now work there.
Alpha(Go) Zero: reinforcement
AlphaGo Zero is a recent evolution of AlphaGo that does not use training data. It plays against itself to improve its performance. It's more of an algorithmic breakthrough than brute-force search: when evolved (into Alpha Zero) to play chess rather than Go, it searches thousands rather than millions of positions per second, still outplaying its adversaries.
The trick of reinforcement learning (and adversarial learning) is that the data needed for training can be generated from clear-cut rules (like the rules of a board game). It remains to be seen how this approach can be extended to other domains, for example by distinguishing between generator and discriminator networks (where the first attempts to fool the second).
There is both hype and beauty
Big hardware and, unless it's a game, big data
We weren't able to do OCR or speech recognition, now they're normal