Seb Duerr
  • Enter property: Transfer learning: take a great pretrained model & retrain the end layers to fit it to your own needs
  • Archway: Overfitting: model only recognizes the one set of data it was trained on instead of generalizing (force-fitting)
  • Stairs up: Learning rate: how quickly params are updated
  • Door: Tensor: an n × m matrix (generalized to any number of dims); use torch.set_default_dtype(torch.float64)
  • 2nd Door: Backprop: calcs grad of the loss func wrt all weights in the NN (backward propagation of errors)
  • Main arch: Gradient descent: iterative optimization algo for finding minima of loss func
  • Toilet: Minibatch: a random bunch of input points used for one weight update
  • Bathroom: Epoch: one complete run through all data points
  • Basement door: SGD: grad descent using mini batches
  • Hallway: Architecture: the math function whose params we fit (y = mx + t)
  • Stove: Loss Functions: how far/close predictions are from labels (MSE, MAE, cross-entropy; sketch after this list)
  • Oven Adjusters: Regularization: techniques to reduce overfitting (l1 lasso, l2 ridge)
  • Under Oven Adjusters: Underfitting: model too simple to explain the variance; performance on the validation set is really bad
  • Nicer Dicer: Tokenization: represent each word / linguistic concept with one token after the text has been split up
  • Scale: Numericalization: take the complete list of words, uniquify them, and order them by how often they occur (toy example after this list)
  • Cook book text: WikiText-103: ~103M tokens of Wikipedia text with knowledge about the world
  • Cook book ingredients: Tabular data use cases: sales forecasting, fraud detection, failure prediction, pricing
  • Kitchenaid: Super Convergence: one-cycle schedule: start with low LR & high momentum, ramp the LR up while momentum drops, then reverse (OneCycleLR example after this list)
  • Filter: Softmax: function that rescales activations so they are all > 0 & add up to 1 (shown in the loss sketch below)
  • Mash: Weight decay: subtract a constant wd * weights every time a batch is processed (worked out below)
  • Walking to living Room: Momentum: each update step also incorporates the previous update direction (a running average of past gradients)
  • Mom: Bias: per-user/per-item values capturing overall preference (like a general personal preference), independent of the latent factors
  • Couch: Matrix Factorization: decompose the interaction matrix into factors built from latent variables (sketch after this list)
  • TV: PCA: dimensionality reduction technique
  • Photo Books: Dropout: randomly zeroing out a number of activations during training
  • Light Switch: Activation function: ReLU or sigmoid
  • Dad Computer: Code structure: data, experiments, model
  • Window: Core training step (runnable sketch after this list):
    • Balcony: output_batch = model(train_batch)
    • Stefan Silo: loss = loss_fn(output_batch, labels_batch)
    • Street: optimizer.zero_grad() # clear prev grads
    • Silo: loss.backward() # calc grads wrt inputs + params
    • Halle: optimizer.step() # perform updates using calc’d grads
    • Bed: model.train(): call before training; impacts dropout and batch_norm
    • TV: model.eval(): after training
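The core training step above, as a minimal runnable PyTorch sketch. The tiny model, the random batch, and the hyperparameters are made-up placeholders; only the order of the calls mirrors the notes.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

train_batch = torch.randn(64, 10)          # one minibatch of inputs (placeholder data)
labels_batch = torch.randint(0, 2, (64,))  # matching labels (placeholder data)

model.train()                               # before training: enables dropout & batch-norm updates
output_batch = model(train_batch)           # forward pass
loss = loss_fn(output_batch, labels_batch)  # how far predictions are from labels
optimizer.zero_grad()                       # clear prev grads
loss.backward()                             # backprop: grads of the loss wrt the params
optimizer.step()                            # perform updates using calc'd grads
model.eval()                                # after training: dropout off, batch-norm stats frozen
```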
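A small sketch of the loss functions and of softmax from the notes above; all numbers are arbitrary. MSE and MAE compare regression-style predictions with labels, softmax turns logits into values that are > 0 and sum to 1, and cross-entropy is the usual classification loss.

```python
import torch
import torch.nn as nn

preds = torch.tensor([2.5, 0.0, 1.0])
targets = torch.tensor([3.0, -0.5, 1.0])
print(nn.MSELoss()(preds, targets))   # mean squared error
print(nn.L1Loss()(preds, targets))    # mean absolute error

logits = torch.tensor([[2.0, 1.0, 0.1]])
probs = torch.softmax(logits, dim=1)  # every entry > 0, row sums to 1
print(probs, probs.sum())             # roughly [0.66, 0.24, 0.10], sum 1.0

labels = torch.tensor([0])
print(nn.CrossEntropyLoss()(logits, labels))  # cross-entropy loss on the raw logits
```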
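Weight decay as described above, written out by hand for one parameter tensor; the loss here is just a stand-in to produce gradients. In practice the `weight_decay` argument of a PyTorch optimizer does the same thing.

```python
import torch

lr, wd = 0.1, 0.01
w = torch.randn(5, requires_grad=True)

loss = (w ** 2).sum()        # stand-in loss, only there to produce gradients
loss.backward()

with torch.no_grad():
    w -= lr * w.grad         # ordinary gradient step
    w -= lr * wd * w         # weight decay: subtract wd * weights (scaled by lr) each batch
w.grad.zero_()

# Built-in equivalent:
# optimizer = torch.optim.SGD([w], lr=lr, weight_decay=wd)
```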
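The super-convergence schedule above maps to PyTorch's OneCycleLR scheduler: the learning rate ramps up while momentum ramps down, then both reverse. The model, data, and step count are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.95)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, total_steps=100,
    cycle_momentum=True, base_momentum=0.85, max_momentum=0.95,
)

for step in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy loss on random data
    loss.backward()
    optimizer.step()
    scheduler.step()  # moves LR up/down and momentum down/up along the cycle
```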
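A toy sketch of tokenization and numericalization; splitting on whitespace stands in for a real tokenizer and the sentences are invented. The vocabulary is uniquified and ordered by occurrence, then every token is replaced by its index.

```python
from collections import Counter

texts = ["the cat sat on the mat", "the dog sat"]

# Tokenization: one token per word (a real tokenizer also handles punctuation etc.)
tokenized = [t.split() for t in texts]

# Numericalization: uniquify the words, order them by how often they occur,
# then replace each token with its index in that vocabulary.
counts = Counter(tok for toks in tokenized for tok in toks)
vocab = [word for word, _ in counts.most_common()]
stoi = {word: i for i, word in enumerate(vocab)}
ids = [[stoi[tok] for tok in toks] for toks in tokenized]

print(vocab)  # ['the', 'sat', 'cat', 'on', 'mat', 'dog']
print(ids)    # [[0, 2, 1, 3, 0, 4], [0, 5, 1]]
```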
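A sketch tying together the matrix-factorization and bias items: a collaborative-filtering style dot-product model with per-user and per-item bias terms. The class name, sizes, and example ids are invented.

```python
import torch
import torch.nn as nn

class DotProductBias(nn.Module):
    def __init__(self, n_users, n_items, n_factors=20):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)  # latent variables per user
        self.item_factors = nn.Embedding(n_items, n_factors)  # latent variables per item
        self.user_bias = nn.Embedding(n_users, 1)  # overall preference of a user
        self.item_bias = nn.Embedding(n_items, 1)  # overall popularity of an item

    def forward(self, user_ids, item_ids):
        dot = (self.user_factors(user_ids) * self.item_factors(item_ids)).sum(dim=1)
        return dot + self.user_bias(user_ids).squeeze(1) + self.item_bias(item_ids).squeeze(1)

model = DotProductBias(n_users=100, n_items=50)
preds = model(torch.tensor([0, 1]), torch.tensor([3, 7]))  # predicted scores for two (user, item) pairs
```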