## Implementing a PyTorch-like arbitrary-dimensional Tensor in Go

In one of Andrej Karpathy’s videos, he recommends the blog post “PyTorch internals” by Edward Z. Yang to learn how the Tensor class is implemented in PyTorch. The blog post is fantastic and goes into detail on some of the basic functionality of Tensors. After reading it, I tried to implement similar functionality in Go using slices. In this post, I will walk through the following methods of a basic Tensor struct:...

## How to make Obsidian and Jekyll equations compatible

TL;DR: set `processEscapes` to `false`, or remove it from the `tex` dictionary:

```html
<script>
  MathJax = {
    tex: {
      inlineMath: [['$', '$'], ['\\(', '\\)']],
    },
    svg: {
      fontCache: 'global'
    }
  };
</script>
<script type="text/javascript" id="MathJax-script" async
  src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
</script>
```

What was not working: I tried to write blog posts using Obsidian for my Jekyll website. The issue was that the MathJax version I was using didn’t recognize single dollar signs for inline equations. I was using this MathJax setup, taken from the documentation:...

## ActMAD: Activation Matching to Align Distributions for Test-Time-Training paper review

Test-time adaptation is one of the emerging approaches to tackling distribution shift in model deployment. Typically, the lifecycle of a deployed model includes the following:

1. (Pre-)train the model on the training dataset offline.
2. Deploy the model in the real world.
3. After collecting more data, retrain the model further.
4. Repeat steps 2 and 3.

Some of the issues with the above steps are: the environment might change during the deployment period, so the model might lose performance over time....

## LoRA: Low-Rank Adaptation of Large Language Models

Paper link: https://arxiv.org/abs/2106.09685

Instead of updating the pre-trained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ directly, a low-rank decomposition is added, $W_0 + BA$, where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$. Only $B$ and $A$ are fine-tuned while $W_0$ is kept frozen. After training, $BA$ can be merged into $W_0$ to obtain the final model.

Advantages:

- The large pre-trained model weights are not changed during fine-tuning.
- Inference is unchanged: $Wx = W_0x + BAx = (W_0 + BA)x$.
- It is efficient and has a small training footprint (only the $B$ and $A$ matrices are trained).
- Swapping models in deployment is fast....
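To see the small training footprint concretely, a quick parameter count for a single square weight matrix (the numbers are illustrative, not from the paper):

$$\underbrace{d \cdot k}_{\text{full fine-tuning}} = 1024 \cdot 1024 = 1{,}048{,}576 \qquad \underbrace{d \cdot r + r \cdot k}_{\text{LoRA},\; r = 8} = 8192 + 8192 = 16{,}384$$

So LoRA trains roughly $1.6\%$ of the parameters of that matrix, and the gap widens as $r$ shrinks relative to $d$ and $k$.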

## Few-shot image classification using Prototypical Networks

In the few-shot image classification problem, $K$ examples of each of $N$ classes are given as a “support” set, and the task is to classify new images by comparing them to those examples (CS330 course reader). One of the simplest methods is Prototypical Networks. During training, we feed both “support” and “query” images through the CNN encoder model. Then we take the center of each image class in the support set and compare the distances of the “query” (test) images....
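Concretely, in the Prototypical Networks formulation, the prototype $c_k$ of class $k$ is the mean of the embedded support examples $S_k$, and a query image $x$ is classified by a softmax over negative distances to the prototypes:

$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i), \qquad p_\phi(y = k \mid x) = \frac{\exp\!\big(-d(f_\phi(x), c_k)\big)}{\sum_{k'} \exp\!\big(-d(f_\phi(x), c_{k'})\big)}$$

where $f_\phi$ is the CNN encoder and $d$ is typically the squared Euclidean distance.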

## Hungarian matching algorithm in DETR

Introduction: In the End-to-End Object Detection with Transformers paper, the model directly predicts a fixed number $N$ of boxes and treats them as a set. To match the predicted boxes with the target boxes, the authors use the Hungarian matching algorithm. There is a great blog post by Lei Mao explaining the basic concepts of the Hungarian algorithm.

Short summary of the problem: in the case of DETR, we predict 100 boxes, which is more than the number of objects in almost any image....
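The underlying problem is the assignment problem: given a cost matrix where entry $(i, j)$ is the cost of matching prediction $i$ to target $j$, pick one column per row so the total cost is minimized. The Hungarian algorithm solves this in $O(n^3)$; for intuition only, here is a brute-force version in Go (illustrative, not DETR's implementation — DETR's code relies on an efficient solver):

```go
package main

import "fmt"

// minCostAssignment brute-forces the assignment problem by trying
// every permutation of columns. The Hungarian algorithm finds the
// same optimum in O(n^3); this exhaustive version is only for intuition.
func minCostAssignment(cost [][]float64) ([]int, float64) {
	n := len(cost)
	perm := make([]int, n)  // perm[row] = column currently assigned
	used := make([]bool, n) // which columns are taken
	best := make([]int, n)
	bestCost := -1.0
	var rec func(row int, acc float64)
	rec = func(row int, acc float64) {
		if row == n {
			if bestCost < 0 || acc < bestCost {
				bestCost = acc
				copy(best, perm)
			}
			return
		}
		for col := 0; col < n; col++ {
			if !used[col] {
				used[col] = true
				perm[row] = col
				rec(row+1, acc+cost[row][col])
				used[col] = false
			}
		}
	}
	rec(0, 0)
	return best, bestCost
}

func main() {
	// cost[i][j]: matching cost between prediction i and target j
	cost := [][]float64{
		{4, 1, 3},
		{2, 0, 5},
		{3, 2, 2},
	}
	assign, total := minCostAssignment(cost)
	fmt.Println(assign, total) // prints [1 0 2] 5
}
```

In DETR the cost combines classification probability and box-overlap terms, and unmatched predictions are assigned to a "no object" class.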