Study log 0 - Sub Specie

This series will be my diary of what I do each day on my way to becoming a researcher.

A bit on my background

I did a B.Sc. and an M.Sc. in computer science at a German university, and have now worked full-time as a software engineer for about 7 years. As a result, I have very good software engineering skills: I'm strong at LeetCode-style problems and DS&A, and I know many of the concerns of actually running software in production, e.g. deployment, testing, observability, A/B testing, data pipelines, scalable architecture, and the different components of real products (essentially system design: load balancer vs reverse proxy vs API gateway, different DBs or message queues and their tradeoffs, horizontal and vertical scaling, identifying single points of failure, …). At the same time, I think the moat of almost all of this is rapidly diminishing as models get better at coding.

On the ML side, I did a good amount of math in uni, but I started uni almost 14 years ago, so it's been a while. I took several courses on or related to ML, though they usually focused on backpropagation and a broad overview of the field (e.g. NNs, RNNs, LSTMs, SVMs, kNN), whereas nowadays things seem to have converged much more on various transformer architectures. As a result, my personal study curriculum focuses strongly on the ML side of things. I still remember enough math (e.g. multivariate calculus) to mostly follow papers or get into the weeds of computing backpropagation by hand for smaller networks, but I expect I'll need to brush up on some topics or learn new ones, for example for RL.

Some resources I can recommend

From the ARENA prerequisites: I found this post great for building better intuitions about KL divergence. You don't need to understand all of its intuitions (they cover a wide range of fields); I got most of the benefit simply from reading the summary.
This video by Karpathy on neural networks, backpropagation, etc. I had covered this topic several times in uni and read a few blog posts on it, but just spending a few hours coding along with the video gave me a MUCH better understanding of even some core concepts, and I learned about many nuances I wasn't aware of before. (I initially tried to code it myself in Rust with zero coding-tool assistance, but I don't know enough Rust yet to do this properly. I then switched to Python and finished in a few hours, deliberately changing the code slightly to make sure I wasn't just blindly copying what he showed.) Karpathy is an amazing teacher who gives you great intuitions; I'll definitely use some of his other videos as part of my curriculum.
I mention this elsewhere too, but LLMs are now really excellent 1:1 tutors. Ask them what certain lines of code do, ask them to explain a concept from a paper, or explain your understanding of something and ask them to poke holes in it or point out what you don't understand yet. It finally delivers on the promise of education for all that some people made with the advent of mobile internet, Wikipedia, or free online university courses. I cannot stress enough how useful this is.
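
The KL divergence post is about intuitions rather than code, but the definition itself is short: for discrete distributions, KL(P || Q) = sum over i of p_i * log(p_i / q_i). A minimal sketch in plain Python (my own illustration, not taken from the post or from ARENA; the two example distributions are made up) that also shows KL is not symmetric:

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same outcomes.

    Terms with p_i == 0 contribute 0 by convention; if q_i == 0 where p_i > 0,
    the divergence is infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # 0 * log(0 / q) is defined as 0
        if q_i == 0:
            return math.inf  # Q rules out an outcome that P can produce
        total += p_i * math.log(p_i / q_i)
    return total


# A skewed "true" distribution P and a uniform model Q over three outcomes.
p = [0.8, 0.1, 0.1]
q = [1 / 3, 1 / 3, 1 / 3]

print(f"KL(P||Q) = {kl_divergence(p, q):.4f} nats")
print(f"KL(Q||P) = {kl_divergence(q, p):.4f} nats")  # a different value: KL is asymmetric
```

Swapping the arguments changes the result (roughly 0.46 vs 0.51 nats here), which is one reason the direction of the KL term matters when it shows up in loss functions.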
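
Karpathy's video builds a tiny scalar autograd engine. A rough sketch of the core idea, written from scratch as an illustration rather than copied from the video, and supporting only +, * and tanh: each Value remembers its inputs and a local backward rule, and backward() applies the chain rule in reverse topological order. The single neuron at the bottom is a hypothetical example.

```python
import math


class Value:
    """A scalar that records how it was computed, so gradients can flow back."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None  # replaced by the op that creates this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1 - t ** 2) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topological sort: every node appears after all of its inputs.
        order, seen = [], set()

        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._children:
                    build(child)
                order.append(v)

        build(self)
        self.grad = 1.0
        # Walk from the output back to the leaves, applying each local rule once.
        for v in reversed(order):
            v._backward()


# A single neuron: out = tanh(x1*w1 + x2*w2 + b)
x1, x2 = Value(2.0), Value(0.0)
w1, w2 = Value(-3.0), Value(1.0)
b = Value(6.8813735870195432)  # chosen so the output is about 0.7071
out = (x1 * w1 + x2 * w2 + b).tanh()
out.backward()
print(out.data, w1.grad, w2.grad, x1.grad)
```

Gradients are accumulated with += rather than assigned, which matters as soon as a node feeds into more than one operation (e.g. a + a); assigning would silently overwrite one of the contributions.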

What I’d recommend against

I'm currently subscribed to, I think, 7 AI newsletters, none of which I actually read. The field is huge now: there are probably a million YouTube videos on the ideal Claude Code setup, the perfect system, or OpenClaw running your life, all preying on your FOMO. My advice would be:
- Be very selective about what media you consume in the first place. E.g. there are several podcast series I like, but each produces hours of content every week. Unless an episode seems super interesting or relevant to what you want to learn right now, just don't listen to it. Saying no and filtering is an insanely valuable skill in a world with this much content.
- It makes sense to set aside time every now and then to optimize your work routines (in RL terms, exploration vs exploitation). However, this should be timeboxed, and you should have a concrete problem you want to solve or process you want to speed up. I've never really gotten much value from generic frameworks (Getting Things Done may be an exception).
By now I also (maybe I'm a nerd) get papers suggested to me in my Facebook feed. Again, this is not a great way to find interesting papers: it's in the posting account's interest to paint each paper as the greatest thing since sliced bread to drive engagement. Instead, do proper literature searches and see which papers experts recommend, which are popular in your niche, etc.

Study log 0 contents

I was overall quite happy with the progress I made, despite it being mostly an admin day. It was a bank holiday, so normally I would not work at all. I achieved the following:

Made progress on finding out how the UK tax authorities treat my grant: learned about HMRC's non-statutory clearance service, built my case for how I think the grant should be taxed, and sent them a request. They say they will respond within 28 days, and I hope this topic will be closed then. Good news: I think my grant is tax-free in the UK!
Made progress on choosing health insurance: learned that I may be able to continue the policy my previous employer had for me. I couldn't finish this because of the bank holiday, but I scheduled a call for the next day. Continuing the policy would have the big benefit that all pre-existing conditions are covered. I also looked at a few alternative providers but haven't requested a quote yet.
Choosing which machine to use: I'm considering buying a new laptop. Mine is probably 8+ years old, and I often struggle with disk space and some other issues (as well as not being able to run basically any interesting models locally).
- I looked at MacBooks, which are probably the most common developer machine.
- I also looked at the new Framework 13 Pro, a “Linux MacBook”.
- I also looked at refurbished offers on Back Market.
- I looked at some benchmarks and experiments people ran to see how fast you can run inference locally on these machines. I learned a lot about which part of inference is bound by which part of your hardware (answer generation is memory-bandwidth bound, time to first token is compute bound, and which models you can run at all is bound by memory size). Memory bandwidth varies hugely between machines: e.g. the Framework 13 Pro has ~90 GB/s, while a fully specced-out MacBook with an M5 Max has 600 GB/s.
- Overall I decided that a refurbished MacBook is probably the way to go for me, but I want to squeeze as much out of my current laptop as possible first (e.g. remove the dual boot and go Arch-only). Essentially, if I run into a wall in my work or research, I'll buy the new one.
- I also realized that even with very expensive machines (3000 GBP+) you can really only run models up to maybe 70B, and the frontier keeps advancing. While even 30B models now have good performance on e.g. coding, inference speed will top out at around 30 tok/s. If you want the model to generate potentially thousands of lines of code for you, that is not going to work well. In addition, the power draw will be significant, so you can only do this for longer stretches when plugged in. In essence, using dedicated cloud instances or renting GPUs is likely still the play, and privacy w.r.t. model inputs (keeping them on your own machine) is still some way off.
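
The bandwidth observation above can be turned into a quick back-of-envelope estimate. A minimal sketch, assuming a dense model whose decoding is purely memory-bandwidth bound: the weights take roughly parameters × bits per weight / 8 bytes, and each generated token requires reading all of them once, so tokens per second is at most bandwidth divided by model size. The bandwidth figures are the ~90 GB/s and 600 GB/s quoted above; the 4-bit and 8-bit quantization levels are illustrative assumptions.

```python
# Back-of-envelope numbers for local LLM inference (dense models only).
# Assumptions, not measurements: weights dominate memory traffic, and
# generating each token requires streaming all weights from memory once.


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Ignores the KV cache, activations and runtime overhead, so the memory
    you actually need is somewhat higher.
    """
    return params_billion * bits_per_weight / 8


def max_decode_tok_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on generation speed when decoding is memory-bandwidth bound."""
    return bandwidth_gb_s / model_gb


# Bandwidth figures from this post; the quantization levels are illustrative.
machines = {"~90 GB/s laptop": 90.0, "~600 GB/s laptop": 600.0}
models = [(30, 4), (30, 8), (70, 4)]  # (billions of parameters, bits per weight)

for params, bits in models:
    size = weights_gb(params, bits)
    for name, bw in machines.items():
        speed = max_decode_tok_per_s(bw, size)
        print(f"{params}B @ {bits}-bit ({size:.0f} GB) on {name}: <= {speed:.1f} tok/s")
```

Real throughput lands noticeably below this bound, and the estimate says nothing about time to first token, which is limited by compute instead. Mixture-of-experts models only read their active parameters for each token, so they can decode much faster than their total size suggests.
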
Choosing my program: I started going through the ARENA prerequisites. I already know a lot of them (e.g. Git, Python programming) from my former job, and ARENA seems significantly less focused on interpretability than I thought. It seemed like a good starting point; I'll refine what exactly I want to study in the coming days.