pytorch 22 gpu 21 distributed-training 20 cuda 20 compiler 13 platform-engineering 11 computer-architecture 10 linux 10 mlops 10 performance 8 kernel 7 memory 6 autograd 6 triton 5 python 5 programming 5 architecture 4 optimization 4 llm 4 matmul 4 shared-memory 4 reduction 4 developer-experience 4 lecture 4 operator 4 parser 3 cpu 3 nccl 3 fsdp 3 kernel-optimization 3 backward 3 automata 2 regex 2 formal-language 2 lexical-analysis 2 ir 2 virtual-memory 2 mmu 2 io 2 dma 2 systems 2 roadmap 2 all-reduce 2 communication 2 ddp 2 activation 2 tensor-parallel 2 pipeline-parallel 2 deepspeed 2 sharding 2 bandwidth 2 occupancy 2 profiling 2 nsight 2 warp-shuffle 2 numerical-stability 2 container 2 devops 2 feature-engineering 2 mlflow 2 monitoring 2 observability 2 cicd 2 testing 2 github-actions 2 idp 2 metrics 2 adoption 2 internals 2 runtime 2 custom-op 2 dfa 1 nfa 1 finite-automaton 1 regular-language 1 cfg 1 grammar 1 parse-tree 1 pda 1 pushdown-automaton 1 stack 1 context-free 1 phases 1 frontend 1 backend 1 lexer 1 tokenizer 1 top-down 1 ll-parser 1 recursive-descent 1 bottom-up 1 lr-parser 1 shift-reduce 1 ast 1 syntax-tree 1 semantic-analysis 1 type-checking 1 symbol-table 1 scope 1 ssa 1 llvm 1 code-generation 1 register-allocation 1 instruction-selection 1 target 1 von-neumann 1 alu 1 control-unit 1 registers 1 isa 1 cisc 1 risc 1 x86 1 arm 1 pipeline 1 branch-prediction 1 superscalar 1 ilp 1 cpu-privilege 1 protection-ring 1 kernel-mode 1 protected-mode 1 interrupt 1 exception 1 idt 1 isr 1 cache 1 memory-hierarchy 1 dram 1 sram 1 page-table 1 tlb 1 bus 1 pcie 1 multicore 1 numa 1 smp 1 cache-coherence 1 data-parallel 1 sgd 1 ring 1 batch-size 1 optimizer 1 gradient-accumulation 1 optimizer-state 1 topology 1 nvlink 1 infiniband 1 model-parallel 1 transformer 1 attention 1 sequence-parallel 1 context-length 1 stages 1 micro-batch 1 schedule 1 1f1b 1 gpipe 1 activation-checkpointing 1 recomputation 1 zero 1 overlap 1 checkpoint 1 fault-tolerance 1 resume 1 debugging 1 timeout 1 oom 1 megatron 1 frameworks 1 systems-design 1 training-stack 1 prerequisites 1 learning-roadmap 1 warp 1 block 1 launch-configuration 1 indexing 1 coalescing 1 flash-attention 1 latency-hiding 1 roofline 1 matrix-multiply 1 tiling 1 bank-conflict 1 warp-level 1 softmax 1 layernorm 1 rmsnorm 1 vectorized-load 1 alignment 1 register-pressure 1 spill 1 tensor-core 1 mixed-precision 1 async-copy 1 pipelining 1 double-buffering 1 flashattention 1 os 1 system-programming 1 process 1 thread 1 fork 1 exec 1 scheduler 1 cfs 1 scheduling 1 paging 1 filesystem 1 vfs 1 ext4 1 inode 1 syscall 1 trap 1 kernel-module 1 device-driver 1 block-device 1 synchronization 1 mutex 1 semaphore 1 concurrency 1 networking 1 tcp-ip 1 socket 1 netfilter 1 cgroups 1 namespaces 1 docker 1 machine-learning 1 ml-lifecycle 1 data-pipeline 1 etl 1 experiment-tracking 1 wandb 1 hyperparameter 1 model-registry 1 versioning 1 dvc 1 model-serving 1 deployment 1 inference 1 data-drift 1 model-drift 1 automation 1 feature-store 1 feast 1 infrastructure 1 kubernetes 1 scaling 1 platform 1 kubeflow 1 ml-platform 1 devex 1 golden-path 1 templates 1 iac 1 terraform 1 crossplane 1 backstage 1 developer-portal 1 service-catalog 1 product-management 1 pipelines 1 opentelemetry 1 security 1 governance 1 policy-as-code 1 team-building 1 organization 1 lecture-introduction 1 python-philosophy 1 list 1 generator 1 iterator 1 oop 1 class 1 descriptor 1 bound-method 1 tensor 1 storage 1 stride 1 contiguous 1 memory-format 1 dispatcher 1 registry 1 graph 1 custom-function 1 allocator 1 stream 1 async 1 cpp-extension 1 cuda-extension 1 dispatch 1 meta 1 fused-operator 1 amp 1 autocast 1 torch-compile 1 fx 1 inductor 1 distributed 1 extension 1 packaging 1