This Twitter thread has a rich discussion of AI vs. Cloud Providers: https://x.com/rakyll/status/1771641289840242754?s=20. The emerging AI Cloud is simpler to use.
This repo is an exploration of building models for programming artificial intelligence. As we evolve Compound AI Systems, I hope we preserve that simplicity.
- Introduction to Transformers w/ Andrej Karpathy
- A Comprehensive Overview of Large Language Models
Open - w/ Weights, Training & Inference Code, Data & Evaluation
Andrej Karpathy on the importance of building a more open and vibrant AI ecosystem
Tour of Modern LLMs (and surrounding topics)
--
- Llama
- Gemma
- Grok open release
- Robust recipes to align language models with human and AI preferences
- A natural language interface for computers
- OpenDevin
The successful art of model merging is often based purely on the experience and intuition of a passionate model hacker. In fact, the current Open LLM Leaderboard is dominated by merged models. Surprisingly, merged models work without any additional training, making them very cost-effective (no GPUs required at all!), and so many people, researchers, hackers, and hobbyists alike, are trying this to create the best models for their purposes.
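To make that concrete, here is a minimal sketch of the simplest possible merge, plain weight averaging of two checkpoints (a "model soup"); real merge recipes such as SLERP or TIES-merging are more involved, and the function below is illustrative rather than any particular library's API.

```python
# Naive weight averaging of two PyTorch state dicts: the simplest model merge.
# Assumes both models share an identical architecture and parameter names.
import torch

def average_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Return alpha * sd_a + (1 - alpha) * sd_b, key by key."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# merged = average_state_dicts(model_a.state_dict(), model_b.state_dict())
# model_a.load_state_dict(merged)
```

No GPUs needed, exactly as the quote above notes: merging is pure tensor arithmetic on checkpoints.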
What I learned from looking at 900 most popular open source AI tools
state-of-the-art AI results are increasingly obtained by compound systems with multiple components, not just monolithic models.
The Shift from Models to Compound AI Systems
Generative language models (LMs) are often chained together and combined with other components into compound AI systems. Compound AI system applications include retrieval-augmented generation (RAG), structured prompting, chatbot verification, multi-hop question answering, agents, and SQL query generation. ALTO: An Efficient Network Orchestrator for Compound AI Systems
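As a toy illustration of such chaining, here is a minimal RAG-style compound system; the keyword retriever and the `call_llm` placeholder are hypothetical stand-ins for a real vector store and model API.

```python
# A toy compound AI system: a retriever chained into a generator.
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Trivial keyword-overlap scoring; a real system would use embeddings.
    words = query.lower().split()
    return sorted(docs, key=lambda d: -sum(w in d.lower() for w in words))[:k]

def rag_answer(query: str, docs: list[str], call_llm) -> str:
    context = "\n".join(retrieve(query, docs))
    prompt = f"Answer using only this context:\n{context}\n\nQ: {query}\nA:"
    return call_llm(prompt)  # call_llm is any text-in, text-out model API

docs = ["MLX is an array framework for Apple silicon.",
        "vLLM is a high-throughput LLM inference engine."]
print(rag_answer("What is MLX?", docs, call_llm=lambda p: p))  # echo stub
```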
-
Chat + Assistants:
-
Tools Usage:
-
Evaluation: https://openfeature.dev/
-
Apple M1/M2/M3: Get a MacBook Pro with M3 Max (CPU + GPU cores, up to 128GB unified memory) https://www.apple.com/shop/buy-mac/macbook-pro/16-inch
- pip install -U mlx
- https://github.com/ml-explore
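A minimal sketch of what MLX code looks like once installed; the ops below use mlx.core, and computation is lazy until explicitly forced.

```python
# MLX hello-world on Apple silicon: arrays are lazy until mx.eval() runs them.
import mlx.core as mx

a = mx.array([1.0, 2.0, 3.0])
b = mx.array([4.0, 5.0, 6.0])
c = a * b + 1.0   # builds a computation graph; nothing runs yet
mx.eval(c)        # forces evaluation on the default device (the GPU)
print(c)          # array([5, 11, 19], dtype=float32)
```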
-
NVIDIA: Hopper -> Blackwell
-
Trying to build a commodity ~petaflop compute node / https://x.com/karpathy/status/1770164518758633590?s=20
Intelligence is the computational part of the ability to achieve goals. A goal-achieving system is one that is more usefully understood in terms of outcomes than in terms of mechanisms.
The Definition of Intelligence
We don't know how to measure LLM abilities well. Most tests are groups of multiple-choice questions, tasks, or trivia; they don't represent real-world uses well, they are subject to gaming, and results are impacted by prompt design in unknown ways. Or they rely on human preference. A non-trivial taxonomy of real-world use starts with clear domains / common LLM workloads:
- Languages - Rankings by domain -> https://huggingface.co/models -> Tasks & Languages
- Model Card - Claude 3 model card: Coding, Creative Writing, Instruction-following, Long Document Q&A
- Chat, RAG, few-shot benchmark, etc.
- Coding - "Code a login component in React"
- Freshness - "What was the Warriors game score last night?"
- Agent - Web Agents -> https://turkingbench.github.io/
- Multimodal (images and video)
- Reasoning?
-
LMSYS Chatbot Arena Leaderboard
- LLM Judge
- Predictive Human Preference (PHP) / Conclusion: "LMSYS folks told me that due to the noisiness of crowd-sourced annotations and the costs of expert annotations, they’ve found that using GPT-4 to compare two responses works better. Depending on the complexity of the queries, generating 10,000 comparisons using GPT-4 would cost only $200 - 500, making this very affordable for companies that want to test it out."
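A hedged sketch of that GPT-4-as-judge setup; the prompt wording below is made up for illustration, and in practice you would randomize answer order to control for position bias.

```python
# Pairwise LLM-as-judge comparison (sketch; uses the OpenAI Python client).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(query: str, answer_a: str, answer_b: str) -> str:
    prompt = (
        f"Question: {query}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}\n\n"
        "Which answer is better? Reply with exactly one letter: A or B."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```

At a few cents per comparison, 10,000 judgments land in the $200 - 500 range quoted above.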
-
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
-
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
-
We’re doing cutting-edge research for transparent, auditable AI alignment: "Current methods of ‘alignment’ are insufficient; evaluations are even worse. Human intent reflects a rich tapestry of preferences, collapsed by uniform models. AI’s potential hinges on trust, from interpretable data to every layer built upon it. Informed decisions around risk are not binary. Training on raw human data doesn’t scale. Your models should adapt and scale, automatically." -> Suppressing Pink Elephants with Direct Principle Feedback
--
- OpenAI Evals
- Martian Model Router / OpenAI Evals
- Intelligent Language Model Router
- Evaluate-Iterate-Improve
- Open-Source Evaluation for GenAI Application Pipelines
--
- A large-scale, fine-grained, diverse preference dataset
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI
Application Programming Interfaces (APIs) are at the heart of all internet software, and the effect compounds as foundation models go API-first.
Gateway API Concepts: https://gateway-api.sigs.k8s.io/ Use Cases: https://gateway-api.sigs.k8s.io/#use-cases
Cloud Providers: In the ever-evolving cloud computing landscape, understanding the Gateway API is crucial for those using Kubernetes. This API enhances application traffic management, offering better routing and security. For seamless integration of AI into cloud-native applications, a robust framework is key to streamlining the deployment and management of AI-driven solutions. Dive into the Gateway API Concepts for insights and explore the Use Cases for cutting-edge application management.
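Since gateways typically expose OpenAI-compatible endpoints, application code stays unchanged when the backend moves. A sketch, where the base_url and model name are placeholders rather than real services:

```python
# Calling a foundation model through an API gateway (OpenAI-compatible sketch).
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # hypothetical gateway endpoint
    api_key="sk-...",                           # placeholder credential
)
resp = client.chat.completions.create(
    model="some-routed-model",  # the gateway decides which backend serves this
    messages=[{"role": "user", "content": "Hello from behind the gateway"}],
)
print(resp.choices[0].message.content)
```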
- Text Generation Inference
- vLLM (see the sketch after this list)
- LLM inference in C/C++
- Baseten / https://github.com/basetenlabs/truss
- TensorRT-LLM / (Use Case: https://www.perplexity.ai/hub/blog/introducing-pplx-api)
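Of the servers above, vLLM has a particularly small offline API. A minimal sketch, where the model name is illustrative:

```python
# Offline batch inference with vLLM (sketch; any runnable HF model works).
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain compound AI systems in one sentence."], params)
print(outputs[0].outputs[0].text)
```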
Abstract away cloud infra burdens, launch jobs & clusters on any cloud, maximize GPU usage
-
Stanford DSPy: The framework for programming—not prompting—foundation models
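A minimal DSPy sketch of "programming, not prompting"; the exact client and model names vary across DSPy versions, so treat the configuration lines as assumptions:

```python
# Declarative LM programming with DSPy: specify a signature, not a prompt.
import dspy

lm = dspy.OpenAI(model="gpt-3.5-turbo")  # client/model names vary by version
dspy.settings.configure(lm=lm)

qa = dspy.ChainOfThought("question -> answer")  # DSPy compiles this to a prompt
pred = qa(question="What is a compound AI system?")
print(pred.answer)
```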
-
Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
- LangChain OpenGPTs / Elixir implementation of a LangChain style framework / LangChain for Go
- GPTScript
- AI RSC Demo
- Prompt Playground, like Chatbot Arena (https://chat.lmsys.org/)
- AI SDK
- v0
- Establishing industry-wide AI best practices and standards for AI Engineers
- The Data Provenance Initiative
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
- Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization of program parameters possible.
As an emerging paradigm, differentiable programming builds upon several areas of computer science and applied mathematics, including automatic differentiation, graphical models, optimization and statistics. This book presents a comprehensive review of the fundamental concepts useful for differentiable programming. We adopt two main perspectives, that of optimization and that of probability, with clear analogies between the two.
Differentiable programming is not merely the differentiation of programs, but also the thoughtful design of programs intended for differentiation. By making programs differentiable, we inherently introduce probability distributions over their execution, providing a means to quantify the uncertainty associated with program outputs.
The Elements of Differentiable Programming (Draft, ~380 Pages!)
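To ground the idea, here is a tiny forward-mode autodiff in pure Python using dual numbers; it shows how a program with control flow can still be differentiated end to end, which is the core move the book formalizes.

```python
# Forward-mode autodiff via dual numbers: carry (value, derivative) together.
from dataclasses import dataclass

@dataclass
class Dual:
    val: float  # function value
    dot: float  # derivative with respect to the input

    def __add__(self, o): return Dual(self.val + o.val, self.dot + o.dot)
    def __mul__(self, o): return Dual(self.val * o.val,
                                      self.val * o.dot + self.dot * o.val)

def f(x: Dual) -> Dual:
    # Control flow is fine: the derivative follows the branch actually taken.
    if x.val > 0:
        return x * x
    return x + x

y = f(Dual(3.0, 1.0))   # seed dx/dx = 1
print(y.val, y.dot)     # 9.0 6.0, i.e. d/dx x^2 = 2x at x = 3
```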
One of the best places to learn it all in one place: https://huggingface.co/docs & the Open-Source AI Cookbook
-
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization
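The heart of BPE fits in a few lines: count adjacent token pairs, then merge the most frequent pair into a new token id. A minimal sketch in the spirit of that repo:

```python
# One BPE merge step: replace the most frequent adjacent pair with a new id.
from collections import Counter

def most_frequent_pair(ids: list[int]) -> tuple[int, int]:
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)  # replace the pair with the new token
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))
pair = most_frequent_pair(ids)   # (97, 97), i.e. "aa"
print(merge(ids, pair, 256))     # 256 is the first id past the byte range
```

Training a full tokenizer just repeats this step until the target vocabulary size is reached.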
-
Why not implement this in PyTorch?
-
Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
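A hedged usage sketch of ragas; metric names and expected dataset columns differ across versions, and the metrics themselves call out to an LLM backend (so an API key is required):

```python
# Scoring a RAG pipeline's outputs with ragas (sketch; ragas 0.1-era API).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = {
    "question": ["What is MLX?"],
    "answer": ["MLX is an array framework for Apple silicon."],
    "contexts": [["MLX is an array framework for machine learning on Apple silicon."]],
}
result = evaluate(Dataset.from_dict(data), metrics=[faithfulness, answer_relevancy])
print(result)  # per-metric scores between 0 and 1
```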
- LLM App Stack aka Emerging Architectures for LLM Applications
- A Guide to Large Language Model Abstractions
- AI Fundamentals: Benchmarks 101
- The Modern AI Stack: Design Principles for the Future of Enterprise AI Architectures
- AI Copilot Interfaces
- Evaluating LLMs is a minefield
- Large Language Models and Theories of Meaning
- Objective-Driven AI
- Large Language Models: A Survey
- On the Planning Abilities of Large Language Models: A Critical Investigation
- Demystifying Embedding Spaces using Large Language Models
- WizardLM: Empowering Large Language Models to Follow Complex Instructions
- RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
- Multi-line AI-assisted Code Authoring
- Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
--