This Twitter thread has a rich discussion of AI vs. Cloud Providers: https://x.com/rakyll/status/1771641289840242754?s=20. The emerging AI Cloud is simpler to use.
This repo is an exploration of building models for programming artificial intelligence. As we evolve Compound AI Systems, I hope we preserve that simplicity.
- Introduction to Transformers w/ Andrej Karpathy
- A Comprehensive Overview of Large Language Models
Open - w/ Weights, Training & Inference Code, Data & Evaluation
Andrej Karpathy on the importance of building a more open and vibrant AI ecosystem
Tour of Modern LLMs (and surrounding topics)
--
- Llama
- Gemma
- Grok open release
- Robust recipes to align language models with human and AI preferences
- A natural language interface for computers
- OpenDevin
The successful art of model merging is often based purely on the experience and intuition of a passionate model hacker. In fact, the current Open LLM Leaderboard is dominated by merged models. Surprisingly, merged models work without any additional training, making them very cost-effective (no GPUs required at all!), and so many people, researchers, hackers, and hobbyists alike, are trying this to create the best models for their purposes.
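To make that concrete, here is a minimal sketch of the simplest possible merge, plain weight averaging of two checkpoints (a "model soup"); real merge recipes such as SLERP or TIES-merging are more involved, and the function below is illustrative rather than any particular library's API.

```python
# Naive weight averaging of two PyTorch state dicts: the simplest model merge.
# Assumes both models share an identical architecture and parameter names.
import torch

def average_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Return alpha * sd_a + (1 - alpha) * sd_b, key by key."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# merged = average_state_dicts(model_a.state_dict(), model_b.state_dict())
# model_a.load_state_dict(merged)
```

No GPUs needed, exactly as the quote above notes: merging is pure tensor arithmetic on checkpoints.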
What I learned from looking at 900 most popular open source AI tools
state-of-the-art AI results are increasingly obtained by compound systems with multiple components, not just monolithic models.
The Shift from Models to Compound AI Systems
Generative language models (LMs) are often chained together and combined with other components into compound AI systems. Compound AI system applications include retrieval-augmented generation (RAG), structured prompting, chatbot verification, multi-hop question answering, agents, and SQL query generation. ALTO: An Efficient Network Orchestrator for Compound AI Systems
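As a toy illustration of such chaining, here is a minimal RAG-style compound system; the keyword retriever and the `call_llm` placeholder are hypothetical stand-ins for a real vector store and model API.

```python
# A toy compound AI system: a retriever chained into a generator.
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Trivial keyword-overlap scoring; a real system would use embeddings.
    words = query.lower().split()
    return sorted(docs, key=lambda d: -sum(w in d.lower() for w in words))[:k]

def rag_answer(query: str, docs: list[str], call_llm) -> str:
    context = "\n".join(retrieve(query, docs))
    prompt = f"Answer using only this context:\n{context}\n\nQ: {query}\nA:"
    return call_llm(prompt)  # call_llm is any text-in, text-out model API

docs = ["MLX is an array framework for Apple silicon.",
        "vLLM is a high-throughput LLM inference engine."]
print(rag_answer("What is MLX?", docs, call_llm=lambda p: p))  # echo stub
```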
-
Chat + Assistants:
-
Tools Usage:
-
Evaluation: https://openfeature.dev/
-
Apple M1/M2/M3: Get a MacBook Pro with M3 Max (CPU + GPU cores, up to 128GB unified memory) https://www.apple.com/shop/buy-mac/macbook-pro/16-inch
- pip install -U mlx
- https://github.com/ml-explore
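A minimal sketch of what MLX code looks like once installed; the ops below use mlx.core, and computation is lazy until explicitly forced.

```python
# MLX hello-world on Apple silicon: arrays are lazy until mx.eval() runs them.
import mlx.core as mx

a = mx.array([1.0, 2.0, 3.0])
b = mx.array([4.0, 5.0, 6.0])
c = a * b + 1.0   # builds a computation graph; nothing runs yet
mx.eval(c)        # forces evaluation on the default device (the GPU)
print(c)          # array([5, 11, 19], dtype=float32)
```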
-
NVIDIA: Hopper -> Blackwell
-
Trying to build a commodity ~petaflop compute node / https://x.com/karpathy/status/1770164518758633590?s=20
Intelligence is the computational part of the ability to achieve goals. A goal-achieving system is one that is more usefully understood in terms of outcomes than in terms of mechanisms.
The Definition of Intelligence
We don't know how to measure LLM abilities well. Most tests are groups of multiple-choice questions, tasks, or trivia; they don't represent real-world uses well, they are subject to gaming, and results are impacted by prompt design in unknown ways. Or they rely on human preference. A non-trivial taxonomy of real-world use starts with clear domains / common LLM workloads:
- Languages - Rankings by domain -> https://huggingface.co/models -> Tasks & Languages
- Model Card - Claude 3 model card: Coding, Creative Writing, Instruction-following, Long Document Q&A
- Chat, RAG, few-shot benchmark, etc.
- Coding - "Code a login component in React"
- Freshness - "What was the Warriors game score last night?"
- Agent - Web Agents -> https://turkingbench.github.io/
- Multimodal (images and video)
- Reasoning?
-
LMSYS Chatbot Arena Leaderboard
- LLM Judge
- Predictive Human Preference (PHP) / Conclusion: "LMSYS folks told me that due to the noisiness of crowd-sourced annotations and the costs of expert annotations, they’ve found that using GPT-4 to compare two responses works better. Depending on the complexity of the queries, generating 10,000 comparisons using GPT-4 would cost only $200 - 500, making this very affordable for companies that want to test it out."
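A hedged sketch of that GPT-4-as-judge setup; the prompt wording below is made up for illustration, and in practice you would randomize answer order to control for position bias.

```python
# Pairwise LLM-as-judge comparison (sketch; uses the OpenAI Python client).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(query: str, answer_a: str, answer_b: str) -> str:
    prompt = (
        f"Question: {query}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}\n\n"
        "Which answer is better? Reply with exactly one letter: A or B."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```

At a few cents per comparison, 10,000 judgments land in the $200 - 500 range quoted above.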
-
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
-
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
-
We’re doing cutting-edge research for transparent, auditable AI alignment: "Current methods of ‘alignment’ are insufficient; evaluations are even worse. Human intent reflects a rich tapestry of preferences, collapsed by uniform models. AI’s potential hinges on trust, from interpretable data to every layer built upon it. Informed decisions around risk are not binary. Training on raw human data doesn’t scale. Your models should adapt and scale, automatically." -> Suppressing Pink Elephants with Direct Principle Feedback
--
- OpenAI Evals
- Martian Model Router / OpenAI Evals
- Intelligent Language Model Router
- Evaluate-Iterate-Improve
- Open-Source Evaluation for GenAI Application Pipelines
--
- A large-scale, fine-grained, diverse preference dataset
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI
Application Programming Interfaces (APIs) are at the heart of all internet software, and the effect compounds as foundation models go API-first.
Gateway API Concepts: https://gateway-api.sigs.k8s.io/ Use Cases: https://gateway-api.sigs.k8s.io/#use-cases
Cloud Providers: In the ever-evolving cloud computing landscape, understanding the Gateway API is crucial for those using Kubernetes. This API enhances application traffic management, offering better routing and security. For seamless integration of AI into cloud-native applications, a robust framework is key to streamlining the deployment and management of AI-driven solutions. Dive into the Gateway API Concepts for insights and explore the Use Cases for cutting-edge application management.
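Since gateways typically expose OpenAI-compatible endpoints, application code stays unchanged when the backend moves. A sketch, where the base_url and model name are placeholders rather than real services:

```python
# Calling a foundation model through an API gateway (OpenAI-compatible sketch).
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # hypothetical gateway endpoint
    api_key="sk-...",                           # placeholder credential
)
resp = client.chat.completions.create(
    model="some-routed-model",  # the gateway decides which backend serves this
    messages=[{"role": "user", "content": "Hello from behind the gateway"}],
)
print(resp.choices[0].message.content)
```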
- Text Generation Inference
- vLLM (see the sketch after this list)
- LLM inference in C/C++
- Baseten / https://github.com/basetenlabs/truss
- TensorRT-LLM / (Use Case: https://www.perplexity.ai/hub/blog/introducing-pplx-api)
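Of the servers above, vLLM has a particularly small offline API. A minimal sketch, where the model name is illustrative:

```python
# Offline batch inference with vLLM (sketch; any runnable HF model works).
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain compound AI systems in one sentence."], params)
print(outputs[0].outputs[0].text)
```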
Abstract away cloud infra burdens, launch jobs & clusters on any cloud, maximize GPU usage
-
Stanford DSPy: The framework for programming—not prompting—foundation models
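A minimal DSPy sketch of "programming, not prompting"; the exact client and model names vary across DSPy versions, so treat the configuration lines as assumptions:

```python
# Declarative LM programming with DSPy: specify a signature, not a prompt.
import dspy

lm = dspy.OpenAI(model="gpt-3.5-turbo")  # client/model names vary by version
dspy.settings.configure(lm=lm)

qa = dspy.ChainOfThought("question -> answer")  # DSPy compiles this to a prompt
pred = qa(question="What is a compound AI system?")
print(pred.answer)
```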
-
Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
- LangChain OpenGPTs / Elixir implementation of a LangChain style framework / LangChain for Go
- GPTScript
- AI RSC Demo
- Prompt Playground, like Chatbot Arena (https://chat.lmsys.org/)
- AI SDK
- v0
- Establishing industry-wide AI best practices and standards for AI Engineers
- The Data Provenance Initiative
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
- Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization of program parameters possible.
As an emerging paradigm, differentiable programming builds upon several areas of computer science and applied mathematics, including automatic differentiation, graphical models, optimization and statistics. This book presents a comprehensive review of the fundamental concepts useful for differentiable programming. We adopt two main perspectives, that of optimization and that of probability, with clear analogies between the two.
Differentiable programming is not merely the differentiation of programs, but also the thoughtful design of programs intended for differentiation. By making programs differentiable, we inherently introduce probability distributions over their execution, providing a means to quantify the uncertainty associated with program outputs.
The Elements of Differentiable Programming (Draft, ~380 Pages!)
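To ground the idea, here is a tiny forward-mode autodiff in pure Python using dual numbers; it shows how a program with control flow can still be differentiated end to end, which is the core move the book formalizes.

```python
# Forward-mode autodiff via dual numbers: carry (value, derivative) together.
from dataclasses import dataclass

@dataclass
class Dual:
    val: float  # function value
    dot: float  # derivative with respect to the input

    def __add__(self, o): return Dual(self.val + o.val, self.dot + o.dot)
    def __mul__(self, o): return Dual(self.val * o.val,
                                      self.val * o.dot + self.dot * o.val)

def f(x: Dual) -> Dual:
    # Control flow is fine: the derivative follows the branch actually taken.
    if x.val > 0:
        return x * x
    return x + x

y = f(Dual(3.0, 1.0))   # seed dx/dx = 1
print(y.val, y.dot)     # 9.0 6.0, i.e. d/dx x^2 = 2x at x = 3
```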
One of the best places to learn it all in one place: https://huggingface.co/docs & the Open-Source AI Cookbook
-
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization
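The heart of BPE fits in a few lines: count adjacent token pairs, then merge the most frequent pair into a new token id. A minimal sketch in the spirit of that repo:

```python
# One BPE merge step: replace the most frequent adjacent pair with a new id.
from collections import Counter

def most_frequent_pair(ids: list[int]) -> tuple[int, int]:
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)  # replace the pair with the new token
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))
pair = most_frequent_pair(ids)   # (97, 97), i.e. "aa"
print(merge(ids, pair, 256))     # 256 is the first id past the byte range
```

Training a full tokenizer just repeats this step until the target vocabulary size is reached.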
-
Why not implement this in PyTorch?
-
Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
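A hedged usage sketch of ragas; metric names and expected dataset columns differ across versions, and the metrics themselves call out to an LLM backend (so an API key is required):

```python
# Scoring a RAG pipeline's outputs with ragas (sketch; ragas 0.1-era API).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = {
    "question": ["What is MLX?"],
    "answer": ["MLX is an array framework for Apple silicon."],
    "contexts": [["MLX is an array framework for machine learning on Apple silicon."]],
}
result = evaluate(Dataset.from_dict(data), metrics=[faithfulness, answer_relevancy])
print(result)  # per-metric scores between 0 and 1
```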
- LLM App Stack aka Emerging Architectures for LLM Applications
- A Guide to Large Language Model Abstractions
- AI Fundamentals: Benchmarks 101
- The Modern AI Stack: Design Principles for the Future of Enterprise AI Architectures
- AI Copilot Interfaces
- Evaluating LLMs is a minefield
- Large Language Models and Theories of Meaning
- Objective-Driven AI
- Large Language Models: A Survey
- On the Planning Abilities of Large Language Models: A Critical Investigation
- Demystifying Embedding Spaces using Large Language Models
- WizardLM: Empowering Large Language Models to Follow Complex Instructions
- RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
- Multi-line AI-assisted Code Authoring
- Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
--