The absolute trainer to light up AI agents.
Join our Discord community to connect with other users and contributors.
Read more on our documentation website.
pip install agentlightning
For the latest nightly build (cutting-edge features), you can install from Test PyPI:
pip install --upgrade --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ --pre agentlightning
Please refer to our installation guide for more details.
To start using Agent-lightning, check out our documentation and examples.
Agent Lightning keeps the moving parts to a minimum so you can focus on your idea, not the plumbing. Your agent continues to run as usual; you can still use any agent framework you like; you drop in the lightweight agl.emit_xxx() helper, or let the tracer collect every prompt, tool call, and reward. Those events become structured spans that flow into the LightningStore, a central hub that keeps tasks, resources, and traces in sync.
On the other side of the store sits the algorithm you choose, or write yourself. The algorithm reads spans, learns from them, and posts updated resources such as refined prompt templates or new policy weights. The Trainer ties it all together: it streams datasets to runners, ferries resources between the store and the algorithm, and updates the inference engine when improvements land. You can either stop there, or simply let the same loop keep turning.
No rewrites, no lock-in, just a clear path from first rollout to steady improvement.
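The loop above can be sketched in plain Python. This is an illustrative mock only: names like `Span`, `LightningStore`, `add_span`, and `post_resource` are stand-ins for the real Agent Lightning APIs and may differ from them.

```python
# Illustrative sketch of the span -> store -> algorithm flow described above.
# All class and method names here are hypothetical stand-ins, not the real API.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Span:
    kind: str        # e.g. "prompt", "tool_call", "reward"
    payload: Any

@dataclass
class LightningStore:
    spans: list = field(default_factory=list)
    resources: dict = field(default_factory=dict)

    def add_span(self, span: Span) -> None:
        self.spans.append(span)

    def post_resource(self, name: str, value: Any) -> None:
        self.resources[name] = value

store = LightningStore()

# Agent side: prompts, tool calls, and rewards become structured spans.
store.add_span(Span("prompt", "Summarize this document."))
store.add_span(Span("reward", 0.8))

# Algorithm side: read spans, learn, post an updated resource.
rewards = [s.payload for s in store.spans if s.kind == "reward"]
store.post_resource("prompt_template", "Summarize concisely: {doc}")
```

The point is the shape of the loop, not the details: the agent only emits, the algorithm only reads and posts, and the store is the single point of coordination between them.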
| Workflow | Status |
|---|---|
| CPU Tests | |
| Full Tests | |
| UI Tests | |
| Examples Integration | |
| Latest Dependency Compatibility | |
| Legacy Examples Compatibility | |
If you find Agent Lightning useful in your research or projects, please cite our paper:
@misc{luo2025agentlightningtrainai,
title={Agent Lightning: Train ANY AI Agents with Reinforcement Learning},
author={Xufang Luo and Yuge Zhang and Zhiyuan He and Zilong Wang and Siyun Zhao and Dongsheng Li and Luna K. Qiu and Yuqing Yang},
year={2025},
eprint={2508.03680},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2508.03680},
}
This project welcomes contributions and suggestions. Start by reading the Contributing Guide for recommended contribution points, environment setup, branching conventions, and pull request expectations. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
This project has been evaluated and certified to comply with the Microsoft Responsible AI Standard. The team will continue to monitor and maintain the repository, addressing any severe issues, including potential harms, if they arise.
This project is licensed under the MIT License. See the LICENSE file for details.
A complete computer science study plan to become a software engineer.
I originally created this as a short to-do list of study topics for becoming a software engineer,
but it grew to the large list you see today. After going through this study plan, I got hired
as a Software Development Engineer at Amazon!
You probably won't have to study as much as I did. Anyway, everything you need is here. I studied about 8-12 hours a day, for several months. This is my story: Why I studied full-time for 8 months for a Google interview
Please Note: You won't need to study as much as I did. I wasted a lot of time on things I didn't need to know. More info about that is below. I'll help you get there without wasting your precious time.
The items listed here will prepare you well for a technical interview at just about any software company,
including the giants: Amazon, Facebook, Google, and Microsoft. Best of luck to you!
This is my multi-month study plan for becoming a software engineer for a large company.
Required:
Note this is a study plan for software engineering, not frontend engineering or full-stack development. There are excellent roadmaps and coursework for those career paths elsewhere (see https://roadmap.sh/ for more info).
There is a lot to learn in a university Computer Science program, but only knowing about 75% is good enough for an interview, so that's what I cover here.
For a complete CS self-taught program, the resources for my study plan have been included in Kamran Ahmed's Computer Science Roadmap: https://roadmap.sh/computer-science
---------------- Everything below this point is optional ----------------
If you want to work as a software engineer for a large company, these are the things you have to know.
If you missed out on getting a degree in computer science, like I did, this will catch you up and save four years of your life.
When I started this project, I didn't know a stack from a heap, didn't know Big-O anything, or anything about trees, or how to
traverse a graph. If I had to code a sorting algorithm, I can tell ya it would have been terrible.
Every data structure I had ever used was built into the language, and I didn't know how they worked
under the hood at all. I never had to manage memory unless a process I was running would give an "out of
memory" error, and then I'd have to find a workaround. I used a few multidimensional arrays in my life and
thousands of associative arrays, but I never created data structures from scratch.
It's a long plan. It may take you months. If you are familiar with a lot of this already it will take you a lot less time.
Everything below is an outline, and you should tackle the items in order from top to bottom.
I'm using GitHub's special markdown flavor, including tasks lists to track progress.
On this page, click the Code button near the top, then click "Download ZIP". Unzip the file and you can work with the text files.
If you open it in a code editor that understands markdown, you'll see everything formatted nicely.

Fork the GitHub repo: https://github.com/jwasham/coding-interview-university by clicking on the Fork button.
Create a new branch so you can check items like this, just put an x in the brackets: [x]

Clone to your local repo:
git clone https://github.com/<YOUR_GITHUB_USERNAME>/coding-interview-university.git
cd coding-interview-university
git remote add upstream https://github.com/jwasham/coding-interview-university.git
git remote set-url --push upstream DISABLE # so that you don't push your personal progress back to the original repo
Mark all boxes with X after you complete your changes:
git commit -am "Marked personal progress"
git pull upstream main # keep your fork up-to-date with changes from the original repo
git push # just pushes to your fork
Some videos are available only by enrolling in a Coursera or EdX class. These are called MOOCs.
Sometimes the classes are not in session, so you have to wait a couple of months before you have access.
It would be great to replace the online course resources with free and always-available public sources,
such as YouTube videos (preferably university lectures), so that you can study these anytime,
not just when a specific online course is in session.
You'll need to choose a programming language for the coding interviews you do,
but you'll also need to find a language that you can use to study computer science concepts.
Preferably the language would be the same, so that you only need to be proficient in one.
When I did the study plan, I used 2 languages for most of it: C and Python
This is my preference. You do what you like, of course.
You may not need it, but here are some sites for learning a new language:
You can use a language you are comfortable in to do the coding part of the interview, but for large companies, these are solid choices:
You could also use these, but read around first. There may be caveats:
Here is an article I wrote about choosing a language for the interview:
Pick One Language for the Coding Interview.
This is the original article my post was based on: Choosing a Programming Language for Interviews
You need to be very comfortable in the language and be knowledgeable.
Read more about choices:
See language-specific resources here
This book will form your foundation for computer science.
Just choose one, in a language that you will be comfortable with. You'll be doing a lot of reading and coding.
Your choice:
Your choice:
Here are some recommended books to supplement your learning.
Programming Interviews Exposed: Coding Your Way Through the Interview, 4th Edition
Cracking the Coding Interview, 6th Edition
Choose one:
This list grew over many months, and yes, it got out of hand.
Here are some mistakes I made so you'll have a better experience. And you'll save months of time.
I watched hours of videos and took copious notes, and months later there was much I didn't remember. I spent 3 days going
through my notes and making flashcards, so I could review. I didn't need all of that knowledge.
Please, read so you won't make my mistakes:
Retaining Computer Science Knowledge.
To solve the problem, I made a little flashcard site where I could add flashcards of 2 types: general and code.
Each card type has different formatting. I made a mobile-first website, so I could review on my phone or tablet, wherever I am.
Make your own for free:
I DON'T RECOMMEND using my flashcards. There are too many and most of them are trivia that you don't need.
But if you don't want to listen to me, here you go:
Keep in mind I went overboard and have cards covering everything from assembly language and Python trivia to machine learning and statistics.
It's way too much for what's required.
Note on flashcards: The first time you recognize you know the answer, don't mark it as known. You have to see the
same card and answer it several times correctly before you really know it. Repetition will put that knowledge deeper in
your brain.
An alternative to using my flashcard site is Anki, which has been recommended to me numerous times.
It uses a repetition system to help you remember. It's user-friendly, available on all platforms, and has a cloud sync system.
It costs $25 on iOS but is free on other platforms.
My flashcard database in Anki format: https://ankiweb.net/shared/info/25173560 (thanks @xiewenya).
Some students have mentioned formatting issues with white space that can be fixed by doing the following: open the deck, edit the card, click cards, select the "styling" radio button, and add the member "white-space: pre;" to the card class.
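The repetition advice above ("don't mark a card as known the first time you get it right") can be sketched as a tiny Leitner-style scheduler. This is an illustration of the idea only, not Anki's actual algorithm; the threshold of 3 is an arbitrary assumption.

```python
# A tiny Leitner-style sketch of the flashcard advice above: a card only
# counts as "known" after several consecutive correct answers, and one miss
# sends it back to the start. Illustrative only; not Anki's real scheduler.
REQUIRED_STREAK = 3  # arbitrary choice for this sketch

def review(streaks: dict, card: str, correct: bool) -> bool:
    """Record one review; return True once the card is really known."""
    streaks[card] = streaks.get(card, 0) + 1 if correct else 0
    return streaks[card] >= REQUIRED_STREAK

streaks = {}
for answer in (True, True, False, True, True, True):
    known = review(streaks, "what is a heap?", answer)
# The miss in the middle resets the streak, so it takes three more correct
# answers after that before the card counts as known.
```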
THIS IS VERY IMPORTANT.
Start doing coding interview questions while you're learning data structures and algorithms.
You need to apply what you're learning to solve problems, or you'll forget. I made this mistake.
Once you've learned a topic, and feel somewhat comfortable with it, for example, linked lists:
Keep doing problems while you're learning all this stuff, not after.
You're not being hired for knowledge, but how you apply the knowledge.
There are many resources for this, listed below. Keep going.
There are a lot of distractions that can take up valuable time. Focus and concentration are hard. Turn on some music
without lyrics and you'll be able to focus pretty well.
These are prevalent technologies but not part of this study plan:
This course goes over a lot of subjects. Each will probably take you a few days, or maybe even a week or more. It depends on your schedule.
Each day, take the next subject in the list, watch some videos about that subject, and then write an implementation
of that data structure or algorithm in the language you chose for this course.
You can see my code here:
You don't need to memorize every algorithm. You just need to be able to understand it enough to be able to write your own implementation.
Why is this here? I'm not ready to interview.
Why you need to practice doing programming problems:
There is a great intro for methodical, communicative problem-solving in an interview. You'll get this from the programming
interview books, too, but I found this outstanding:
Algorithm design canvas
Write code on a whiteboard or paper, not a computer. Test with some sample inputs. Then type it and test it out on a computer.
If you don't have a whiteboard at home, pick up a large drawing pad from an art store. You can sit on the couch and practice.
This is my "sofa whiteboard". I added the pen in the photo just for scale. If you use a pen, you'll wish you could erase.
It gets messy quickly. I use a pencil and eraser.

Coding question practice is not about memorizing answers to programming problems.
Don't forget your key coding interview books here.
Solving Problems:
Coding Interview Question Videos:
Challenge/Practice sites:
Alright, enough talk, let's learn!
But don't forget to do coding problems from above while you learn!
Well, that's about enough of that.
When you go through "Cracking the Coding Interview", there is a chapter on this, and at the end there is a quiz to see
if you can identify the runtime complexity of different algorithms. It's a super review and test.
For heapsort, see the Heap data structure above. Heap sort is great, but not stable.
As a summary, here is a visual representation of 15 sorting algorithms.
If you need more detail on this subject, see the "Sorting" section in Additional Detail on Some Subjects
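To make the heapsort note concrete, here is a short sketch using Python's standard-library `heapq` module: build a min-heap, then pop the minimum repeatedly. It runs in O(n log n), and as noted above it is not stable, since equal elements can come out in a different relative order than they went in.

```python
# Heapsort sketched with the standard-library heapq module: heapify the
# input in O(n), then pop the minimum n times. Simple, O(n log n), not stable.
import heapq

def heapsort(items):
    heap = list(items)
    heapq.heapify(heap)                  # build a min-heap in place
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([5, 1, 4, 1, 3]))         # [1, 1, 3, 4, 5]
```

The classic in-place version uses a max-heap inside the array itself, but the pop-from-a-min-heap form above is the quickest way to see the idea.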
Graphs can be used to represent many problems in computer science, so this section is long, like trees and sorting.
Notes:
Full Coursera Course:
I'll implement:
If you need more detail on this subject, see the "String Matching" section in Additional Detail on Some Subjects.
This section will have shorter videos that you can watch pretty quickly to review most of the important concepts.
It's nice if you want a refresher often.
Mock Interviews:
Think of about 20 interview questions you'll get, along the lines of the items below. Have at least one answer for each.
Have a story, not just data, about something you accomplished.
Some of mine (I already may know the answers, but want their opinion or team perspective):
Congratulations!
Keep learning.
You're never really done.
*****************************************************************************************************
*****************************************************************************************************
Everything below this point is optional. It is NOT needed for an entry-level interview.
However, by studying these, you'll get greater exposure to more CS concepts and will be better prepared for
any software engineering job. You'll be a much more well-rounded software engineer.
*****************************************************************************************************
*****************************************************************************************************
These are here so you can dive into a topic you find interesting.
You can expect system design questions if you have 4+ years of experience.
I added them to help you become a well-rounded software engineer and to be aware of certain
technologies and algorithms, so you'll have a bigger toolbox.
Know at least one type of balanced binary tree (and know how it's implemented):
"Among balanced search trees, AVL and 2/3 trees are now passé and red-black trees seem to be more popular.
A particularly interesting self-organizing data structure is the splay tree, which uses rotations
to move any accessed key to the root." - Skiena
Of these, I chose to implement a splay tree. From what I've read, you won't implement a
balanced search tree in your interview. But I wanted exposure to coding one up
and let's face it, splay trees are the bee's knees. I did read a lot of red-black tree code.
I want to learn more about B-Trees, since they're used so widely with very large data sets.
AVL trees
Splay trees
Red/black trees
2-3 search trees
2-3-4 Trees (aka 2-4 trees)
N-ary (K-ary, M-ary) trees
B-Trees
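To make "know how it's implemented" concrete: the one rebalancing move that AVL, red-black, and splay trees all share is the rotation. Below is a minimal left rotation on a plain BST node; a sketch of the move itself, not a full balanced-tree implementation.

```python
# A minimal left rotation, the basic rebalancing move shared by AVL,
# red-black, and splay trees. Sketch only; no balance bookkeeping here.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(x):
    """Lift x.right above x, preserving BST order; return the new subtree root."""
    y = x.right
    x.right = y.left     # y's left subtree becomes x's right subtree
    y.left = x           # x becomes y's left child
    return y

# A right-leaning chain 1 -> 2 -> 3 becomes balanced with 2 at the root.
root = rotate_left(Node(1, right=Node(2, right=Node(3))))
print(root.key, root.left.key, root.right.key)   # 2 1 3
```

Each balanced-tree flavor differs mainly in *when* it decides to rotate (height counters, node colors, or access patterns), not in the rotation itself.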
I added these to reinforce some ideas already presented above, but didn't want to include them
above because it's just too much. It's easy to overdo it on a subject.
You want to get hired in this century, right?
SOLID
Union-Find
More Dynamic Programming (videos)
Advanced Graph Processing (videos)
MIT Probability (mathy, and go slowly, which is good for mathy things) (videos):
String Matching
Sorting
NAND To Tetris: Build a Modern Computer from First Principles
Sit back and enjoy.
List of individual Dynamic Programming problems (each is short)
Excellent - MIT Calculus Revisited: Single Variable Calculus
Skiena lectures from Algorithm Design Manual - CSE373 2020 - Analysis of Algorithms (26 videos)
Carnegie Mellon - Computer Architecture Lectures (39 videos)
MIT 6.042J: Mathematics for Computer Science, Fall 2010 (25 videos)
My Codex Skills
A collection of reusable development skills for Apple platforms, GitHub workflows, refactoring, diff review swarms, bug investigation swarms, code review, React performance work, and skill curation.
This repository contains focused, self-contained skills that help with recurring engineering tasks such as generating App Store release notes, debugging iOS apps, improving SwiftUI and React code, packaging macOS apps, running multi-agent diff reviews and bug hunts, reviewing and simplifying code changes, orchestrating larger refactors, and auditing what new skills a project actually needs.
Install: place these skill folders under $CODEX_HOME/skills
This repo currently includes 16 skills:
| Skill | Folder | Description |
|---|---|---|
| App Store Changelog | app-store-changelog | Creates user-facing App Store release notes from git history by collecting changes since the last tag, filtering for user-visible work, and rewriting it into concise "What's New" bullets. |
| GitHub | github | Uses the gh CLI to inspect and operate on GitHub issues, pull requests, workflow runs, and API data, including CI checks, run logs, and advanced queries. |
| iOS Debugger Agent | ios-debugger-agent | Uses XcodeBuildMCP to build, launch, and debug the current iOS app on a booted simulator, including UI inspection, interaction, screenshots, and log capture. |
| macOS Menubar Tuist App | macos-menubar-tuist-app | Builds, refactors, or reviews macOS menubar apps that use Tuist and SwiftUI, with emphasis on manifest ownership, store-layer architecture, and reliable local launch scripts. |
| macOS SwiftPM App Packaging (No Xcode) | macos-spm-app-packaging | Scaffolds, builds, packages, signs, and optionally notarizes SwiftPM-based macOS apps without requiring an Xcode project. |
| Orchestrate Batch Refactor | orchestrate-batch-refactor | Plans and executes larger refactor or rewrite efforts with dependency-aware parallel analysis and implementation using clearly scoped work packets. |
| Project Skill Audit | project-skill-audit | Analyzes a project's past Codex sessions, memory, existing local skills, and conventions to recommend the highest-value new skills or updates to existing ones. |
| React Component Performance | react-component-performance | Diagnoses slow React components by finding re-render churn, expensive render work, unstable props, and list bottlenecks, then suggests targeted optimizations and validation steps. |
| Bug Hunt Swarm | bug-hunt-swarm | Runs a read-only four-agent bug investigation focused on reproduction, code-path tracing, regressors, and the fastest proof step, then returns a ranked root-cause path. |
| Review and Simplify Changes | review-and-simplify-changes | Reviews a git diff or explicit file scope for reuse, code quality, efficiency, clarity, and standards issues, then optionally applies safe, behavior-preserving fixes. |
| Review Swarm | review-swarm | Runs a read-only four-agent diff review focused on behavioral regressions, security risks, performance or reliability issues, and contract or test coverage gaps, then returns a prioritized fix path. |
| Swift Concurrency Expert | swift-concurrency-expert | Reviews and fixes Swift 6.2+ concurrency issues such as actor isolation problems, Sendable violations, main-actor annotations, and data-race diagnostics. |
| SwiftUI Liquid Glass | swiftui-liquid-glass | Implements, reviews, or refactors SwiftUI features to use the iOS 26+ Liquid Glass APIs correctly, with proper modifier ordering, grouping, interactivity, and fallbacks. |
| SwiftUI Performance Audit | swiftui-performance-audit | Audits SwiftUI runtime performance from code and architecture, focusing on invalidation storms, identity churn, layout thrash, heavy render work, and profiling guidance. |
| SwiftUI UI Patterns | swiftui-ui-patterns | Provides best practices and example-driven guidance for building SwiftUI screens and components, including navigation, sheets, app wiring, async state, and reusable UI patterns. |
| SwiftUI View Refactor | swiftui-view-refactor | Refactors SwiftUI view files toward smaller subviews, MV-style data flow, stable view trees, explicit dependency injection, and correct Observation usage. |
Each skill is self-contained. Refer to the SKILL.md file in each skill directory for triggers, workflow guidance, examples, and supporting references.
Skills are designed to be focused and reusable. When adding new skills, ensure they:
Vim-fork focused on extensibility and usability
Neovim is a project that seeks to aggressively refactor Vim in order to:
See the Introduction wiki page and Roadmap
for more information.
See :help nvim-features for the full list, and :help news for noteworthy changes in the latest version!
Pre-built packages for Windows, macOS, and Linux are found on the
Releases page.
Managed packages are in Homebrew, Debian, Ubuntu, Fedora, Arch Linux, Void Linux, Gentoo, and more!
See BUILD.md and supported platforms for details.
The build is CMake-based, but a Makefile is provided as a convenience.
After installing the dependencies, run the following command.
make CMAKE_BUILD_TYPE=RelWithDebInfo
sudo make install
To install to a non-default location:
make CMAKE_BUILD_TYPE=RelWithDebInfo CMAKE_INSTALL_PREFIX=/full/path/
make install
CMake hints for inspecting the build:
- `cmake --build build --target help` lists all build targets.
- `build/CMakeCache.txt` (or `cmake -LAH build/`) contains the resolved values of all CMake variables.
- `build/compile_commands.json` shows the full compiler invocations for each translation unit.

See :help nvim-from-vim for instructions.
├─ cmake/ CMake utils
├─ cmake.config/ CMake defines
├─ cmake.deps/ subproject to fetch and build dependencies (optional)
├─ runtime/ plugins and docs
├─ src/nvim/ application source code (see src/nvim/README.md)
│ ├─ api/ API subsystem
│ ├─ eval/ Vimscript subsystem
│ ├─ event/ event-loop subsystem
│ ├─ generators/ code generation (pre-compilation)
│ ├─ lib/ generic data structures
│ ├─ lua/ Lua subsystem
│ ├─ msgpack_rpc/ RPC subsystem
│ ├─ os/ low-level platform code
│ └─ tui/ built-in UI
└─ test/ tests (see test/README.md)
Neovim contributions since b17d96 are licensed under the
Apache 2.0 license, except for contributions copied from Vim (identified by the
vim-patch token). See LICENSE.txt for details.
Self-hosted AI accounting app. LLM analyzer for receipts, invoices, transactions with custom prompts and categories
🙏 I'm currently looking for a job! Here's my CV and my Github profile.
TaxHacker is a self-hosted accounting app designed for freelancers, indie hackers, and small businesses who want to save time and automate expense and income tracking using the power of modern AI.
Upload photos of receipts, invoices, or PDFs, and TaxHacker will automatically recognize and extract all the important data you need for accounting: product names, amounts, items, dates, merchants, taxes, and save it into a structured Excel-like database. You can even create custom fields with your own AI prompts to extract any specific information you need.
The app features automatic currency conversion (including crypto!) based on historical exchange rates from the transaction date. With built-in filtering, multi-project support, import/export capabilities, and custom categories, TaxHacker simplifies reporting and makes tax filing a bit easier.

Important
This project is still in early development. Use at your own risk! Star us to get notified about new features and bugfixes ⭐️
1 Analyze photos and invoices with AI
Snap a photo of any receipt or upload an invoice PDF, and TaxHacker will automatically recognize, extract, categorize, and store all the information in a structured database.
TaxHacker works with a wide variety of documents, including store receipts, restaurant bills, invoices, bank statements, letters, even handwritten receipts. It handles any language and any currency with ease.
2 Multi-currency support with automatic conversion (even crypto!)
TaxHacker automatically detects currencies in your documents and converts them to your base currency using historical exchange rates.
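The conversion described above hinges on one detail: use the exchange rate from the transaction's own date, not today's rate. A minimal sketch of that lookup, with a made-up rate table; the field names, values, and base currency here are purely illustrative, not TaxHacker's internals.

```python
# Sketch of date-aware currency conversion: resolve the rate for the
# transaction date, then convert to the base currency. The rate table and
# all values below are hypothetical, for illustration only.
from datetime import date

HISTORICAL_EUR_RATES = {            # (currency, date) -> rate into EUR
    ("USD", date(2024, 3, 1)): 0.92,
    ("USD", date(2024, 6, 1)): 0.93,
}

def to_base(amount: float, currency: str, tx_date: date, base: str = "EUR") -> float:
    if currency == base:
        return amount
    rate = HISTORICAL_EUR_RATES[(currency, tx_date)]
    return round(amount * rate, 2)

print(to_base(100.0, "USD", date(2024, 3, 1)))   # 92.0
```

The same shape works for crypto: it's just another `(currency, date)` key into a historical rate source.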
3 Organize your transactions using fully customizable categories, projects and fields
Adapt TaxHacker to your unique needs with unlimited customization options. Create custom fields, projects, and categories that better suit your specific needs, industry standards, or country.
4 Customize any LLM prompt. Even system ones
Take full control of how TaxHacker's AI processes your documents. Write custom AI prompts for fields, categories, and projects, or modify the built-in ones to match your specific needs.
TaxHacker is 100% adaptable and tunable to your unique requirements — whether you need to extract emails, addresses, project codes, or any other custom information from your documents.
5 Flexible data filtering and export
Once your documents are processed, easily view, filter, and export your complete transaction history exactly how you need it.
6 Self-hosted mode for data privacy
Keep complete control over your financial data with local storage and self-hosting options. TaxHacker respects your privacy and gives you full ownership of your information.
TaxHacker can be easily self-hosted on your own infrastructure for complete control over your data and application environment. We provide a Docker image and Docker Compose setup that makes deployment simple:
curl -O https://raw.githubusercontent.com/vas3k/TaxHacker/main/docker-compose.yml
docker compose up
The Docker Compose setup includes:
New Docker images are automatically built and published with every release. You can use specific version tags (e.g., v1.0.0) or latest for the most recent version.
For advanced setups, you can customize the Docker Compose configuration to fit your infrastructure. The default configuration uses the pre-built image from GitHub Container Registry, but you can also build locally using the provided Dockerfile.
Example custom configuration:
services:
app:
image: ghcr.io/vas3k/taxhacker:latest
ports:
- "7331:7331"
environment:
- SELF_HOSTED_MODE=true
- UPLOAD_PATH=/app/data/uploads
- DATABASE_URL=postgresql://postgres:postgres@localhost:5432/taxhacker
volumes:
- ./data:/app/data
restart: unless-stopped
Configure TaxHacker for your specific needs with these environment variables:
| Variable | Required | Description | Example |
|---|---|---|---|
| `UPLOAD_PATH` | Yes | Local directory for file uploads and storage | `./data/uploads` |
| `DATABASE_URL` | Yes | PostgreSQL connection string | `postgresql://user@localhost:5432/taxhacker` |
| `PORT` | No | Port to run the application on | `7331` (default) |
| `BASE_URL` | No | Base URL for the application | `http://localhost:7331` |
| `SELF_HOSTED_MODE` | No | Set to "true" for self-hosting: enables auto-login, custom API keys, and additional features | `true` |
| `DISABLE_SIGNUP` | No | Disable new user registration on your instance | `false` |
| `BETTER_AUTH_SECRET` | Yes | Secret key for authentication (minimum 16 characters) | `your-secure-random-key` |
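Putting the variables above together, a minimal `.env` for a self-hosted instance might look like this; every value is a placeholder to replace with your own:

```env
# Example .env for a self-hosted instance -- values are placeholders;
# the variable names come from the table above.
UPLOAD_PATH=./data/uploads
DATABASE_URL=postgresql://user@localhost:5432/taxhacker
PORT=7331
BASE_URL=http://localhost:7331
SELF_HOSTED_MODE=true
DISABLE_SIGNUP=false
BETTER_AUTH_SECRET=change-me-to-a-long-random-string
```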
You can also configure LLM provider settings in the application or via environment variables:
- `OPENAI_MODEL_NAME` and `OPENAI_API_KEY`
- `GOOGLE_MODEL_NAME` and `GOOGLE_API_KEY`
- `MISTRAL_MODEL_NAME` and `MISTRAL_API_KEY`

We use:

- Ghostscript and GraphicsMagick (`brew install gs graphicsmagick`)

Set up your local development environment:
# Clone the repository
git clone https://github.com/vas3k/TaxHacker.git
cd TaxHacker
# Install dependencies
npm install
# Set up environment variables
cp .env.example .env
# Edit .env with your configuration
# Make sure to set DATABASE_URL to your PostgreSQL connection string
# Example: postgresql://user@localhost:5432/taxhacker
# Initialize the database
npx prisma generate && npx prisma migrate dev
# Start the development server
npm run dev
Visit http://localhost:7331 to see your local TaxHacker instance in action.
For a production build, instead of npm run dev use the following commands:
# Build the application
npm run build
# Start the production server
npm run start
We welcome contributions to TaxHacker! Here's how you can help make it even better:
All development happens on GitHub through issues and pull requests. We appreciate any help.
If TaxHacker has helped you save time or manage your finances better, consider supporting its continued development! Your donations help us maintain the project, add new features, and keep it free and open source. Every contribution helps ensure we can keep improving and maintaining this tool for the community.
TaxHacker is licensed under the MIT License.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
English | 简体中文 | 繁體中文 | 日本語 | 한국어 | Français | Русский | Español | العربية
PaddleOCR converts PDF documents and images into structured, LLM-ready data (JSON/Markdown) with industry-leading accuracy. With 70k+ Stars and trusted by top-tier projects like Dify, RAGFlow, and Cherry Studio, PaddleOCR is the bedrock for building intelligent RAG and Agentic applications.
Transforming messy visuals into structured data for the LLM era.
The global gold standard for high-speed, multilingual text spotting.
Released PaddleOCR-VL:
Model Introduction:
Core Features:
Released PP-OCRv5 Multilingual Recognition Model:
Significant Model Additions:
Deployment Capability Upgrades:
Benchmark Support:
Bug Fixes:
- Fixed an inconsistent parameter (`use_chart_parsing`) in the PP-StructureV3 configuration files compared to other pipelines.

Other Enhancements:
PaddleOCR official website provides interactive Experience Center and APIs—no setup required, just one click to experience.
For local usage, please refer to the following documentation based on your needs:
⭐ Star this repository to keep up with exciting updates and new releases, including powerful OCR and document parsing capabilities! ⭐
| PaddlePaddle WeChat official account | Join the tech discussion group |
|---|---|
| ![]() | ![]() |
PaddleOCR wouldn't be where it is today without its incredible community! 💗 A massive thank you to all our longtime partners, new collaborators, and everyone who's poured their passion into PaddleOCR — whether we've named you or not. Your support fuels our fire!
| Project Name | Description |
|---|---|
| Dify | Production-ready platform for agentic workflow development. |
| RAGFlow | RAG engine based on deep document understanding. |
| pathway | Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG. |
| MinerU | Multi-type document to Markdown conversion tool. |
| Umi-OCR | Free, open-source, batch offline OCR software. |
| cherry-studio | A desktop client that supports multiple LLM providers. |
| haystack | AI orchestration framework to build customizable, production-ready LLM applications. |
| OmniParser | OmniParser: screen parsing tool for pure vision-based GUI agents. |
| QAnything | Question and answer based on anything. |
| Learn more projects | More projects based on PaddleOCR |
This project is released under the Apache 2.0 license.
@misc{cui2025paddleocr30technicalreport,
title={PaddleOCR 3.0 Technical Report},
author={Cheng Cui and Ting Sun and Manhui Lin and Tingquan Gao and Yubo Zhang and Jiaxuan Liu and Xueqing Wang and Zelun Zhang and Changda Zhou and Hongen Liu and Yue Zhang and Wenyu Lv and Kui Huang and Yichao Zhang and Jing Zhang and Jun Zhang and Yi Liu and Dianhai Yu and Yanjun Ma},
year={2025},
eprint={2507.05595},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.05595},
}
@misc{cui2025paddleocrvlboostingmultilingualdocument,
title={PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model},
author={Cheng Cui and Ting Sun and Suyin Liang and Tingquan Gao and Zelun Zhang and Jiaxuan Liu and Xueqing Wang and Changda Zhou and Hongen Liu and Manhui Lin and Yue Zhang and Yubo Zhang and Handong Zheng and Jing Zhang and Jun Zhang and Yi Liu and Dianhai Yu and Yanjun Ma},
year={2025},
eprint={2510.14528},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.14528},
}
@misc{cui2026paddleocrvl15multitask09bvlm,
title={PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing},
author={Cheng Cui and Ting Sun and Suyin Liang and Tingquan Gao and Zelun Zhang and Jiaxuan Liu and Xueqing Wang and Changda Zhou and Hongen Liu and Manhui Lin and Yue Zhang and Yubo Zhang and Yi Liu and Dianhai Yu and Yanjun Ma},
year={2026},
eprint={2601.21957},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.21957},
}
ChatDev 2.0: Dev All through LLM-powered Multi-Agent Collaboration
A Zero-Code Multi-Agent Platform for Developing Everything
【📚 Developers | 👥 Contributors | ⭐️ ChatDev 1.0 (Legacy)】
ChatDev has evolved from a specialized software development multi-agent system into a comprehensive multi-agent orchestration platform.
• Jan 07, 2026: 🚀 We are excited to announce the official release of ChatDev 2.0 (DevAll)! This version introduces a zero-code multi-agent orchestration platform. The classic ChatDev (v1.x) has been moved to the chatdev1.0 branch for maintenance. More details about ChatDev 2.0 can be found on our official post.
• Sep 24, 2025: 🎉 Our paper Multi-Agent Collaboration via Evolving Orchestration has been accepted to NeurIPS 2025. The implementation is available in the puppeteer branch of this repository.
• May 26, 2025: 🎉 We propose a novel puppeteer-style paradigm for multi-agent collaboration among large-language-model-based agents. By leveraging a learnable central orchestrator optimized with reinforcement learning, our method dynamically activates and sequences agents to construct efficient, context-aware reasoning paths. This approach not only improves reasoning quality but also reduces computational costs, enabling scalable and adaptable multi-agent cooperation in complex tasks.
See our paper in Multi-Agent Collaboration via Evolving Orchestration.
• June 25, 2024: 🎉 To foster development in LLM-powered multi-agent collaboration🤖🤖 and related fields, the ChatDev team has curated a collection of seminal papers📄 presented in an open-source interactive e-book📚 format. Now you can explore the latest advancements on the Ebook Website and download the paper list.
• June 12, 2024: We introduced Multi-Agent Collaboration Networks (MacNet) 🎉, which utilize directed acyclic graphs to facilitate effective task-oriented collaboration among agents through linguistic interactions 🤖🤖. MacNet supports cooperation across various topologies and among more than a thousand agents without exceeding context limits. More versatile and scalable, MacNet can be considered a more advanced version of ChatDev's chain-shaped topology. Our preprint paper is available at https://arxiv.org/abs/2406.07155. This technique has been incorporated into the macnet branch, enhancing support for diverse organizational structures and offering richer solutions beyond software development (e.g., logical reasoning, data analysis, story generation, and more).
• May 07, 2024: We introduced "Iterative Experience Refinement" (IER), a novel method where instructor and assistant agents enhance shortcut-oriented experiences to efficiently adapt to new tasks. This approach encompasses experience acquisition, utilization, propagation, and elimination across a series of tasks, making the process shorter and more efficient. Our preprint paper is available at https://arxiv.org/abs/2405.04219, and this technique will soon be incorporated into ChatDev.
• January 25, 2024: We have integrated Experiential Co-Learning Module into ChatDev. Please see the Experiential Co-Learning Guide.
• December 28, 2023: We present Experiential Co-Learning, an innovative approach where instructor and assistant agents accumulate shortcut-oriented experiences to effectively solve new tasks, reducing repetitive errors and enhancing efficiency. Check out our preprint paper at https://arxiv.org/abs/2312.17025; this technique will soon be integrated into ChatDev.
• November 2, 2023: ChatDev is now supported with a new feature: incremental development, which allows agents to develop upon existing codes. Try --config "incremental" --path "[source_code_directory_path]" to start it.
• October 26, 2023: ChatDev is now supported with Docker for safe execution (thanks to contribution from ManindraDeMel). Please see Docker Start Guide.
• September 25, 2023: The Git mode is now available, enabling the programmer to utilize Git for version control. To enable this feature, simply set "git_management" to "True" in ChatChainConfig.json. See guide.
• September 20, 2023: The Human-Agent-Interaction mode is now available! You can get involved with the ChatDev team by playing the role of reviewer and making suggestions to the programmer; try python3 run.py --task [description_of_your_idea] --config "Human". See guide and example.
• September 1, 2023: The Art mode is available now! You can activate the designer agent to generate images used in the software; try python3 run.py --task [description_of_your_idea] --config "Art". See guide and example.
• August 28, 2023: The system is publicly available.
• August 17, 2023: The v1.0.0 version was ready for release.
• July 30, 2023: Users can customize ChatChain, Phase, and Role settings. Additionally, both online Log mode and replay mode are now supported.
• July 16, 2023: The preprint paper associated with this project was published.
• June 30, 2023: The initial version of the ChatDev repository was released.
Backend Dependencies (Python managed by uv):
uv sync
Frontend Dependencies (Vite + Vue 3):
cd frontend && npm install
cp .env.example .env
Set API_KEY and BASE_URL in .env for your LLM provider. Use ${VAR} (e.g., ${API_KEY}) in configuration files to reference these variables.

Start both Backend and Frontend:
make dev
Then access the Web Console at http://localhost:5173.
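The ${VAR} references in configuration files behave like ordinary environment-variable substitution. The sketch below illustrates the idea in Python; the resolve_placeholders helper and the sample config line are illustrative only, not ChatDev's actual implementation.

```python
import re

def resolve_placeholders(text: str, env: dict) -> str:
    """Replace ${VAR} references with values from an environment mapping."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), m.group(0)), text)

# Example: a config line referencing variables loaded from .env
env = {"API_KEY": "sk-demo", "BASE_URL": "https://api.example.com/v1"}
print(resolve_placeholders("llm: {key: ${API_KEY}, url: ${BASE_URL}}", env))
# → llm: {key: sk-demo, url: https://api.example.com/v1}

# Unknown variables are left untouched rather than raising
print(resolve_placeholders("token: ${MISSING}", env))
# → token: ${MISSING}
```

Leaving unknown placeholders intact (instead of failing) is one possible design choice; a real loader might prefer to raise on missing variables so misconfiguration surfaces early.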
Start Backend:
# Run from the project root
uv run python server_main.py --port 6400 --reload
Remove --reload if output files (e.g., GameDev) trigger restarts, which interrupt tasks and lose progress.
Start Frontend:
cd frontend
VITE_API_BASE_URL=http://localhost:6400 npm run dev
Then access the Web Console at http://localhost:5173.
💡 Tip: If the frontend fails to connect to the backend, the default port 6400 may already be occupied. Switch both services to an available port, for example:
- Backend: start with --port 6401
- Frontend: set VITE_API_BASE_URL=http://localhost:6401
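Before switching ports, you can confirm which ones are actually occupied with Python's standard socket module. This is a quick diagnostic sketch; port_in_use is a hypothetical helper, not part of the project tooling.

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on a successful connection, an errno otherwise
        return s.connect_ex((host, port)) == 0

for candidate in (6400, 6401):
    print(f"port {candidate}: {'occupied' if port_in_use(candidate) else 'free'}")
```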
Help command:
make help
Sync YAML workflows to frontend:
make sync
Uploads all workflow files from yaml_instance/ to the database.
Validate all YAML workflows:
make validate-yamls
Checks all YAML files for syntax and schema errors.
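As a rough illustration of the kind of structural check such validation performs, here is a pure-Python sketch over an already-parsed workflow dict. The required keys (name, nodes, per-node id) are hypothetical, not the actual ChatDev schema.

```python
def validate_workflow(wf: dict) -> list:
    """Collect structural errors for a parsed workflow; an empty list means valid."""
    errors = []
    for key in ("name", "nodes"):  # hypothetical required top-level keys
        if key not in wf:
            errors.append(f"missing required key: {key}")
    for i, node in enumerate(wf.get("nodes", [])):
        if "id" not in node:
            errors.append(f"node {i}: missing 'id'")
    return errors

print(validate_workflow({"name": "demo", "nodes": [{"id": "writer"}]}))  # → []
print(validate_workflow({"nodes": [{}]}))
# → ["missing required key: name", "node 0: missing 'id'"]
```

Collecting all errors before reporting, rather than stopping at the first one, lets a batch command surface every problem across the files in a single run.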
OpenClaw can integrate with ChatDev by invoking existing agent teams or dynamically creating new agent teams within ChatDev.
To get started:
Start the ChatDev 2.0 backend.
Install the required skills for your OpenClaw instance:
clawdhub install chatdev
Ask your OpenClaw to create a ChatDev workflow. For example:
Automated information collection and content publishing
Create a ChatDev workflow to automatically collect trending information, generate a Xiaohongshu post, and publish it.
Multi-agent geopolitical simulation
Create a ChatDev workflow with multiple agents to simulate possible future developments of the Middle East situation.
Alternatively, you can run the entire application using Docker Compose. This method simplifies dependency management and provides a consistent environment.
Prerequisites:
Create a .env file in the project root with your API keys.

Build and Run:
# From the project root
docker compose up --build
Access:
- Backend API: http://localhost:6400
- Web Console: http://localhost:5173

The services will automatically restart if they crash, and local file changes will be reflected inside the containers for live development.
The DevAll interface provides a seamless experience for both construction and execution
Tutorial: Comprehensive step-by-step guides and documentation integrated directly into the platform to help you get started quickly.

Workflow: A visual canvas to design your multi-agent systems. Configure node parameters, define context flows, and orchestrate complex agent interactions with drag-and-drop ease.

Launch: Initiate workflows, monitor real-time logs, inspect intermediate artifacts, and provide human-in-the-loop feedback.

For automation and batch processing, use our lightweight Python SDK to execute workflows programmatically and retrieve results directly.
```python
from runtime.sdk import run_workflow

# Execute a workflow and get the final node message
result = run_workflow(
    yaml_file="yaml_instance/demo.yaml",
    task_prompt="Summarize the attached document in one sentence.",
    attachments=["/path/to/document.pdf"],
    variables={"API_KEY": "sk-xxxx"},  # Override .env variables if needed
)

if result.final_message:
    print(f"Output: {result.final_message.text_content()}")
```
We have released the ChatDev Python SDK (PyPI package chatdev), so you can also run YAML workflows and multi-agent tasks directly in Python. For installation and version details, see PyPI: chatdev 0.1.0.
For secondary development and extensions, please proceed with this section.
Extend DevAll with new nodes, providers, and tools.
The project is organized into a modular structure:
- server/ hosts the FastAPI backend, while runtime/ manages agent abstraction and tool execution.
- workflow/ handles the multi-agent logic, driven by configurations in entity/.
- frontend/ contains the Vue 3 Web Console.
- functions/ is the place for custom Python tools.

Relevant reference documentation:
We provide robust, out-of-the-box templates for common scenarios. All runnable workflow configs are located in yaml_instance/.
- Files named demo_*.yaml showcase specific features or modules.
- Full workflows (e.g., ChatDev_v1.yaml) are in-house or recreated end-to-end workflows.

| Category | Workflow | Case |
|---|---|---|
| 📈 Data Visualization | data_visualization_basic.yaml, data_visualization_enhanced.yaml | Prompt: "Create 4–6 high-quality PNG charts for my large real-estate transactions dataset." |
| 🛠️ 3D Generation (requires Blender & blender-mcp) | blender_3d_builder_simple.yaml, blender_3d_builder_hub.yaml, blender_scientific_illustration.yaml | Prompt: "Please build a Christmas tree." |
| 🎮 Game Dev | GameDev_v1.yaml, ChatDev_v1.yaml | Prompt: "Please help me design and develop a Tank Battle game." |
| 📚 Deep Research | deep_research_v1.yaml | Prompt: "Research recent advances in the field of LLM-based agent RL." |
| 🎓 Teach Video | teach_video.yaml (run uv add manim before running this workflow) | Prompt: "讲一下什么是凸优化" ("Explain what convex optimization is.") |
For those implementations, you can use the Launch tab to execute them.
Attach input files (e.g., .csv for data analysis) if required.

We welcome contributions from the community! Whether you're fixing bugs, adding new workflow templates, or sharing high-quality cases/artifacts produced by DevAll, your help is much appreciated. Feel free to contribute by submitting Issues or Pull Requests.
By contributing to DevAll, you'll be recognized in our Contributors list below. Check out our Developer Guide to get started!
NA-Wen · zxrys · swugi · huatl98 · LaansDole · zivkovicp · shiowen · kilo2127 · AckerlyLau · rainoeelmae · conprour · Br1an67 · NINE-J · Yanghuabei
@article{chatdev,
title = {ChatDev: Communicative Agents for Software Development},
author = {Chen Qian and Wei Liu and Hongzhang Liu and Nuo Chen and Yufan Dang and Jiahao Li and Cheng Yang and Weize Chen and Yusheng Su and Xin Cong and Juyuan Xu and Dahai Li and Zhiyuan Liu and Maosong Sun},
journal = {arXiv preprint arXiv:2307.07924},
url = {https://arxiv.org/abs/2307.07924},
year = {2023}
}
@article{colearning,
title = {Experiential Co-Learning of Software-Developing Agents},
author = {Chen Qian and Yufan Dang and Jiahao Li and Wei Liu and Zihao Xie and Yifei Wang and Weize Chen and Cheng Yang and Xin Cong and Xiaoyin Che and Zhiyuan Liu and Maosong Sun},
journal = {arXiv preprint arXiv:2312.17025},
url = {https://arxiv.org/abs/2312.17025},
year = {2023}
}
@article{macnet,
title={Scaling Large-Language-Model-based Multi-Agent Collaboration},
author={Chen Qian and Zihao Xie and Yifei Wang and Wei Liu and Yufan Dang and Zhuoyun Du and Weize Chen and Cheng Yang and Zhiyuan Liu and Maosong Sun},
journal={arXiv preprint arXiv:2406.07155},
url = {https://arxiv.org/abs/2406.07155},
year={2024}
}
@article{iagents,
title={Autonomous Agents for Collaborative Task under Information Asymmetry},
author={Wei Liu and Chenxi Wang and Yifei Wang and Zihao Xie and Rennai Qiu and Yufan Dang and Zhuoyun Du and Weize Chen and Cheng Yang and Chen Qian},
journal={arXiv preprint arXiv:2406.14928},
url = {https://arxiv.org/abs/2406.14928},
year={2024}
}
@article{puppeteer,
title={Multi-Agent Collaboration via Evolving Orchestration},
author={Yufan Dang and Chen Qian and Xueheng Luo and Jingru Fan and Zihao Xie and Ruijie Shi and Weize Chen and Cheng Yang and Xiaoyin Che and Ye Tian and Xuantang Xiong and Lei Han and Zhiyuan Liu and Maosong Sun},
journal={arXiv preprint arXiv:2505.19591},
url={https://arxiv.org/abs/2505.19591},
year={2025}
}
If you have any questions, feedback, or would like to get in touch, please feel free to reach out to us via email at qianc62@gmail.com.