Introduction
At Sequoia Capital’s AI Ascent 2024 conference, AI researcher Andrew Ng gave a talk in which he introduced an interesting concept: the “Agentic Workflow.”
After watching the talk, I put the method into practice and found it very effective. Below, I introduce the concept of the Agentic Workflow and share my own experiences and thoughts from applying it.
I also generated a podcast episode of this article using NotebookLM.
What is Agentic Workflow?

Agentic Workflow is a new way of interacting with and completing tasks using large language models (LLMs). Traditionally, when we interact with LLMs, we input a prompt, and the LLM then generates an output based on this prompt. This approach is like asking someone to write an article from start to finish in one go, without the opportunity for repeated modifications and iterations.
Agentic Workflow, on the other hand, is more like breaking down the writing process into multiple steps: first, writing a draft based on a topic outline, then analyzing, modifying, and supplementing the draft, followed by further refinement and polishing, and so on, iterating until the desired result is produced.
In this process, instead of directly instructing the LLM to “write an article,” we break the task into sub-tasks and guide the LLM through them one at a time. The output of each sub-task becomes the input for the next, and the cycle repeats until the result is satisfactory.
In summary, Agentic Workflow is an iterative and collaborative model that transforms the interaction with LLMs into a series of manageable, refinable steps, allowing for continuous improvement and adaptation throughout the task-completion process.
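The chaining idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: `LLM`, `run_chain`, `fake_llm`, and the step wording are hypothetical names invented here, and `fake_llm` is a stand-in that echoes its input rather than calling a real model.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]

SEPARATOR = "\n\n---\n"


def run_chain(llm: LLM, topic: str, steps: List[str]) -> List[str]:
    """Run sub-tasks in order, feeding each output into the next prompt."""
    outputs: List[str] = []
    previous = topic
    for instruction in steps:
        prompt = f"{instruction}{SEPARATOR}{previous}"
        previous = llm(prompt)
        outputs.append(previous)
    return outputs


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: tags the payload with the instruction it received."""
    instruction, _, payload = prompt.partition(SEPARATOR)
    return f"[{instruction}] {payload}"


STEPS = [
    "Write a draft from this outline",
    "Critique and revise this draft",
    "Polish the revised draft",
]

results = run_chain(fake_llm, "Agentic workflows", STEPS)
print(results[-1])
```

Swapping `fake_llm` for a function that calls an actual model would leave `run_chain` unchanged, which is the point of keeping each step a plain prompt-in, text-out call.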
Case Study
Andrew Ng cited a practical case in his talk to demonstrate how Agentic Workflow performs on coding tasks. His team used the HumanEval coding benchmark to compare the traditional “zero-shot prompting” method with the Agentic Workflow method on code problems.
A typical task reads: “Given a non-empty list of integers, return the sum of all even-positioned elements.” With zero-shot prompting, i.e., directly asking the model to generate the solution code, GPT-3.5 reached only 48% accuracy on the benchmark and GPT-4 reached 67%.

However, when using the Agentic Workflow, which breaks the task into steps such as problem analysis, iterative code writing, testing, and debugging, GPT-3.5 outperformed zero-shot GPT-4. Ng pointed out that applying the Agentic Workflow to GPT-4 also achieved very impressive results.
This case shows that even a weaker, older large language model (such as GPT-3.5) can outperform a single direct generation from a stronger model when the task is broken into steps that are repeatedly iterated on and refined.
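To make the write, test, and debug loop concrete, here is a minimal sketch. It is not the setup used in the talk: `fake_model` is a hypothetical stand-in that returns a buggy answer first and a corrected one once it receives failure reports, and the test cases assume that “even positions” means the 0-based indices 0, 2, 4, and so on.

```python
from typing import Callable, List, Optional, Tuple

Solution = Callable[[List[int]], int]

# Assumption: "even positions" are the 0-based indices 0, 2, 4, ...
TESTS: List[Tuple[List[int], int]] = [
    ([5], 5),
    ([1, 2, 3, 4], 4),
    ([10, 20, 30], 40),
]


def failures(candidate: Solution) -> List[str]:
    """Run every test and describe each failure, as feedback for the next attempt."""
    report: List[str] = []
    for args, expected in TESTS:
        try:
            got = candidate(list(args))
        except Exception as exc:  # a crash is also useful feedback
            report.append(f"{args}: raised {exc!r}")
            continue
        if got != expected:
            report.append(f"{args}: expected {expected}, got {got}")
    return report


def fake_model(feedback: List[str]) -> Solution:
    """Stand-in for an LLM: a buggy first answer, then a fix once it sees failures."""
    if not feedback:
        return lambda xs: sum(xs[1::2])  # bug: sums the odd positions
    return lambda xs: sum(xs[0::2])


def solve(max_rounds: int = 3) -> Tuple[Optional[Solution], int]:
    """Ask for code, test it, and feed the failures back until all tests pass."""
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        candidate = fake_model(feedback)
        feedback = failures(candidate)
        if not feedback:
            return candidate, round_number
    return None, max_rounds


solution, rounds = solve()
print(f"passed after {rounds} round(s)")
```

The important design choice is that the test report, not a human, drives the next attempt, which is what lets a weaker model recover from a wrong first answer.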
Agentic Reasoning Design Patterns

Andrew then presented four common agentic design patterns: Reflection, Tool Use, Planning, and Multiagent Collaboration.
1. Reflection

This pattern involves an AI system improving its capabilities through self-feedback and iterative refinement. In this way, the AI system can enhance the quality and accuracy of its output after generating an initial solution by further reflection and analysis. This method can be applied not only to programming tasks but also to other domains such as writing, design, or any task that requires iterative improvement.
Relevant papers include:
- “Self-Refine: Iterative Refinement with Self-Feedback” by Madaan et al. (2023)
- This paper introduces SELF-REFINE, an approach that improves the output of large language models (LLMs) through feedback and iteration. The same model generates an initial output, critiques it, and then revises it based on that critique, without additional training or human assistance. Experiments demonstrate significant improvements over single-pass generation across a variety of tasks.
- “Reflexion: Language Agents with Verbal Reinforcement Learning” by Shinn et al. (2023)
- This paper presents Reflexion, an approach that reinforces language agents through verbal feedback rather than weight updates. Reflexion agents reflect on feedback in natural language and keep these reflections in an episodic memory buffer to make better decisions in later attempts. Reflexion can accept various types and sources of feedback signals and shows significant improvements on multiple tasks.
Through these methods, language models can become more adaptive and flexible, better meeting users’ needs. In practice, this is also a common approach used in real-world applications, where multiple rounds of dialogue and partial corrections help AI provide more satisfactory answers.
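A reflection loop can be sketched as a generator and a critic that take turns, with the critic's notes kept in a small memory that the generator sees on every retry. This is a simplified illustration, not the algorithm from either paper: `reflect_and_refine`, `fake_generate`, and `fake_critique` are hypothetical names, and both fakes are rule-based stand-ins for model calls.

```python
from typing import Callable, List, Optional, Tuple

Generate = Callable[[str, List[str]], str]        # (task, reflections) -> draft
Critique = Callable[[str, str], Optional[str]]    # (task, draft) -> feedback, or None if acceptable


def reflect_and_refine(
    generate: Generate, critique: Critique, task: str, max_rounds: int = 3
) -> Tuple[str, List[str]]:
    """Alternate between drafting and critiquing until the critic has no objections."""
    reflections: List[str] = []  # memory of past feedback, shown to the generator
    draft = generate(task, reflections)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:
            break
        reflections.append(feedback)
        draft = generate(task, reflections)
    return draft, reflections


def fake_generate(task: str, reflections: List[str]) -> str:
    """Stand-in generator: applies whichever fixes the reflections ask for."""
    text = "agentic workflows iterate on drafts"
    notes = " ".join(reflections).lower()
    if "capital" in notes:
        text = text[0].upper() + text[1:]
    if "period" in notes:
        text += "."
    return text


def fake_critique(task: str, draft: str) -> Optional[str]:
    """Stand-in critic: reports one problem at a time, or None when satisfied."""
    if not draft[0].isupper():
        return "Start with a capital letter."
    if not draft.endswith("."):
        return "End with a period."
    return None


final, notes = reflect_and_refine(fake_generate, fake_critique, "Summarize in one sentence")
print(final)
print(notes)
```

The `max_rounds` cap matters in practice: a critic that is never satisfied would otherwise keep the loop running indefinitely.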
2. Tool Use

The tool use paradigm has its roots in early explorations in the computer vision field. Since language models at the time could not process images, the only option was to have them generate function calls to visual APIs for tasks such as image generation and object detection. With the emergence of multimodal GPT-style language models, the concept of tool use spread widely, and language models were no longer seen as isolated systems but as intelligent agents connected to external tools and knowledge bases.
Relevant papers include:
- “Gorilla: Large Language Model Connected with Massive APIs” by Patil et al. (2023)
- This paper proposes Gorilla, a large language model capable of effectively utilizing API calls. Gorilla surpasses GPT-4 in generating accurate API calls by addressing input parameter generation and hallucination issues. When combined with a document retriever, Gorilla can adapt to changes in documents at test time and mitigate hallucination issues. The integration of the model with the retrieval system improves reliability. Gorilla will be open-sourced on July 4, 2023.
- “MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action” by Yang et al. (2023)
- This paper proposes MM-REACT, a prompting-based approach that enables ChatGPT to perform multimodal reasoning and action. The authors designed a prompting template that combines natural language instructions with multimodal inputs such as images and web pages, allowing ChatGPT to understand and execute complex tasks involving multiple modalities. They evaluated MM-REACT on several benchmarks, showing its excellent performance in multimodal reasoning and action. This work extends the application of large language models in the multimodal domain.
Through tool use, language models can perform a variety of tasks such as web search, code generation, and personal productivity, greatly surpassing their original natural language processing capabilities. In the future, tool use may become an important direction for the development of language models, endowing them with stronger planning, reasoning, and action abilities.
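In code, tool use usually comes down to the model emitting a structured request that the host program parses and executes. The sketch below assumes a made-up JSON shape (`{"tool": ..., "arguments": {...}}`) and two toy tools; it does not reproduce any particular vendor's function-calling API.

```python
import json
from typing import Any, Callable, Dict

# Registry of callable tools, keyed by the name the model is told to use.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def word_count(text: str) -> int:
    return len(text.split())


def dispatch(model_output: str) -> Any:
    """Parse a JSON tool call emitted by the model and run the matching function."""
    call = json.loads(model_output)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Assumed model output; a real model would produce this string itself.
print(dispatch('{"tool": "add", "arguments": {"a": 2, "b": 3}}'))
```

Rejecting unknown tool names explicitly is a small guard against hallucinated calls, the failure mode that the Gorilla paper targets.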
3. Planning

Planning refers to having language models reason about, devise, and decompose complex tasks. This enables language models to not only answer questions but also proactively develop and execute action plans. With planning capabilities, language models can autonomously break down tasks, identify the necessary substeps and tools, and coordinate the invocation of different models. In one scenario mentioned by Andrew, the language model first needs to detect the pose of a person in an image, then call an image generation model to synthesize a new image, and finally combine it with voice synthesis to output the result.
Relevant papers include:
- “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” by Wei et al. (2022)
- This paper introduces a new method called “Chain-of-Thought Prompting,” aimed at eliciting step-by-step reasoning in large language models. Through specific prompting forms, language models can be guided to decompose complex problems and gradually derive solutions. Experimental results show that chain-of-thought prompting significantly improves language models’ performance in tasks such as arithmetic reasoning, common sense reasoning, and symbolic reasoning.
- “HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face” by Shen et al. (2023)
- This work combines ChatGPT with several expert models in the Hugging Face ecosystem to build a multimodal AI system called HuggingGPT. HuggingGPT leverages ChatGPT’s natural language understanding and generation capabilities to coordinate the invocation of expert models in computer vision, speech recognition, etc., completing complex multimodal tasks. The authors designed a prompt engineering method that allows ChatGPT to autonomously decide which expert models to call based on task requirements.
Planning endows language models with a “tool use” meta-capability, allowing them to transcend specific domains and flexibly combine different expert models to complete various complex tasks.
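The pose, image, and speech scenario can be modeled as a plan with dependencies that an executor resolves in order. In this sketch the plan is hard-coded as if a planner model had produced it, and the entries in `EXPERTS` are hypothetical stand-ins for real expert models; only `graphlib.TopologicalSorter` is a real standard-library class (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Assumed planner output: each step maps to the steps it depends on.
PLAN: Dict[str, List[str]] = {
    "detect_pose": [],
    "generate_image": ["detect_pose"],
    "synthesize_speech": ["generate_image"],
}

# Stand-ins for expert models; each receives the results of its dependencies.
EXPERTS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "detect_pose": lambda inputs: "pose(sitting)",
    "generate_image": lambda inputs: f"image_from[{inputs['detect_pose']}]",
    "synthesize_speech": lambda inputs: f"audio_about[{inputs['generate_image']}]",
}


def execute(plan: Dict[str, List[str]],
            experts: Dict[str, Callable[[Dict[str, str]], str]]) -> Dict[str, str]:
    """Run each step after its dependencies, passing their results along."""
    results: Dict[str, str] = {}
    for step in TopologicalSorter(plan).static_order():
        inputs = {dep: results[dep] for dep in plan[step]}
        results[step] = experts[step](inputs)
    return results


results = execute(PLAN, EXPERTS)
print(results["synthesize_speech"])
```

A topological sort is more than this linear plan needs, but it also handles plans where several steps feed into one.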
4. Multiagent Collaboration

Multiagent collaboration refers to having multiple language models or agents collaborate through interaction to complete complex tasks. For example, experts representing different roles (such as doctors, nurses, etc.) can be simulated to jointly develop diagnostic and treatment plans. The key to this pattern is getting agents to collaborate efficiently, with a clear division of labor, to avoid conflicts and contradictions.
Relevant papers include:
- “Communicative Agents for Software Development” by Qian et al. (2023)
- This paper presents the ChatDev system, a multiagent collaboration framework based on large language models for software development tasks. ChatDev prompts language models to play different roles (such as CEO, designer, developer, etc.) and simulates the collaborative process of a software development team. These virtual agents can engage in ongoing dialogue about product requirements, design, coding, etc., and ultimately generate relatively complete programs. Experiments show that ChatDev performs well on multiple programming tasks, demonstrating the potential of multiagent collaboration.
- “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation” by Wu et al. (2023)
- This work proposes AutoGen, a framework for accomplishing complex tasks through internal multiagent dialogue. AutoGen combines multiple language models into a multiagent system that collaborates to complete the entire process from requirement analysis to code generation. Different agents are responsible for different subtasks, such as requirement understanding, architecture design, code implementation, etc., and coordinate through natural language dialogue. Experimental results show that AutoGen performs better than a single language model on tasks such as code generation, proving the effectiveness of the multiagent approach.
In the future, multiagent systems may become a powerful tool for solving complex problems, demonstrating a collaborative ability that surpasses that of single agents.
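A minimal version of role-based collaboration is a turn-taking loop over a shared transcript. The sketch below is not the ChatDev or AutoGen API: `Agent`, `converse`, and the two rule-based responders are hypothetical, and the `APPROVED` stop word is an assumed convention for ending the conversation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (speaker, text)


@dataclass
class Agent:
    name: str
    role_prompt: str
    respond: Callable[[str, List[Message]], str]  # (role_prompt, transcript) -> reply


def converse(agents: List[Agent], opening: str,
             max_turns: int = 6, stop_word: str = "APPROVED") -> List[Message]:
    """Let agents take turns on a shared transcript until one of them approves."""
    transcript: List[Message] = [("user", opening)]
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]
        reply = agent.respond(agent.role_prompt, transcript)
        transcript.append((agent.name, reply))
        if stop_word in reply:
            break
    return transcript


def fake_coder(role_prompt: str, transcript: List[Message]) -> str:
    """Stand-in developer: adds a docstring once a reviewer has asked for one."""
    asked = any("docstring" in text for speaker, text in transcript if speaker == "reviewer")
    if asked:
        return 'def add(a, b):\n    """Return a + b."""\n    return a + b'
    return "def add(a, b):\n    return a + b"


def fake_reviewer(role_prompt: str, transcript: List[Message]) -> str:
    """Stand-in reviewer: approves only code that contains a docstring."""
    latest = transcript[-1][1]
    return "APPROVED" if '"""' in latest else "Please add a docstring."


team = [
    Agent("coder", "You write Python functions.", fake_coder),
    Agent("reviewer", "You review code for clarity.", fake_reviewer),
]
for speaker, text in converse(team, "Write an add function."):
    print(f"{speaker}: {text}")
```

Each agent sees the whole transcript but answers from its own role prompt, which is the division of labor the pattern relies on.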
Conclusion

Andrew Ng predicts that, through appropriate use of agentic reasoning design patterns such as “Reflection,” “Tool Use,” “Planning,” and “Multiagent Collaboration,” AI systems will see explosive growth in capabilities this year and move into far more complex application domains. However, the agentic workflow is fundamentally different from the traditional interaction model. It requires more patience, because agents analyze, plan, and iterate in the background and often take minutes or even hours to yield final results, in stark contrast to the human expectation of instant feedback.
Another trend worth noting is the demand for fast token generation. In agentic workflows, repeated iteration is the norm, so a model that generates tokens quickly, even at slightly lower quality, may end up ahead of a higher-quality but slower model once it has gone through several rounds of refinement. Agentic reasoning workflows open up new horizons for AI development by unlocking AI’s potential through structured interaction.
My Own Reflections and Practices
When I started using tools like GPT-3 in 2023, I was always in a mode similar to zero-shot prompting: I asked the model directly for what I needed, such as translating an article or modifying code, without providing any context or examples. In practice, I gradually learned to give the model relevant context and to reach satisfactory results through repeated dialogue. Still, these were dialogues with a single model, not the multi-agent collaboration mentioned above.
In the past, when writing articles, I would search extensively online for suitable materials and videos based on keywords. I would then read, summarize, and analyze what I found, combine it with my own practice and case studies, translate the result into English, and optimize the article using SEO tools. I would also commission a designer to create illustrations. These steps consumed a lot of time and energy, mostly because reading texts and watching videos is slow.
Practices
Therefore, I designed an agentic workflow based on my usual work process:

- Search and Summarization:
  - Use Perplexity to search relevant websites and papers and automatically extract key information and points.
  - Use GPTs to automatically summarize and condense YouTube video content.
- Article Structuring:
  - Use AI tools to organize the summarized content and automatically generate article titles and structures.
  - Further organize and optimize the content based on the generated structure.
- Translation and Optimization:
  - Use GPTs to translate the article from Chinese to English, ensuring accuracy and fluency.
  - Use SEO optimization tools to enhance the English article with keyword optimization and formatting adjustments for better search engine rankings.
- Image Generation:
  - Use GPTs to generate prompts for images.
  - Use Midjourney to create images related to the article’s theme, enhancing its visual appeal.
I provided ample settings and context for these four stages through prompts, significantly improving my work efficiency. In fact, this article was completed through this multi-agent collaborative workflow.
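One way to keep the per-stage settings and context organized is to describe each stage as data and build its prompt from that description. The stage names and tools below come from the list above; the `context` and `instruction` strings, and the `Stage` and `build_prompt` names, are hypothetical examples rather than the prompts actually used for this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    tool: str
    context: str      # background and settings given to the agent (illustrative)
    instruction: str  # what the agent should do with its input (illustrative)


STAGES: List[Stage] = [
    Stage("Search and Summarization", "Perplexity, GPTs",
          "You research topics for a technical blog.",
          "Extract the key points from the sources below."),
    Stage("Article Structuring", "AI tools",
          "You are an editor who plans article outlines.",
          "Propose a title and section structure for these notes."),
    Stage("Translation and Optimization", "GPTs, SEO tools",
          "You translate Chinese technical writing into fluent English.",
          "Translate the article below and keep its headings."),
    Stage("Image Generation", "GPTs, Midjourney",
          "You write prompts for an image generation model.",
          "Write one image prompt that matches the article's theme."),
]


def build_prompt(stage: Stage, material: str) -> str:
    """Combine a stage's fixed context with the material handed over from the previous stage."""
    return "\n".join([
        f"Stage: {stage.name} (tools: {stage.tool})",
        f"Context: {stage.context}",
        f"Task: {stage.instruction}",
        "Input:",
        material,
    ])


prompt = build_prompt(STAGES[2], "(Chinese draft goes here)")
print(prompt)
```

Keeping the context in one place per stage makes it easy to tighten a single agent's instructions without touching the others.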
Reflections
When setting up each agent in the workflow, I realized that single-shot prompting would be inefficient for humans too. As Andrew mentioned in his talk, if we hand a programmer a piece of code without telling them the context and language, they would also be at a loss. In other words, when we create agents, we are also, to some extent, creating ourselves: “We shape our tools, and thereafter our tools shape us.”
As I assigned tasks to each agent and made them collaborate, I wondered if I was also shaping these different agents based on my understanding of an editorial team. In other words, when the prompts are precise enough and the division of labor is detailed enough, what difference remains between us and them?
In today’s era, people question the accuracy of AI production tools, but sometimes they might overlook a fact: as humans, we also make many mistakes. Generative content is inherently based on probabilities from countless training corpora, but have people ever considered that electron clouds are also described by probability wave functions?
In quantum mechanics, the position and movement of electrons cannot be precisely determined, but are instead described by probability wave functions. This means we can only know the probability of an electron appearing at a certain position, rather than its exact location. Just as neural networks use probability distributions to represent features or categories of data, what we consider the “real” world is also composed of probabilities at the microscopic level.
Similar to how we cannot fully determine the position of an electron, AI cannot guarantee 100% accuracy. However, both are capable of describing or simulating reality to a certain extent.
Perhaps, as Andrew summarized in his talk, the road to Artificial General Intelligence (AGI) will be a journey, not just a destination. On this journey, I look forward to the disruptive applications these methods will bring in the future and how they will change our world.
References
Sequoia Capital. (2024). AI Ascent 2024. YouTube video.
Madaan, A., et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback.
Shinn, N., et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning.
Patil, S. G., et al. (2023). Gorilla: Large Language Model Connected with Massive APIs.
Yang, Z., et al. (2023). MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action.
Wei, J., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.
Shen, Y., et al. (2023). HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face.
Qian, C., et al. (2023). Communicative Agents for Software Development.
Wu, Q., et al. (2023). AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation.
