Introducing Our New Paper: "OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use"

Hi everyone! :wave:

We’re excited to share our latest research: "OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use." This work delves into the rapidly evolving field of OS Agents: (M)LLM-based agents that use computing devices (e.g., computers and mobile phones) by operating within the environments and interfaces (e.g., graphical user interfaces (GUIs)) provided by operating systems (OS) to automate tasks.

:page_facing_up: Link to Full Paper: OS-Agent-Survey/OS-Agent-Survey

:globe_with_meridians: Link to our Homepage: https://os-agent-survey.github.io/

Highlights from the Paper:

  • Foundational Insights: We define what constitutes OS Agents, exploring their core components (environment, observation space, and action space) and essential capabilities like understanding, planning, and grounding.

  • Construction Methodologies: Dive into the use of domain-specific foundation models, agent frameworks, and key techniques like supervised fine-tuning and memory mechanisms that empower these agents.

  • Evaluation Benchmarks: We review the protocols and metrics used to assess OS Agents and provide a comprehensive look at existing related benchmarks.

  • Challenges and Future Directions: From safety and privacy to personalization and self-evolution, we outline the critical challenges and opportunities ahead.
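
To make the core components from the first highlight a bit more concrete, here is a minimal toy sketch of how an environment, an observation space, and an action space might fit together with planning and grounding. It is not code from the paper: every name in it (`ToyEnvironment`, `Observation`, `Action`, `plan`, `ground`, `run_agent`) is hypothetical, and the "planning" and "grounding" steps are trivial placeholders for what an (M)LLM would do.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observation:
    """Observation space (assumed): labels of the UI elements currently visible."""
    elements: List[str]


@dataclass
class Action:
    """Action space (assumed): one operation applied to one UI element."""
    kind: str    # e.g. "click"
    target: str  # label of the element to act on


@dataclass
class ToyEnvironment:
    """Hypothetical stand-in for an OS interface: a screen of clickable labels."""
    elements: List[str]
    clicked: List[str] = field(default_factory=list)

    def observe(self) -> Observation:
        # Return a copy so the agent cannot mutate the environment's state.
        return Observation(elements=list(self.elements))

    def step(self, action: Action) -> None:
        # Only clicks on elements that exist have any effect.
        if action.kind == "click" and action.target in self.elements:
            self.clicked.append(action.target)


def plan(goal: List[str]) -> List[str]:
    """Planning placeholder: the goal is already an ordered list of labels."""
    return list(goal)


def ground(label: str, obs: Observation) -> Optional[Action]:
    """Grounding placeholder: map a planned label to an action if it is visible."""
    return Action("click", label) if label in obs.elements else None


def run_agent(env: ToyEnvironment, goal: List[str]) -> List[str]:
    """Observe, ground each planned step, act, and return what was clicked."""
    for label in plan(goal):
        obs = env.observe()
        action = ground(label, obs)
        if action is not None:
            env.step(action)
    return env.clicked


if __name__ == "__main__":
    env = ToyEnvironment(elements=["Settings", "Wi-Fi", "Bluetooth"])
    print(run_agent(env, ["Settings", "Wi-Fi", "Missing"]))
```

In this toy version, a planned step whose target is not on screen is simply skipped; a real OS Agent would need to re-plan or explore instead.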

Join the Conversation:

We’ve created an open-source GitHub repository to support ongoing research and foster collaboration in this domain.
:speech_balloon: We’d love to hear your thoughts! What do you think about the future of OS Agents? Let’s discuss!
