Hi everyone!
We’re excited to share our latest research: "OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use." This work surveys the rapidly evolving field of OS Agents: (M)LLM-based agents that use computing devices (e.g., computers and mobile phones) by operating within the environments and interfaces (e.g., graphical user interfaces (GUIs)) provided by operating systems (OS) to automate tasks.
Link to Full Paper: OS-Agent-Survey/OS-Agent-Survey
Link to our Homepage: https://os-agent-survey.github.io/
Highlights from the Paper:
- Foundational Insights: We define what constitutes OS Agents, exploring their core components (environment, observation space, and action space) and essential capabilities like understanding, planning, and grounding.
- Construction Methodologies: Dive into the use of domain-specific foundation models, agent frameworks, and key techniques like supervised fine-tuning and memory mechanisms that empower these agents.
- Evaluation Benchmarks: We review the protocols and metrics used to assess OS Agents and provide a comprehensive look at existing benchmarks.
- Challenges and Future Directions: From safety and privacy to personalization and self-evolution, we outline the critical challenges and opportunities ahead.
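
To make the components named under Foundational Insights a bit more concrete, here is a minimal toy sketch of how an environment, an observation space, and an action space might fit together in an agent loop. Everything in it is hypothetical and for illustration only (`ToyEnvironment`, `Observation`, `Action`, `policy`, `run_agent`, and the Wi-Fi example are not from the paper), and the rule-based `policy` merely stands in for the (M)LLM that would handle understanding, planning, and grounding in a real OS Agent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """What the agent perceives (here: a plain-text description of the screen)."""
    screen_text: str


@dataclass
class Action:
    """One element of the action space (here: "click" or "stop")."""
    kind: str
    target: str = ""


class ToyEnvironment:
    """Hypothetical stand-in for an OS interface: one Wi-Fi toggle."""

    def __init__(self) -> None:
        self.wifi_on = False

    def observe(self) -> Observation:
        state = "on" if self.wifi_on else "off"
        return Observation(screen_text=f"Settings page. Wi-Fi toggle is {state}.")

    def step(self, action: Action) -> None:
        # Only one action has an effect in this toy environment.
        if action.kind == "click" and action.target == "wifi_toggle":
            self.wifi_on = not self.wifi_on


def policy(goal: str, obs: Observation) -> Action:
    """Rule-based placeholder for the model (assumption: a real agent calls an (M)LLM here)."""
    if goal == "turn on wifi" and "toggle is off" in obs.screen_text:
        return Action(kind="click", target="wifi_toggle")
    return Action(kind="stop")


def run_agent(env: ToyEnvironment, goal: str, max_steps: int = 5) -> List[Action]:
    """Observe, choose an action, act; repeat until the policy stops or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(goal, env.observe())
        history.append(action)
        if action.kind == "stop":
            break
        env.step(action)
    return history


env = ToyEnvironment()
history = run_agent(env, "turn on wifi")
```

In this sketch the agent clicks the toggle once, observes that Wi-Fi is now on, and stops. A real OS Agent would replace the text observation with screenshots or an accessibility tree and the two-action space with clicks, typing, scrolling, and so on.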
Join the Conversation:
We’ve created an open-source GitHub repository to support ongoing research and foster collaboration in this domain.
We’d love to hear your thoughts! What do you think about the future of OS Agents? Let’s discuss!