# AI Completion
Source: https://docs.orgo.ai/api-reference/ai/completion
POST /ai
Access 400+ AI models through OpenRouter integration. Requires OpenRouter to be connected in account settings.
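A minimal Python sketch of calling this endpoint. The request body here is an assumption (an OpenAI-style shape with `model` and `messages`, and a placeholder model identifier) since the exact schema isn't shown on this page; check the endpoint's parameter list for the authoritative field names.

```python
import os
import requests

# Assumed OpenAI-style body; "anthropic/claude-sonnet-4" is a placeholder model ID
payload = {
    "model": "anthropic/claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}],
}
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post("https://www.orgo.ai/api/ai", headers=headers, json=payload)
    print(resp.json())
```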
# List Available Models
Source: https://docs.orgo.ai/api-reference/ai/list-models
GET /ai
Get list of all available AI models from OpenRouter
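For example, fetching the model list with Python:

```python
import os
import requests

headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.get("https://www.orgo.ai/api/ai", headers=headers)
    print(resp.json())
```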
# Authentication
Source: https://docs.orgo.ai/api-reference/authentication
API key setup
## Overview
All API requests require a Bearer token in the `Authorization` header.
## Get Your API Key
1. Go to [orgo.ai/projects](https://www.orgo.ai/projects)
2. Click "Generate API Key"
3. Copy your key (format: `sk_live_...`)
**Important**: Store your API key securely. Don't commit it to version control.
## Usage
Include the header in every request:
```bash theme={null}
Authorization: Bearer sk_live_your_api_key_here
```
## Examples
### cURL
```bash theme={null}
curl https://www.orgo.ai/api/projects \
-H "Authorization: Bearer sk_live_abc123..."
```
### Python
```python theme={null}
import requests
headers = {
    "Authorization": "Bearer sk_live_abc123...",
    "Content-Type": "application/json"
}
response = requests.get(
    "https://www.orgo.ai/api/projects",
    headers=headers
)
```
### JavaScript
```javascript theme={null}
fetch('https://www.orgo.ai/api/projects', {
  headers: {
    'Authorization': 'Bearer sk_live_abc123...',
    'Content-Type': 'application/json'
  }
})
```
## Environment Variables
Store your key as an environment variable:
```bash theme={null}
export ORGO_API_KEY=sk_live_abc123...
```
Then reference it in your code:
```python theme={null}
import os
api_key = os.environ.get("ORGO_API_KEY")
```
## Error Responses
**Invalid key:**
```json theme={null}
{
"error": "Invalid API key"
}
```
**Missing key:**
```json theme={null}
{
"error": "Authentication failed"
}
```
Both return `401 Unauthorized`.
## Security
* Keep your API key private
* Rotate keys if compromised
* Use environment variables, not hardcoded values
* Don't share keys in public repositories
## Need Help?
Contact [support](mailto:spencer@orgo.ai) if you lose access to your API key.
# Execute Bash Command
Source: https://docs.orgo.ai/api-reference/computers/bash
POST /computers/{id}/bash
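A hedged Python sketch of this endpoint. The body field name `command` is an assumption, not confirmed by this page:

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
# "command" is an assumed field name -- verify against the endpoint's parameters
payload = {"command": "ls -la"}
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/bash",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```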
# Click Mouse
Source: https://docs.orgo.ai/api-reference/computers/click
POST /computers/{id}/click
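For example, clicking at screen coordinates from Python (the `x`/`y` body fields match the quick start in the Introduction):

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
payload = {"x": 100, "y": 200}  # screen coordinates to click
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/click",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```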
# Create Computer
Source: https://docs.orgo.ai/api-reference/computers/create
POST /projects/{project_name}/computers
Create a new computer within a project. The computer name must be unique within the project.
## Example
```json theme={null}
{
"name": "dev-machine",
"os": "linux",
"ram": 4,
"cpu": 2
}
```
# Delete Computer
Source: https://docs.orgo.ai/api-reference/computers/delete
DELETE /computers/{id}
Permanently delete a computer. This action cannot be undone.
## Behavior
* Computer will be stopped if currently running
* All data on the computer will be lost
* Returns 200 status code on successful deletion
## Example
```bash theme={null}
curl -X DELETE https://www.orgo.ai/api/computers/550e8400-e29b-41d4-a716-446655440000 \
-H "Authorization: Bearer sk_live_..."
```
# Mouse Drag
Source: https://docs.orgo.ai/api-reference/computers/drag
POST /computers/{id}/drag
# Execute Python Code
Source: https://docs.orgo.ai/api-reference/computers/exec
POST /computers/{id}/exec
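A hedged Python sketch. The body field name `code` is assumed here (the SDK's `computer.exec(code)` shown later in these docs suggests it, but this page doesn't confirm the REST schema):

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
# "code" is an assumed field name -- verify against the endpoint's parameters
payload = {"code": "print('hello from the VM')"}
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/exec",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```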
# Get Computer
Source: https://docs.orgo.ai/api-reference/computers/get
GET /computers/{id}
Retrieve details about a specific computer by its ID.
## Example Response
```json theme={null}
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "dev-machine",
"project_name": "my-project",
"os": "linux",
"ram": 4,
"cpu": 2,
"status": "running",
"url": "https://dev-machine.example.com",
"created_at": "2024-01-15T10:30:00Z"
}
```
# Press Key
Source: https://docs.orgo.ai/api-reference/computers/key
POST /computers/{id}/key
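A hedged Python sketch. The body field name `key` is an assumption; key names such as `Enter`, `Tab`, and `ctrl+c` are the ones listed in the Introduction:

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
# "key" is an assumed field name -- verify against the endpoint's parameters
payload = {"key": "Enter"}
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/key",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```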
# List Computers
Source: https://docs.orgo.ai/api-reference/computers/list
GET /projects/{project_name}/computers
Get all computers within a project.
## Example Response
```json theme={null}
{
"computers": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "dev-machine",
"project_name": "my-project",
"os": "linux",
"ram": 4,
"cpu": 2,
"status": "running",
"url": "https://dev-machine.example.com",
"created_at": "2024-01-15T10:30:00Z"
}
]
}
```
# Restart Computer
Source: https://docs.orgo.ai/api-reference/computers/restart
POST /computers/{id}/restart
Restart a computer. Performs a graceful shutdown followed by a fresh start.
## Use Cases
* Recovering from a hung or unresponsive state
* Applying system updates that require a reboot
* Resetting the environment to a clean state
## Example
```bash theme={null}
curl -X POST https://www.orgo.ai/api/computers/550e8400-e29b-41d4-a716-446655440000/restart \
-H "Authorization: Bearer sk_live_..."
```
# Take Screenshot
Source: https://docs.orgo.ai/api-reference/computers/screenshot
GET /computers/{id}/screenshot
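A Python sketch of fetching a screenshot. The response encoding (raw image bytes vs. base64 JSON) isn't specified on this page, so this example only inspects the response rather than assuming a format:

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.get(
        f"https://www.orgo.ai/api/computers/{computer_id}/screenshot",
        headers=headers,
    )
    # Inspect status and content type before deciding how to decode the body
    print(resp.status_code, resp.headers.get("Content-Type"))
```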
# Scroll Page
Source: https://docs.orgo.ai/api-reference/computers/scroll
POST /computers/{id}/scroll
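A hedged Python sketch mirroring the SDK's `computer.scroll(direction, amount)` used elsewhere in these docs; the REST body field names are assumed:

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
# "direction"/"amount" are assumed field names -- verify against the endpoint's parameters
payload = {"direction": "down", "amount": 1}
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/scroll",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```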
# Start Computer
Source: https://docs.orgo.ai/api-reference/computers/start
POST /computers/{id}/start
Start a stopped computer.
## Behavior
* Idempotent operation - succeeds if computer is already running
* Computer becomes accessible within moments
## Example
```bash theme={null}
curl -X POST https://www.orgo.ai/api/computers/550e8400-e29b-41d4-a716-446655440000/start \
-H "Authorization: Bearer sk_live_..."
```
# Stop Computer
Source: https://docs.orgo.ai/api-reference/computers/stop
POST /computers/{id}/stop
Stop a running computer to save costs when not in use.
## Behavior
* Computer is gracefully shut down
* Idempotent operation - succeeds if already stopped
* Stopped computers do not incur compute charges
## Example
```bash theme={null}
curl -X POST https://www.orgo.ai/api/computers/550e8400-e29b-41d4-a716-446655440000/stop \
-H "Authorization: Bearer sk_live_..."
```
# Start Stream
Source: https://docs.orgo.ai/api-reference/computers/stream-start
POST /computers/{id}/stream/start
## Description
Start streaming the computer's display to an RTMP server. This allows you to spectate your agent's computer in real-time through platforms like Twitch, YouTube Live, or custom RTMP servers.
## Prerequisites
Before using this endpoint, you must:
1. Configure an RTMP connection in your [account settings](https://www.orgo.ai/settings)
2. Provide the connection name when starting the stream
## Usage Example
```python theme={null}
# Start streaming to a configured connection
result = computer.start_stream("my-twitch-1")
# The computer's display is now being streamed
# Do your automation/demo
computer.type("Hello viewers!")
computer.bash("ls -la")
# Stop streaming when done
computer.stop_stream()
```
## Connection Configuration
RTMP connections are configured in your account settings with:
* A unique name (used in this API call)
* RTMP server URL
* Stream key (encrypted and stored securely)
* Optional settings (bitrate, resolution, etc.)
## Response
The response includes information about the streaming process:
```json theme={null}
{
"success": true,
"status": "streaming",
"pid": 12345,
"start_time": "2024-01-20T10:30:00Z"
}
```
## Common Use Cases
* Live demonstrations of AI agents
* Recording automation workflows
* Debugging and monitoring agent behavior
* Creating content for tutorials or showcases
# Get Stream Status
Source: https://docs.orgo.ai/api-reference/computers/stream-status
GET /computers/{id}/stream/status
## Description
Check the current streaming status of a computer. Use this endpoint to verify whether a stream is active, when it started, and which process is handling it.
## Usage Example
```python theme={null}
# Check if streaming is active
status = computer.stream_status()

if status['status'] == 'streaming':
    print(f"Stream active since: {status['start_time']}")
    print(f"Process ID: {status['pid']}")
elif status['status'] == 'idle':
    print("No active stream")
```
## Response Format
### When Streaming
```json theme={null}
{
"status": "streaming",
"start_time": "2024-01-20T10:30:00Z",
"pid": 12345
}
```
### When Idle
```json theme={null}
{
"status": "idle"
}
```
### When Terminated
```json theme={null}
{
"status": "terminated",
"message": "Stream process was terminated unexpectedly"
}
```
## Status Values
* `idle` - No active stream
* `streaming` - Stream is currently active
* `terminated` - Stream process ended unexpectedly
## Common Use Cases
* Monitoring stream health
* Verifying stream started successfully
* Detecting unexpected stream termination
* Building stream status dashboards
# Stop Stream
Source: https://docs.orgo.ai/api-reference/computers/stream-stop
POST /computers/{id}/stream/stop
## Description
Stop an active stream on the computer. This gracefully terminates the streaming process and releases resources.
## Usage Example
```python theme={null}
# Stop the active stream
result = computer.stop_stream()

if result['success']:
    print("Stream stopped successfully")
```
## Response Format
```json theme={null}
{
"success": true,
"message": "Stream stopped successfully"
}
```
## Error Handling
If no stream is active, the endpoint will return an appropriate message:
```json theme={null}
{
"success": false,
"error": "No active stream to stop"
}
```
## Best Practices
1. Always stop streams when done to free resources
2. Check stream status before stopping if unsure
3. Handle cases where stream might have already terminated
## Example Workflow
```python theme={null}
# Complete streaming workflow
try:
    # Start streaming
    computer.start_stream("my-connection")

    # Perform your automation
    computer.type("Running automated demo...")
    computer.bash("python my_script.py")

    # Always stop the stream
    computer.stop_stream()
except Exception:
    # Ensure stream is stopped even on error
    computer.stop_stream()
    raise
```
# Type Text
Source: https://docs.orgo.ai/api-reference/computers/type
POST /computers/{id}/type
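For example, typing text from Python (the `text` body field matches the quick start in the Introduction):

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
payload = {"text": "Hello world"}  # text to type into the computer
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/type",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```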
# Wait Duration
Source: https://docs.orgo.ai/api-reference/computers/wait
POST /computers/{id}/wait
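A hedged Python sketch mirroring the SDK's `computer.wait(duration)` (seconds) used elsewhere in these docs; the REST body field name is assumed:

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
# "duration" is an assumed field name -- verify against the endpoint's parameters
payload = {"duration": 1}  # seconds to wait
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/wait",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```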
# Delete File
Source: https://docs.orgo.ai/api-reference/files/delete
DELETE /files/{id}
Delete a file from storage. This removes the file from both cloud storage and the database.
# Download File
Source: https://docs.orgo.ai/api-reference/files/download
GET /files/{id}/download
Get a signed download URL for a file. The URL expires after 1 hour.
## Example
```bash theme={null}
curl https://www.orgo.ai/api/files/{id}/download \
-H "Authorization: Bearer sk_live_..."
```
### Response
```json theme={null}
{
"url": "https://signed-url-here..."
}
```
Then open the URL in a browser or use it to download the file.
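The same two-step flow in Python: fetch the signed URL from the endpoint, then download the file from it (the signed URL itself needs no `Authorization` header):

```python
import os
import requests

file_id = "550e8400-e29b-41d4-a716-446655440000"  # example file ID
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only run when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.get(
        f"https://www.orgo.ai/api/files/{file_id}/download",
        headers=headers,
    )
    signed_url = resp.json()["url"]
    # The signed URL is pre-authorized and expires after 1 hour
    content = requests.get(signed_url).content
    with open("downloaded_file", "wb") as f:
        f.write(content)
```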
# Export File
Source: https://docs.orgo.ai/api-reference/files/export
POST /files/export
Export a file from a computer's filesystem. This lets you pull files created inside the VM (such as results, screenshots, or generated content) and returns a download URL.
The computer must be in a running state to export files.
## Path Formats
The path parameter accepts several formats:
| Format | Example |
| ---------------- | -------------------------------- |
| Relative to home | `Desktop/results.txt` |
| Absolute path | `/home/user/Desktop/results.txt` |
| With tilde | `~/Desktop/results.txt` |
## Security
Files can only be exported from within `/home/user`. Attempting to access paths outside this directory will return a 403 error.
## Example
```bash theme={null}
curl -X POST https://www.orgo.ai/api/files/export \
-H "Authorization: Bearer sk_live_..." \
-H "Content-Type: application/json" \
-d '{"desktopId": "8823f0ff-f4bc-4ab2-833e-40d82c10b505", "path": "Desktop/results.txt"}'
```
### Response
```json theme={null}
{
"success": true,
"file": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"filename": "results.txt",
"size_bytes": 1024,
"content_type": "text/plain",
"created_at": "2024-01-15T10:30:00Z",
"desktop_id": "8823f0ff-f4bc-4ab2-833e-40d82c10b505",
"project_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
},
"url": "https://fly.storage.tigris.dev/bucket/..."
}
```
# List Files
Source: https://docs.orgo.ai/api-reference/files/list
GET /computers/{id}/files
List all files associated with a computer, including both uploaded files and exported files.
# Upload File
Source: https://docs.orgo.ai/api-reference/files/upload
POST /computers/{id}/files/upload
Upload a file to a computer's Desktop folder. The file automatically syncs to all running computers in the project.
## Supported Files
* Maximum file size: 10MB
* All file types supported
## Example
```bash theme={null}
curl -X POST https://www.orgo.ai/api/computers/{id}/files/upload \
-H "Authorization: Bearer sk_live_..." \
-F "file=@./document.pdf"
```
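The same upload in Python, using the multipart `file` field from the cURL example above (`document.pdf` is a placeholder filename):

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only run when an API key is configured and the file exists
if os.environ.get("ORGO_API_KEY") and os.path.exists("document.pdf"):
    with open("document.pdf", "rb") as f:
        # requests sets the multipart Content-Type header automatically
        resp = requests.post(
            f"https://www.orgo.ai/api/computers/{computer_id}/files/upload",
            headers=headers,
            files={"file": f},
        )
    print(resp.json())
```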
# Introduction
Source: https://docs.orgo.ai/api-reference/introduction
Build with virtual computers programmatically
## Overview
The Orgo API lets you create projects, provision virtual computers, and control them programmatically. Build AI agent fleets, automation workflows, or browser testing at scale.
## Authentication
All requests require a Bearer token:
```bash theme={null}
Authorization: Bearer your_api_key
```
Get your API key at [orgo.ai/projects](https://www.orgo.ai/projects).
## Base URL
```
https://www.orgo.ai/api
```
## Quick Start
### 1. Create a Project
Projects are containers for computers.
```bash theme={null}
curl -X POST https://www.orgo.ai/api/projects \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{"name": "manus"}'
```
### 2. Create a Computer
Add a computer to your project:
```bash theme={null}
curl -X POST https://www.orgo.ai/api/projects/manus/computers \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "agent-1",
"os": "linux",
"ram": 2,
"cpu": 2
}'
```
### 3. Control the Computer
```bash theme={null}
# Screenshot
curl https://www.orgo.ai/api/projects/manus/computers/agent-1/screenshot \
-H "Authorization: Bearer your_api_key"
# Click
curl -X POST https://www.orgo.ai/api/projects/manus/computers/agent-1/click \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{"x": 100, "y": 200}'
# Type
curl -X POST https://www.orgo.ai/api/projects/manus/computers/agent-1/type \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{"text": "Hello world"}'
```
## Resource Hierarchy
```
User
└── Projects (e.g., "manus")
└── Computers (e.g., "agent-1", "agent-2")
```
Projects organize your computers. Free tier: 2 concurrent computers.
## Computer Specs
* **OS**: Linux or Windows
* **RAM**: 2GB, 4GB, or 8GB
* **CPU**: 2, 4, or 8 cores
* **GPU**: None, T4, or A10 (coming soon)
## Control Operations
### Mouse
* Click (left, right, double)
* Drag
* Scroll
### Keyboard
* Type text
* Press keys (Enter, Tab, ctrl+c, etc.)
### Execution
* Bash commands
* Python code
### Other
* Screenshots
* Wait/delays
* Streaming (RTMP)
## Error Responses
```json theme={null}
{
"error": "Error message"
}
```
**Status codes:**
* `200` - Success
* `400` - Invalid request
* `401` - Invalid API key
* `404` - Resource not found
* `500` - Server error
# Create Project
Source: https://docs.orgo.ai/api-reference/projects/create
POST /projects
Create a new named project
# Delete Project
Source: https://docs.orgo.ai/api-reference/projects/delete
POST /projects/{id}/delete
Delete project and all its computers
# Get Project by Name
Source: https://docs.orgo.ai/api-reference/projects/get-by-name
GET /projects/by-name/{name}
# List Projects
Source: https://docs.orgo.ai/api-reference/projects/list
GET /projects
List all projects for authenticated user
# Restart Project
Source: https://docs.orgo.ai/api-reference/projects/restart
POST /projects/{id}/restart
# Start Project
Source: https://docs.orgo.ai/api-reference/projects/start
POST /projects/{id}/start
# Stop Project
Source: https://docs.orgo.ai/api-reference/projects/stop
POST /projects/{id}/stop
# Agent S2
Source: https://docs.orgo.ai/guides/agent-s2
Let Agent S2 control a virtual desktop
## Overview
This guide walks through setting up Agent S2, the open-source state-of-the-art computer use agent by Simular AI. You can run it locally on your own computer or on a virtual desktop through Orgo.
## Setup
Install the required packages:
```bash pip theme={null}
pip install gui-agents pyautogui python-dotenv orgo
```
```bash requirements.txt theme={null}
gui-agents
pyautogui
python-dotenv
orgo
pillow
```
Set up your API keys:
```bash terminal icon="terminal" theme={null}
# Export as environment variables
export OPENAI_API_KEY=your_openai_api_key
export ANTHROPIC_API_KEY=your_anthropic_api_key
export ORGO_API_KEY=your_orgo_api_key # Optional for remote
```
```python setup.py icon="python" theme={null}
import os
os.environ["OPENAI_API_KEY"] = "your_openai_api_key"
os.environ["ANTHROPIC_API_KEY"] = "your_anthropic_api_key"
os.environ["ORGO_API_KEY"] = "your_orgo_api_key" # Optional
```
```bash .env icon="file" theme={null}
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
# Optional for remote execution
ORGO_API_KEY=your_orgo_api_key
USE_CLOUD_ENVIRONMENT=false
```
## Simple Usage
Run Agent S2 with natural language commands:
```bash local icon="terminal" theme={null}
# Local mode - controls your computer
python agent_s2.py "Open Chrome and search for weather"
```
```bash remote icon="terminal" theme={null}
# Remote mode - controls cloud desktop via Orgo
USE_CLOUD_ENVIRONMENT=true python agent_s2.py "Open Chrome"
```
```bash interactive icon="terminal" theme={null}
# Interactive mode
python agent_s2.py
```
This approach uses Agent S2's compositional framework to execute complex computer use tasks.
## Complete Example
```python agent_s2.py expandable icon="python" theme={null}
#!/usr/bin/env python3
import os
import io
import sys
import time
from dotenv import load_dotenv
from gui_agents.s2.agents.agent_s import AgentS2
from gui_agents.s2.agents.grounding import OSWorldACI
from orgo import Computer
import pyautogui

load_dotenv()

CONFIG = {
    "model": os.getenv("AGENT_MODEL", "gpt-4o"),
    "model_type": os.getenv("AGENT_MODEL_TYPE", "openai"),
    "grounding_model": os.getenv("GROUNDING_MODEL", "claude-3-7-sonnet-20250219"),
    "grounding_type": os.getenv("GROUNDING_MODEL_TYPE", "anthropic"),
    "max_steps": int(os.getenv("MAX_STEPS", "10")),
    "step_delay": float(os.getenv("STEP_DELAY", "0.5")),
    "remote": os.getenv("USE_CLOUD_ENVIRONMENT", "false").lower() == "true"
}

class LocalExecutor:
    def __init__(self):
        self.pyautogui = pyautogui
        if sys.platform == "win32":
            self.platform = "windows"
        elif sys.platform == "darwin":
            self.platform = "darwin"
        else:
            self.platform = "linux"

    def screenshot(self):
        img = self.pyautogui.screenshot()
        buffer = io.BytesIO()
        img.save(buffer, format="PNG")
        buffer.seek(0)
        return buffer.getvalue()

    def exec(self, code):
        exec(code, {"pyautogui": self.pyautogui, "time": time})

    def destroy(self):
        # No cleanup needed for local executor
        pass

class RemoteExecutor:
    def __init__(self):
        self.computer = Computer()
        self.platform = "linux"

    def screenshot(self):
        return self.computer.screenshot_base64()

    def exec(self, code):
        result = self.computer.exec(code)
        if not result['success']:
            raise Exception(result.get('error', 'Execution failed'))
        if result['output']:
            print(f"Output: {result['output']}")

    def destroy(self):
        self.computer.destroy()

def create_agent(executor):
    engine_params = {"engine_type": CONFIG["model_type"], "model": CONFIG["model"]}
    grounding_params = {"engine_type": CONFIG["grounding_type"], "model": CONFIG["grounding_model"]}
    grounding_agent = OSWorldACI(
        platform=executor.platform,
        engine_params_for_generation=engine_params,
        engine_params_for_grounding=grounding_params
    )
    return AgentS2(
        engine_params=engine_params,
        grounding_agent=grounding_agent,
        platform=executor.platform,
        action_space="pyautogui",
        observation_type="screenshot"
    )

def run_task(agent, executor, instruction):
    print(f"\n🤖 Task: {instruction}")
    print(f"📍 Mode: {'Remote' if CONFIG['remote'] else 'Local'}\n")
    for step in range(CONFIG["max_steps"]):
        print(f"Step {step + 1}/{CONFIG['max_steps']}")
        obs = {"screenshot": executor.screenshot()}
        info, action = agent.predict(instruction=instruction, observation=obs)
        if info:
            print(f"💭 {info}")
        if not action or not action[0]:
            print("✅ Complete")
            return True
        try:
            print(f"🔧 {action[0]}")
            executor.exec(action[0])
        except Exception as e:
            print(f"❌ Error: {e}")
            instruction = "The previous action failed. Try a different approach."
        time.sleep(CONFIG["step_delay"])
    print("⏱️ Max steps reached")
    return False

def main():
    executor = RemoteExecutor() if CONFIG["remote"] else LocalExecutor()
    try:
        agent = create_agent(executor)
        if len(sys.argv) > 1:
            run_task(agent, executor, " ".join(sys.argv[1:]))
        else:
            print("🎮 Interactive Mode (type 'exit' to quit)\n")
            while True:
                task = input("Task: ").strip()
                if task == "exit":
                    break
                elif task:
                    run_task(agent, executor, task)
    finally:
        # Clean up
        executor.destroy()

if __name__ == "__main__":
    main()
```
## Platform Requirements
### macOS
Grant Terminal access: System Settings → Privacy & Security → Accessibility
### Windows
May require running Terminal as Administrator
### Linux
Install dependencies:
```bash icon="terminal" theme={null}
sudo apt-get install python3-tk python3-dev
```
## Environment Variables
| Variable | Default | Description |
| ----------------------- | ---------------------------- | ---------------------------------- |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `ANTHROPIC_API_KEY` | - | Anthropic API key |
| `ORGO_API_KEY` | - | Orgo API key (remote mode) |
| `USE_CLOUD_ENVIRONMENT` | `false` | Set to `true` for remote execution |
| `AGENT_MODEL` | `gpt-4o` | Main reasoning model |
| `GROUNDING_MODEL` | `claude-3-7-sonnet-20250219` | Visual grounding model |
| `MAX_STEPS` | `10` | Maximum steps per task |
| `STEP_DELAY` | `0.5` | Seconds between actions |
## Architecture
Agent S2 uses a compositional framework with specialized modules:
**Mixture of Grounding** - Routes actions to specialized visual grounding models for precise UI localization
**Proactive Hierarchical Planning** - Dynamically refines plans based on evolving observations
**Cross-platform Support** - Works on macOS, Windows, and Linux
## Performance
Agent S2 achieves state-of-the-art results on computer use benchmarks:
| Benchmark | Success Rate | Rank |
| ----------------- | ------------ | ---- |
| OSWorld | 27.0% | #3 |
| WindowsAgentArena | 29.8% | #1 |
| AndroidWorld | 54.3% | #1 |
## Resources
* [GitHub Repository](https://github.com/simular-ai/Agent-S)
* [Agent S2 Whitepaper](https://arxiv.org/abs/2504.00906)
* [OSWorld Benchmark](https://os-world.github.io/)
Agent S2 is currently ranked #3 on the OSWorld benchmark, demonstrating leading performance on complex computer use tasks.
## Video Tutorial
A video version of this guide is available on the docs site, covering the same steps as this written guide.
# Claude Computer Use
Source: https://docs.orgo.ai/guides/claude-computer-use
Let Claude control a virtual desktop
## Overview
This guide shows how to get started with Anthropic's Claude Computer Use in a couple of minutes, using Orgo to control a virtual desktop environment.
## Setup
Install the required packages:
```bash pip theme={null}
pip install orgo anthropic
```
```bash npm theme={null}
npm install orgo @anthropic-ai/sdk
```
```bash yarn theme={null}
yarn add orgo @anthropic-ai/sdk
```
```bash pnpm theme={null}
pnpm add orgo @anthropic-ai/sdk
```
Set up your API keys:
```bash terminal icon="terminal" theme={null}
# Export as environment variables
export ORGO_API_KEY=your_orgo_api_key
export ANTHROPIC_API_KEY=your_anthropic_api_key
```
```python setup.py icon="python" theme={null}
import os
os.environ["ORGO_API_KEY"] = "your_orgo_api_key"
os.environ["ANTHROPIC_API_KEY"] = "your_anthropic_api_key"
```
```typescript setup.ts icon="square-js" theme={null}
process.env.ORGO_API_KEY = "your_orgo_api_key";
process.env.ANTHROPIC_API_KEY = "your_anthropic_api_key";
```
## Simple Usage
The simplest way to use Orgo with Claude is through the built-in `prompt()` method:
```python simple.py icon="python" theme={null}
from orgo import Computer
# Initialize a computer
computer = Computer()
# Let Claude control the computer with natural language
computer.prompt("Open Firefox and search for pictures of cats")
# Clean up when done
computer.destroy()
```
```typescript simple.ts icon="square-js" theme={null}
import { Computer } from 'orgo';
// Initialize a computer
const computer = await Computer.create();
// Let Claude control the computer with natural language
await computer.prompt({
instruction: "Open Firefox and search for pictures of cats"
});
// Clean up when done
await computer.destroy();
```
This approach handles all the complexity of the agent loop automatically, making it easy to get started.
## Customizing the Prompt Method
You can customize the prompt experience with various parameters:
```python custom.py icon="python" theme={null}
# Create a progress callback
def progress_callback(event_type, event_data):
    if event_type == "text":
        print(f"Claude: {event_data}")
    elif event_type == "tool_use":
        print(f"Action: {event_data['action']}")
    elif event_type == "thinking":
        print(f"Thinking: {event_data}")

# Use Claude with custom parameters
messages = computer.prompt(
    instruction="Find and download the latest Claude paper from Anthropic's website",
    model="claude-sonnet-4-20250514",  # The model to use
    display_width=1280,  # Set screen resolution
    display_height=800,
    callback=progress_callback,  # Track progress
    thinking_enabled=True,  # Enable Claude's "thinking" capability (Claude 3.7+)
    max_iterations=15,  # Limit the number of agent loops
    max_tokens=4096,  # Maximum tokens for Claude responses
    api_key="your_anthropic_api_key"  # Override environment variable
)
```
```typescript custom.ts icon="square-js" theme={null}
// Create a progress callback
const progressCallback = (eventType: string, eventData: any) => {
  if (eventType === "text") {
    console.log(`Claude: ${eventData}`);
  } else if (eventType === "tool_use") {
    console.log(`Action: ${eventData.action}`);
  } else if (eventType === "thinking") {
    console.log(`Thinking: ${eventData}`);
  }
};

// Use Claude with custom parameters
const messages = await computer.prompt({
  instruction: "Find and download the latest Claude paper from Anthropic's website",
  model: "claude-sonnet-4-20250514", // The model to use
  displayWidth: 1280, // Set screen resolution
  displayHeight: 800,
  callback: progressCallback, // Track progress
  thinkingEnabled: true, // Enable Claude's "thinking" capability (Claude 3.7+)
  maxIterations: 15, // Limit the number of agent loops
  maxTokens: 4096, // Maximum tokens for Claude responses
  apiKey: "your_anthropic_api_key" // Override environment variable
});
```
## Advanced Usage
For more control, you can implement your own agent loop using the Anthropic API directly:
```python advanced.py expandable icon="python" theme={null}
import anthropic
from orgo import Computer

def create_agent_loop(instruction, model="claude-sonnet-4-20250514"):
    # Initialize components
    computer = Computer()
    client = anthropic.Anthropic()
    try:
        # Initialize conversation
        messages = [{"role": "user", "content": instruction}]

        # Define tools
        tools = [
            {
                "type": "computer_20250124",  # For Claude 3.7+
                "name": "computer",
                "display_width_px": 1024,
                "display_height_px": 768,
                "display_number": 1
            }
        ]

        # Start the conversation with Claude
        response = client.beta.messages.create(
            model=model,
            messages=messages,
            tools=tools,
            betas=["computer-use-2025-01-24"],
            max_tokens=4096
        )

        # Add Claude's response to conversation history
        messages.append({"role": "assistant", "content": response.content})

        # Continue the loop until Claude stops requesting tools
        iteration = 0
        max_iterations = 20
        while iteration < max_iterations:
            iteration += 1

            # Process all tool requests from Claude
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    # Execute the requested tool action
                    result = execute_tool_action(computer, block)

                    # Format the result for Claude
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": [result]
                    })

            # If no tools were requested, Claude is done
            if not tool_results:
                break

            # Send the tool results back to Claude
            messages.append({"role": "user", "content": tool_results})

            # Get Claude's next response
            response = client.beta.messages.create(
                model=model,
                messages=messages,
                tools=tools,
                betas=["computer-use-2025-01-24"],
                max_tokens=4096
            )

            # Add Claude's response to conversation history
            messages.append({"role": "assistant", "content": response.content})

        return messages
    finally:
        # Clean up
        computer.destroy()

def execute_tool_action(computer, tool_block):
    """Execute a tool action based on Claude's request."""
    action = tool_block.input.get("action")
    try:
        if action == "screenshot":
            # Capture a screenshot and return as base64
            image_data = computer.screenshot_base64()
            return {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_data
                }
            }
        elif action == "left_click":
            x, y = tool_block.input["coordinate"]
            computer.left_click(x, y)
            return {"type": "text", "text": f"Clicked at ({x}, {y})"}
        elif action == "right_click":
            x, y = tool_block.input["coordinate"]
            computer.right_click(x, y)
            return {"type": "text", "text": f"Right-clicked at ({x}, {y})"}
        elif action == "double_click":
            x, y = tool_block.input["coordinate"]
            computer.double_click(x, y)
            return {"type": "text", "text": f"Double-clicked at ({x}, {y})"}
        elif action == "type":
            text = tool_block.input["text"]
            computer.type(text)
            return {"type": "text", "text": f"Typed: {text}"}
        elif action == "key":
            key = tool_block.input["text"]
            computer.key(key)
            return {"type": "text", "text": f"Pressed: {key}"}
        elif action == "scroll":
            direction = tool_block.input.get("scroll_direction", "down")
            amount = tool_block.input.get("scroll_amount", 1)
            computer.scroll(direction, amount)
            return {"type": "text", "text": f"Scrolled {direction} by {amount}"}
        elif action == "wait":
            duration = tool_block.input.get("duration", 1)
            computer.wait(duration)
            return {"type": "text", "text": f"Waited for {duration} seconds"}
        else:
            return {"type": "text", "text": f"Unsupported action: {action}"}
    except Exception as e:
        return {"type": "text", "text": f"Error executing {action}: {str(e)}"}
```
```typescript advanced.ts expandable icon="square-js" theme={null}
import { Computer } from 'orgo';
import Anthropic from '@anthropic-ai/sdk';
async function createAgentLoop(instruction: string, model = "claude-sonnet-4-20250514") {
// Initialize components
const computer = await Computer.create();
const client = new Anthropic();
try {
// Initialize conversation
const messages: any[] = [{ role: "user", content: instruction }];
// Define tools
const tools = [
{
type: "computer_20250124", // For Claude 3.7+
name: "computer",
display_width_px: 1024,
display_height_px: 768,
display_number: 1
}
];
// Start the conversation with Claude
let response = await client.beta.messages.create({
model,
messages,
tools: tools as any,
betas: ["computer-use-2025-01-24"],
max_tokens: 4096
});
// Add Claude's response to conversation history
messages.push({ role: "assistant", content: response.content });
// Continue the loop until Claude stops requesting tools
let iteration = 0;
const maxIterations = 20;
while (iteration < maxIterations) {
iteration++;
// Process all tool requests from Claude
const toolResults = [];
for (const block of response.content) {
if (block.type === "tool_use") {
// Execute the requested tool action
const result = await executeToolAction(computer, block);
// Format the result for Claude
toolResults.push({
type: "tool_result",
tool_use_id: block.id,
content: [result]
});
}
}
// If no tools were requested, Claude is done
if (toolResults.length === 0) {
break;
}
// Send the tool results back to Claude
messages.push({ role: "user", content: toolResults });
// Get Claude's next response
response = await client.beta.messages.create({
model,
messages,
tools: tools as any,
betas: ["computer-use-2025-01-24"],
max_tokens: 4096
});
// Add Claude's response to conversation history
messages.push({ role: "assistant", content: response.content });
}
return messages;
} finally {
// Clean up
await computer.destroy();
}
}
async function executeToolAction(computer: Computer, toolBlock: any) {
const action = toolBlock.input.action;
try {
if (action === "screenshot") {
// Capture a screenshot and return as base64
const imageData = await computer.screenshotBase64();
return {
type: "image",
source: {
type: "base64",
media_type: "image/jpeg",
data: imageData
}
};
} else if (action === "left_click") {
const [x, y] = toolBlock.input.coordinate;
await computer.leftClick(x, y);
return { type: "text", text: `Clicked at (${x}, ${y})` };
} else if (action === "right_click") {
const [x, y] = toolBlock.input.coordinate;
await computer.rightClick(x, y);
return { type: "text", text: `Right-clicked at (${x}, ${y})` };
} else if (action === "double_click") {
const [x, y] = toolBlock.input.coordinate;
await computer.doubleClick(x, y);
return { type: "text", text: `Double-clicked at (${x}, ${y})` };
} else if (action === "type") {
const text = toolBlock.input.text;
await computer.type(text);
return { type: "text", text: `Typed: ${text}` };
} else if (action === "key") {
const key = toolBlock.input.text;
await computer.key(key);
return { type: "text", text: `Pressed: ${key}` };
} else if (action === "scroll") {
const direction = toolBlock.input.scroll_direction || "down";
const amount = toolBlock.input.scroll_amount || 1;
await computer.scroll(direction, amount);
return { type: "text", text: `Scrolled ${direction} by ${amount}` };
} else if (action === "wait") {
const duration = toolBlock.input.duration || 1;
await computer.wait(duration);
return { type: "text", text: `Waited for ${duration} seconds` };
} else {
return { type: "text", text: `Unsupported action: ${action}` };
}
} catch (error) {
return { type: "text", text: `Error executing ${action}: ${error}` };
}
}
```
## Using Claude's Thinking Capability
Claude 4 Sonnet can expose its reasoning process through the `thinking` parameter:
```python thinking.py icon="python" theme={null}
import anthropic
from orgo import Computer
# Initialize components
computer = Computer()
client = anthropic.Anthropic()
try:
# Start a conversation with thinking enabled
response = client.beta.messages.create(
model="claude-sonnet-4-20250514",
messages=[{"role": "user", "content": "Find an image of a cat on the web"}],
tools=[{
"type": "computer_20250124",
"name": "computer",
"display_width_px": 1024,
"display_height_px": 768,
"display_number": 1
}],
betas=["computer-use-2025-01-24"],
thinking={"type": "enabled", "budget_tokens": 1024} # Enable thinking
)
# Access the thinking content
for block in response.content:
if block.type == "thinking":
print("Claude's reasoning:")
print(block.thinking)
finally:
# Clean up
computer.destroy()
```
```typescript thinking.ts icon="square-js" theme={null}
import { Computer } from 'orgo';
import Anthropic from '@anthropic-ai/sdk';
// Initialize components
const computer = await Computer.create();
const client = new Anthropic();
try {
// Start a conversation with thinking enabled
const response = await client.beta.messages.create({
model: "claude-sonnet-4-20250514",
messages: [{ role: "user", content: "Find an image of a cat on the web" }],
tools: [{
type: "computer_20250124",
name: "computer",
display_width_px: 1024,
display_height_px: 768,
display_number: 1
}] as any,
betas: ["computer-use-2025-01-24"],
thinking: { type: "enabled", budget_tokens: 1024 } as any // Enable thinking
});
// Access the thinking content
for (const block of response.content) {
if (block.type === "thinking") {
console.log("Claude's reasoning:");
console.log((block as any).thinking);
}
}
} finally {
// Clean up
await computer.destroy();
}
```
## Tool Compatibility
Orgo provides a complete set of methods corresponding to Claude's computer use tools:
| Claude Tool Action | Orgo Method (Python) | Orgo Method (TypeScript) | Description |
| ------------------ | ------------------------------------ | ------------------------------------------ | --------------------------------------------- |
| `screenshot` | `computer.screenshot()` | `await computer.screenshot()` | Capture the screen (returns PIL Image/Buffer) |
| `screenshot` | `computer.screenshot_base64()` | `await computer.screenshotBase64()` | Capture the screen (returns base64 string) |
| `left_click` | `computer.left_click(x, y)` | `await computer.leftClick(x, y)` | Left click at coordinates |
| `right_click` | `computer.right_click(x, y)` | `await computer.rightClick(x, y)` | Right click at coordinates |
| `double_click` | `computer.double_click(x, y)` | `await computer.doubleClick(x, y)` | Double click at coordinates |
| `type` | `computer.type(text)` | `await computer.type(text)` | Type text |
| `key` | `computer.key(key_sequence)` | `await computer.key(keySequence)` | Press keys (e.g., "Enter", "ctrl+c") |
| `scroll` | `computer.scroll(direction, amount)` | `await computer.scroll(direction, amount)` | Scroll in specified direction |
| `wait` | `computer.wait(seconds)` | `await computer.wait(seconds)` | Wait for specified seconds |
## Claude 4 vs 3.5 Sonnet
When using different Claude models, make sure to use the appropriate tool type:
* For Claude 4 Sonnet: `"type": "computer_20250124"`
* For Claude 3.5 Sonnet: `"type": "computer_20241022"`
And use the corresponding beta flag:
* For Claude 4 Sonnet: `betas=["computer-use-2025-01-24"]`
* For Claude 3.5 Sonnet: `betas=["computer-use-2024-10-22"]`
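The pairing above is easy to get wrong, so it can help to derive both values from the model name in one place. A minimal sketch (the model IDs are illustrative; check Anthropic's docs for current names):

```python
def computer_tool_config(model: str):
    """Return the (tool definition, beta flag) pair matching the model family."""
    if model.startswith("claude-3-5"):
        tool_type, beta = "computer_20241022", "computer-use-2024-10-22"
    else:  # Claude 3.7 / Claude 4 family
        tool_type, beta = "computer_20250124", "computer-use-2025-01-24"
    tool = {
        "type": tool_type,
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
        "display_number": 1,
    }
    return tool, beta

tool, beta = computer_tool_config("claude-sonnet-4-20250514")
```

Pass `tools=[tool]` and `betas=[beta]` to `client.beta.messages.create(...)` so the two values can never drift apart.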
TypeScript users: All methods are async and must be awaited. The TypeScript SDK uses camelCase for method names (e.g., `leftClick` instead of `left_click`).
## Video Tutorial
A short video walkthrough (about 30 seconds) of setting up Claude Computer Use is available on the docs site; it covers the same steps as this written guide.
# Embed VMs
Source: https://docs.orgo.ai/guides/embed-vms
Embed virtual computers into your applications
## Overview
Embed Orgo virtual computers directly into your web apps. Build AI agent interfaces, automation dashboards, or any product with live VM displays.
You can use any VNC client to connect to Orgo computers. The `orgo-vnc` package is a React component for convenience.
## Setup
```bash theme={null}
npm install orgo-vnc
```
1. Go to [orgo.ai/start](https://www.orgo.ai/start)
2. Open a workspace and select a computer
3. Click the **⋮** menu → **Computer Settings**
4. Copy the **Hostname** and **Password**
Create `.env.local` in your project root:
```bash theme={null}
NEXT_PUBLIC_ORGO_COMPUTER_HOST=your-hostname
NEXT_PUBLIC_ORGO_COMPUTER_PASSWORD=your-password
```
```tsx app/page.tsx expandable theme={null}
'use client';
import { useState } from 'react';
import { ComputerDisplay } from 'orgo-vnc';
const HOST = process.env.NEXT_PUBLIC_ORGO_COMPUTER_HOST!;
const PASSWORD = process.env.NEXT_PUBLIC_ORGO_COMPUTER_PASSWORD!;
export default function Home() {
  const [connected, setConnected] = useState(false);
  return (
    <div style={{ height: '100vh' }}>
      <ComputerDisplay
        hostname={HOST}
        password={PASSWORD}
        onConnect={() => setConnected(true)}
        onDisconnect={() => setConnected(false)}
      />
      <p>{connected ? `Connected to ${HOST}` : 'Connecting...'}</p>
    </div>
  );
}
```
## Props
| Prop | Type | Default | Description |
| ------------------ | ---------- | ----------- | ------------------------------------------- |
| `hostname` | `string` | required | Computer hostname |
| `password` | `string` | required | Computer password |
| `readOnly` | `boolean` | `false` | Disable user interaction |
| `background` | `string` | `undefined` | Background color |
| `scaleViewport` | `boolean` | `true` | Scale display to fit container |
| `clipViewport` | `boolean` | `false` | Clip display to container bounds |
| `resizeSession` | `boolean` | `false` | Resize remote session to match |
| `showDotCursor` | `boolean` | `false` | Show dot cursor when remote cursor hidden |
| `compressionLevel` | `number` | `2` | Compression level (0-9) |
| `qualityLevel` | `number` | `6` | Image quality (0-9) |
| `onConnect` | `function` | `undefined` | Called when connected |
| `onDisconnect` | `function` | `undefined` | Called when disconnected |
| `onError` | `function` | `undefined` | Called on error |
| `onClipboard` | `function` | `undefined` | Called when clipboard data received |
| `onReady` | `function` | `undefined` | Called with handle for programmatic control |
## Programmatic Control
Use the `onReady` callback to get a handle for programmatic control:
```tsx theme={null}
const [handle, setHandle] = useState<any>(null);
// Pass setHandle to the component: <ComputerDisplay onReady={setHandle} ... />
// Later...
handle?.reconnect();
handle?.disconnect();
handle?.sendClipboard('text to send');
await handle?.pasteFromClipboard();
```
## Next Steps
* Full SDK setup
* Control computers programmatically
# Gemini Computer Use
Source: https://docs.orgo.ai/guides/gemini-computer-use
Control virtual desktops with Gemini 2.5
## Overview
This guide shows how to get started with Google's Gemini 2.5 Computer Use in minutes using Orgo to control a virtual desktop environment.
## Setup
Install the required packages:
```bash pip theme={null}
pip install orgo google-genai pillow python-dotenv
```
Set up your API keys in a `.env` file:
```bash .env icon="file" theme={null}
ORGO_API_KEY=your_orgo_api_key
GEMINI_API_KEY=your_gemini_api_key
```
Or export them as environment variables:
```bash terminal icon="terminal" theme={null}
export ORGO_API_KEY=your_orgo_api_key
export GEMINI_API_KEY=your_gemini_api_key
```
## Complete Example
Here's a full working example that handles the complete agent loop:
```python example.py expandable icon="python" theme={null}
import os
import time
import base64
import io
from google import genai
from google.genai import types
from orgo import Computer
from PIL import Image
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Initialize Gemini client
client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))
# Connect to your Orgo computer
# Get your computer_id from https://orgo.ai/projects
computer = Computer(computer_id="your-computer-id")
# Screen resolution
SCREEN_WIDTH = 1024
SCREEN_HEIGHT = 768
# System prompt with Ubuntu-specific instructions
SYSTEM_PROMPT = f"""You are controlling an Ubuntu Linux virtual machine with a display resolution of {SCREEN_WIDTH}x{SCREEN_HEIGHT}.
* You have access to a virtual Ubuntu desktop environment with standard applications
* You can see the current state through screenshots and control the computer through actions
* The environment has Firefox browser and standard Ubuntu applications pre-installed
* CRITICAL: When opening applications or files on the Ubuntu desktop, you MUST USE DOUBLE-CLICK, not single-click
* Single-click only selects desktop icons but DOES NOT open them
* Desktop interactions:
- Desktop icons (apps/folders): DOUBLE-CLICK to open
- Menu items: SINGLE-CLICK to select
- Taskbar/launcher icons: SINGLE-CLICK to open
- Window buttons (close/minimize/maximize): SINGLE-CLICK
- File browser items: DOUBLE-CLICK to open
* Always start by taking a screenshot to see the current state
* When you need to submit or confirm, use the 'Enter' key
* Be efficient with screenshots - only take them when you need to see the current state
* Wait for pages/applications to load before taking another screenshot
* Batch multiple actions together when possible before checking the result
"""
def denormalize_x(x: int) -> int:
"""Convert normalized x coordinate (0-999) to actual pixel."""
return int(x / 1000 * SCREEN_WIDTH)
def denormalize_y(y: int) -> int:
"""Convert normalized y coordinate (0-999) to actual pixel."""
return int(y / 1000 * SCREEN_HEIGHT)
def get_screenshot_png() -> bytes:
"""Get screenshot as PNG bytes (Gemini requires PNG format)."""
jpeg_data = base64.b64decode(computer.screenshot_base64())
image = Image.open(io.BytesIO(jpeg_data))
png_buffer = io.BytesIO()
image.save(png_buffer, format='PNG')
return png_buffer.getvalue()
def get_current_url() -> str:
    """Best-effort URL tracking: returns the active window title (which
    typically includes the page title) as a stand-in for the browser URL."""
    try:
        result = computer.bash("xdotool getactivewindow getwindowname")
        return result if result else "about:blank"
    except Exception:
        return "about:blank"
def execute_function_calls(candidate):
"""Execute function calls from Gemini's response."""
results = []
function_calls = [
part.function_call
for part in candidate.content.parts
if part.function_call
]
for function_call in function_calls:
fname = function_call.name
args = function_call.args
action_result = {}
print(f" → {fname}")
try:
if fname == "open_web_browser":
pass # Browser already open
elif fname == "click_at":
computer.left_click(denormalize_x(args["x"]), denormalize_y(args["y"]))
elif fname == "type_text_at":
computer.left_click(denormalize_x(args["x"]), denormalize_y(args["y"]))
computer.type(args["text"])
if args.get("press_enter", False):
computer.key("Return")
elif fname == "scroll_document":
computer.scroll(args["direction"], 3)
elif fname == "key_combination":
computer.key(args["keys"])
elif fname == "go_back":
computer.key("alt+Left")
elif fname == "navigate":
url = args["url"]
computer.bash(f'firefox "{url}" &')
action_result["url"] = url
elif fname == "wait_5_seconds":
computer.wait(5)
else:
print(f" Warning: Unimplemented function {fname}")
time.sleep(1) # Wait for UI to update
except Exception as e:
print(f" Error: {e}")
action_result = {"error": str(e)}
results.append((fname, action_result))
return results
def get_function_responses(results):
"""Create function responses with screenshot and URL."""
screenshot_png = get_screenshot_png()
current_url = get_current_url()
function_responses = []
for name, result in results:
response_data = {
"status": "completed",
"url": result.get("url", current_url)
}
response_data.update(result)
function_responses.append(
types.FunctionResponse(
name=name,
response=response_data,
parts=[
types.FunctionResponsePart(
inline_data=types.FunctionResponseBlob(
mime_type="image/png",
data=screenshot_png
)
)
]
)
)
return function_responses
try:
# Configure Computer Use tool with system instruction
config = types.GenerateContentConfig(
system_instruction=SYSTEM_PROMPT,
tools=[
types.Tool(
computer_use=types.ComputerUse(
environment=types.Environment.ENVIRONMENT_BROWSER
)
)
]
)
# Define task
task = "Open Firefox and search for 'gemini ai'"
print(f"Task: {task}\n")
# Get initial screenshot
initial_screenshot = get_screenshot_png()
# Create initial request
contents = [
types.Content(
role="user",
parts=[
types.Part(text=task),
types.Part.from_bytes(
data=initial_screenshot,
mime_type='image/png'
)
]
)
]
# Agent loop
for iteration in range(20):
print(f"\n--- Turn {iteration + 1} ---")
# Get response from Gemini
response = client.models.generate_content(
model='gemini-2.5-computer-use-preview-10-2025',
contents=contents,
config=config
)
candidate = response.candidates[0]
contents.append(candidate.content)
# Display progress
for part in candidate.content.parts:
if part.text:
print(f"💬 {part.text}")
# Check for function calls
has_function_calls = any(
part.function_call
for part in candidate.content.parts
)
if not has_function_calls:
print("\n✓ Task completed")
break
# Execute actions
print("→ Executing actions...")
results = execute_function_calls(candidate)
# Get responses with screenshot and URL
function_responses = get_function_responses(results)
# Continue conversation
contents.append(
types.Content(
role="user",
parts=[
types.Part(function_response=fr)
for fr in function_responses
]
)
)
except Exception as e:
print(f"\n❌ Error: {e}")
finally:
print("\nDone!")
# Note: computer.destroy() not called to keep computer running
# Call computer.destroy() if you want to clean up
```
## Usage Examples
### Basic Tasks
```python theme={null}
# Change the task variable to control what Gemini does
task = "Open Firefox and search for 'gemini ai'"
# Navigate to a website
task = "Go to github.com and search for 'orgo'"
# Fill a form
task = "Fill out the contact form with test data"
```
### Complex Workflows
```python theme={null}
# Multi-step task
task = """
1. Open a text editor
2. Write a Python hello world program
3. Save it as hello.py
4. Open a terminal
5. Run the program
"""
```
## Key Concepts
### System Prompt
The system prompt provides crucial context to Gemini about the Ubuntu environment:
```python theme={null}
SYSTEM_PROMPT = f"""You are controlling an Ubuntu Linux virtual machine...
* CRITICAL: When opening applications or files on the Ubuntu desktop,
you MUST USE DOUBLE-CLICK, not single-click
* Single-click only selects desktop icons but DOES NOT open them
* Desktop icons (apps/folders): DOUBLE-CLICK to open
* Menu items: SINGLE-CLICK to select
"""
```
This ensures Gemini knows to:
* Double-click desktop icons to open applications
* Single-click menu items and buttons
* Use appropriate keyboard shortcuts
### Getting Your Computer ID
Get your `computer_id` from the [Orgo dashboard](https://orgo.ai/projects):
1. Go to [https://orgo.ai/projects](https://orgo.ai/projects)
2. Click on your project
3. Find your computer ID in the computer list
4. Use it in: `Computer(computer_id="your-computer-id")`
### The Agent Loop
Gemini Computer Use works in a continuous loop:
1. **Request** → Send task with screenshot to the model
2. **Action** → Model suggests actions (click, type, etc.)
3. **Execute** → Your code executes the actions
4. **Screenshot** → Capture the result
5. **Repeat** → Continue until task is complete
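Stripped of the Gemini-specific details, the loop above reduces to this control flow. A sketch with the model call, action execution, and screenshot capture passed in as callables (all names here are illustrative, not part of any SDK):

```python
def agent_loop(task, call_model, execute_action, take_screenshot, max_turns=20):
    """Generic request/act/observe loop; returns the conversation history."""
    history = [("user", task, take_screenshot())]       # 1. request with screenshot
    for _ in range(max_turns):
        reply = call_model(history)                     # 2. model suggests actions
        history.append(("model", reply, None))
        calls = reply.get("function_calls", [])
        if not calls:
            break                                       # no actions left: task done
        for call in calls:
            execute_action(call)                        # 3. execute each action
        history.append(("user", "results", take_screenshot()))  # 4. observe result
    return history                                      # 5. repeat until complete
```

The full example above is this skeleton with `client.models.generate_content` as `call_model` and `execute_function_calls` as the executor.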
### Image Format Conversion
**Important:** Orgo returns screenshots in JPEG format, but Gemini requires PNG format:
```python theme={null}
def get_screenshot_png() -> bytes:
"""Get screenshot as PNG bytes (Gemini requires PNG format)."""
jpeg_data = base64.b64decode(computer.screenshot_base64())
image = Image.open(io.BytesIO(jpeg_data))
png_buffer = io.BytesIO()
image.save(png_buffer, format='PNG')
return png_buffer.getvalue()
```
### URL Tracking
**Important:** Gemini Computer Use requires the current URL in every function response:
```python theme={null}
response_data = {
"status": "completed",
"url": result.get("url", current_url) # Always include URL
}
```
### Coordinate System
Gemini uses **normalized coordinates (0-999)** that must be converted to actual pixels:
```python theme={null}
def denormalize_x(x: int) -> int:
return int(x / 1000 * SCREEN_WIDTH)
def denormalize_y(y: int) -> int:
return int(y / 1000 * SCREEN_HEIGHT)
```
Orgo's default screen resolution is **1024x768**.
### Action Types
| Action | Description | Example |
| ------------------ | --------------------- | ----------------------------------------------- |
| `open_web_browser` | Opens the browser | Start Firefox |
| `click_at` | Click at coordinates | Click button at (500, 300) |
| `type_text_at` | Type text at location | Enter "hello" in search box |
| `scroll_document` | Scroll page | Scroll down |
| `key_combination` | Press key combos | Press ctrl+c |
| `navigate` | Go to URL | Load [https://example.com](https://example.com) |
| `go_back` | Browser back | Previous page |
| `wait_5_seconds` | Pause execution | Wait for page load |
## Tool Compatibility
Orgo provides methods corresponding to Gemini's computer use tools:
| Gemini Tool Action | Orgo Method | Description |
| ------------------ | ------------------------------------ | --------------------------- |
| `click_at` | `computer.left_click(x, y)` | Click at coordinates |
| `type_text_at` | `computer.type(text)` | Type text |
| `key_combination` | `computer.key(keys)` | Press keys (e.g., "ctrl+c") |
| `scroll_document` | `computer.scroll(direction, amount)` | Scroll page |
| `navigate` | `computer.bash('firefox "url" &')` | Open URL |
| Screenshot | `computer.screenshot_base64()` | Capture screen (JPEG) |
| `wait_5_seconds` | `computer.wait(5)` | Wait 5 seconds |
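The rows of this table can be collapsed into a dispatch dictionary, which keeps the name-to-method mapping in one place. A sketch, assuming `computer` is an `orgo.Computer` with the methods listed above and that click coordinates have already been denormalized from Gemini's 0-999 range to pixels:

```python
def gemini_dispatch(computer):
    """Map Gemini function-call names to Orgo method invocations."""
    return {
        # coordinates assumed already converted to actual pixels
        "click_at": lambda a: computer.left_click(a["x"], a["y"]),
        "type_text_at": lambda a: computer.type(a["text"]),
        "key_combination": lambda a: computer.key(a["keys"]),
        "scroll_document": lambda a: computer.scroll(a["direction"], 3),
        "navigate": lambda a: computer.bash(f'firefox "{a["url"]}" &'),
        "wait_5_seconds": lambda a: computer.wait(5),
    }
```

With this in place, `execute_function_calls` shrinks to a lookup: `handler = gemini_dispatch(computer).get(fname)` followed by `handler(args)` when a handler exists.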
## Best Practices
### 1. Clear Instructions
```python theme={null}
# ✅ Good - Specific and clear
task = "Go to amazon.com and find the top 3 rated laptops under $1000"
# ❌ Avoid - Too vague
task = "Find some laptops"
```
### 2. Use System Prompts
Always include a system prompt with Ubuntu-specific instructions:
```python theme={null}
config = types.GenerateContentConfig(
system_instruction=SYSTEM_PROMPT, # Include OS-specific guidance
tools=[...]
)
```
### 3. Convert Coordinates
Always denormalize Gemini's normalized coordinates (0-999):
```python theme={null}
actual_x = denormalize_x(args["x"])
actual_y = denormalize_y(args["y"])
```
### 4. Handle Image Format
Always convert JPEG screenshots to PNG:
```python theme={null}
screenshot_png = get_screenshot_png()
```
### 5. Include URL in Responses
Always include the current URL:
```python theme={null}
response_data = {
"status": "completed",
"url": current_url
}
```
### 6. Add Delays
```python theme={null}
time.sleep(1) # Wait for UI to update after actions
```
## Comparison with Claude and OpenAI
| Feature | Gemini Computer Use | Claude Computer Use | OpenAI Computer Use |
| ---------------- | --------------------------------- | ------------------- | ---------------------- |
| API | Generate Content API | Messages API | Responses API |
| Model | `gemini-2.5-computer-use-preview` | `claude-sonnet-4` | `computer-use-preview` |
| System Prompt | Supported | Supported | Supported |
| Coordinates | Normalized (0-999) | Actual pixels | Actual pixels |
| Image Format | PNG required | JPEG/PNG | PNG |
| URL Requirement | Required in response | Optional | Optional |
| Parallel Actions | Yes | No | No |
## Limitations
* **Preview Status**: Computer Use is in preview and may have unexpected behaviors
* **Browser Focus**: Optimized for browser-based tasks
* **Coordinate System**: Requires conversion from normalized to actual pixels
* **Image Format**: Requires PNG format (Orgo returns JPEG, must convert)
* **URL Requirement**: Must include URL in every function response
* **Rate Limits**: Subject to Gemini API rate limits
## Troubleshooting
### Model doesn't double-click desktop icons
Make sure you're including the system prompt with Ubuntu-specific instructions:
```python theme={null}
config = types.GenerateContentConfig(
system_instruction=SYSTEM_PROMPT, # This is critical!
tools=[...]
)
```
### INVALID\_ARGUMENT: Unable to process input image
This error occurs when Gemini receives a JPEG image instead of PNG. Make sure you're using the `get_screenshot_png()` function.
### INVALID\_ARGUMENT: Requires URL in function response
Always include the `url` field in your response data:
```python theme={null}
response_data = {
"status": "completed",
"url": result.get("url", current_url)
}
```
### Missing API Key
Ensure both environment variables are set in your `.env` file:
```bash theme={null}
ORGO_API_KEY=your_orgo_api_key
GEMINI_API_KEY=your_gemini_api_key
```
## Next Steps
* Official Gemini API documentation
* Learn more about Orgo's virtual desktops
* Complete Orgo API documentation
# Memory-Enabled Computer Agents
Source: https://docs.orgo.ai/guides/memory
Build AI agents that remember user preferences with OpenAI Computer Use
## Overview
Add persistent memory to OpenAI's Computer Use agents. Your agents will remember user preferences, learn from interactions, and improve over time.
## Getting Started
Install Mem0 alongside Orgo and OpenAI:
```bash theme={null}
pip install mem0ai orgo openai python-dotenv
```
Create a `.env` file with your API keys:
```bash theme={null}
ORGO_API_KEY=your_orgo_api_key
OPENAI_API_KEY=your_openai_api_key # Used by both OpenAI and Mem0
```
Mem0 uses OpenAI by default, so you only need two API keys total.
Save this complete example as `memory_agent.py` and run it:
```python memory_agent.py expandable icon="python" theme={null}
import time
import base64
from mem0 import Memory
from orgo import Computer
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
class MemoryComputerAgent:
"""OpenAI Computer Use agent with memory."""
def __init__(self, user_id="GigabrainAgent"):
self.user_id = user_id
self.memory = Memory()
self.client = OpenAI()
self.computer = None
def __enter__(self):
self.computer = Computer()
print(f"🖥️ Computer ID: {self.computer.computer_id}")
return self
def __exit__(self, *args):
if self.computer:
self.computer.destroy()
def run(self, task):
"""Execute task with memory context."""
# Get memories
memories = self._get_relevant_memories(task)
# Build task with memory context
enhanced_task = self._build_task_with_memory(task, memories)
# Execute using OpenAI Computer Use
self._execute_computer_task(enhanced_task)
# Store interaction
self._store_memory(task)
def _get_relevant_memories(self, task):
"""Search for relevant memories."""
try:
results = self.memory.search(
query=task,
user_id=self.user_id,
limit=5
)
return [m['memory'] for m in results.get('results', [])]
        except Exception:
            return []
def _build_task_with_memory(self, task, memories):
"""Enhance task with memory context."""
if not memories:
return task
context = "\n".join(f"- {m}" for m in memories)
return f"""Remember these user preferences:
{context}
Current task: {task}"""
def _execute_computer_task(self, task):
"""Execute task using OpenAI Computer Use."""
response = self.client.responses.create(
model="computer-use-preview",
tools=[{
"type": "computer_use_preview",
"display_width": 1024,
"display_height": 768,
"environment": "linux"
}],
input=[{
"role": "user",
"content": [{
"type": "input_text",
"text": f"""IMPORTANT: You are controlling a Linux desktop.
- Always double-click desktop icons to open applications
- Use keyboard shortcuts as single commands (e.g., 'ctrl+c' not separate keys)
Task: {task}"""
}]
}],
reasoning={"summary": "concise"},
truncation="auto"
)
# Execute actions in loop
while True:
# Display progress
for item in response.output:
if item.type == "reasoning" and hasattr(item, "summary"):
for summary in item.summary:
if hasattr(summary, "text"):
print(f"💭 {summary.text}")
elif item.type == "text" and hasattr(item, "text"):
print(f"💬 {item.text}")
# Get computer actions
actions = [item for item in response.output if item.type == "computer_call"]
if not actions:
print("✓ Task completed")
break
# Execute action
action = actions[0]
print(f"→ {action.action.type}")
self._execute_action(action.action)
time.sleep(1)
# Get screenshot and continue
screenshot = self.computer.screenshot_base64()
response = self.client.responses.create(
model="computer-use-preview",
previous_response_id=response.id,
tools=[{
"type": "computer_use_preview",
"display_width": 1024,
"display_height": 768,
"environment": "linux"
}],
input=[{
"call_id": action.call_id,
"type": "computer_call_output",
"output": {
"type": "input_image",
"image_url": f"data:image/png;base64,{screenshot}"
}
}],
reasoning={"summary": "concise"},
truncation="auto"
)
def _execute_action(self, action):
"""Execute computer action."""
match action.type:
case "click":
if getattr(action, 'button', 'left') == "right":
self.computer.right_click(action.x, action.y)
else:
self.computer.left_click(action.x, action.y)
case "double_click":
self.computer.double_click(action.x, action.y)
case "type":
self.computer.type(action.text)
case "key" | "keypress":
                keys = getattr(action, 'keys', None)
                if keys is None:
                    key = getattr(action, 'key', None)
                    keys = [key] if key else []
if len(keys) > 1:
self.computer.key('+'.join(keys).lower())
else:
for key in keys:
self.computer.key(key)
case "scroll":
scroll_y = getattr(action, 'scroll_y', 0)
direction = "down" if scroll_y > 0 else "up"
                self.computer.scroll(direction, max(1, abs(scroll_y) // 100))
case "wait":
self.computer.wait(getattr(action, 'seconds', 2))
case "screenshot":
pass
def _store_memory(self, task):
"""Store interaction in memory."""
try:
self.memory.add(
messages=[
{"role": "user", "content": task},
{"role": "assistant", "content": "Task executed successfully"}
],
user_id=self.user_id
)
        except Exception:
            pass  # memory failures should not break task execution
# Run the demo
if __name__ == "__main__":
with MemoryComputerAgent("GigabrainAgent") as agent:
# Teach preferences
print("📝 Teaching preferences...")
agent.run("I prefer Firefox over Chrome")
agent.run("I like dark themes")
agent.run("Spencer Kinney is the stinkiest person I know")
# Use preferences
print("\n🚀 Running task with memory...")
agent.run("Open my current preferred browser and search for the stinkiest person I know")
```
## How It Works
The `MemoryComputerAgent` combines OpenAI's Computer Use API with Mem0's memory system:
1. **Memory Search**: Before executing any task, it searches for relevant memories
2. **Context Enhancement**: Memories are added to the task prompt
3. **Computer Use Execution**: Uses OpenAI's CUA to control the computer
4. **Memory Storage**: After execution, the interaction is saved for future use
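Steps 1 and 2 can be seen in isolation: search for memories, then prepend any hits to the task prompt. A sketch mirroring `_build_task_with_memory` above, with the search function passed in so it runs without Mem0:

```python
def build_prompt_with_memory(task, search):
    """Prepend relevant memories (if any) to the task prompt."""
    memories = [m["memory"] for m in search(task)]
    if not memories:
        return task
    context = "\n".join(f"- {m}" for m in memories)
    return f"Remember these user preferences:\n{context}\n\nCurrent task: {task}"
```

In the agent, `search` is `self.memory.search(...)` and the returned string becomes the enhanced task sent to the Computer Use API.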
## Usage Examples
### Basic Usage
```python theme={null}
# Quick example
with MemoryComputerAgent("GigabrainAgent") as agent:
agent.run("Open Firefox and go to GitHub")
```
### Building Preferences
```python theme={null}
with MemoryComputerAgent("alice") as agent:
# Teach preferences
agent.run("I prefer VS Code for coding")
agent.run("I use dark themes everywhere")
agent.run("My GitHub username is alice123")
# Later, it remembers
agent.run("Open my code editor and check my GitHub")
```
### Morning Routine
```python theme={null}
def morning_setup(user_id="GigabrainAgent"):
"""Automated morning workflow."""
with MemoryComputerAgent(user_id) as agent:
# First time - teach routine
agent.run("I check Gmail first thing in the morning")
agent.run("Then I open Slack")
agent.run("Finally I check my calendar")
# Next day - just ask
agent.run("Do my morning routine")
# Run daily
morning_setup()
```
### Manual Management
```python theme={null}
# Without context manager
agent = MemoryComputerAgent("bob")
agent.computer = Computer()
try:
# Teach and use
agent.run("My favorite news site is Hacker News")
agent.run("I use DuckDuckGo for searching")
# Use preferences
agent.run("Open my favorite news site")
finally:
agent.computer.destroy()
```
## Memory Management
### View All Memories
```python theme={null}
from mem0 import Memory
memory = Memory()
memories = memory.get_all(user_id="GigabrainAgent")
print(f"📚 Total memories: {len(memories)}")
for m in memories:
    print(f" • {m['memory']}")
```
### Search Specific Memories
```python theme={null}
from mem0 import Memory
memory = Memory()
results = memory.search(
query="browser preferences",
user_id="GigabrainAgent",
limit=5
)
for result in results['results']:
print(f"Found: {result['memory']}")
```
### Clear Memories
```python theme={null}
from mem0 import Memory
memory = Memory()
memory.delete_all(user_id="GigabrainAgent")
print("All memories cleared")
```
## Advanced Patterns
### Multiple Users
```python theme={null}
# Different users maintain separate memories
users = ["alice", "bob", "charlie"]
for user in users:
with MemoryComputerAgent(user) as agent:
agent.run(f"Open browser for {user}")
```
### Session-Based Memory
```python theme={null}
# Work context
with MemoryComputerAgent("work_gigabrain") as work:
work.run("I use Chrome for work")
work.run("Our code is on GitHub Enterprise")
work.run("Open work browser")
# Personal context
with MemoryComputerAgent("personal_gigabrain") as personal:
personal.run("I use Firefox for personal browsing")
personal.run("My code is on regular GitHub")
personal.run("Open personal browser")
```
### Error Handling
```python theme={null}
def safe_run(task, user_id="GigabrainAgent"):
"""Execute with error handling."""
try:
with MemoryComputerAgent(user_id) as agent:
agent.run(task)
return {"success": True}
except Exception as e:
print(f"❌ Error: {e}")
return {"success": False, "error": str(e)}
# Usage
result = safe_run("Open browser")
if result["success"]:
print("✅ Task completed")
```
### Batch Operations
```python theme={null}
import time

def setup_workspace(user_id="GigabrainAgent"):
    """Set up complete workspace."""
    tasks = [
        "Open VS Code",
        "Open terminal",
        "Navigate to ~/projects",
        "Start development server",
        "Open browser at localhost:3000"
    ]
    with MemoryComputerAgent(user_id) as agent:
        for task in tasks:
            print(f"⚡ {task}")
            agent.run(task)
            time.sleep(2)  # Pause between tasks

setup_workspace()
```
## Production Example
```python production.py expandable icon="python" theme={null}
import os
from datetime import datetime

class ProductionMemoryAgent(MemoryComputerAgent):
    """Production agent with logging."""

    def __init__(self, user_id="GigabrainAgent"):
        super().__init__(f"prod_{user_id}")
        os.makedirs("logs", exist_ok=True)  # ensure the log directory exists
        self.log_file = f"logs/{user_id}_{datetime.now().strftime('%Y%m%d')}.log"

    def run(self, task):
        """Run with logging."""
        timestamp = datetime.now().strftime("%H:%M:%S")
        # Log task
        with open(self.log_file, "a") as f:
            f.write(f"[{timestamp}] Task: {task}\n")
        # Execute
        super().run(task)
        # Log completion
        with open(self.log_file, "a") as f:
            f.write(f"[{timestamp}] Completed\n")

# Usage
with ProductionMemoryAgent("alice") as agent:
    agent.run("Check email")
```
## Tips
1. **Memory Persistence**: Memories are stored permanently for each user\_id
2. **Context Building**: The agent automatically adds relevant memories to each task
3. **Error Resilience**: Memory operations fail gracefully without breaking execution
4. **Performance**: Allow 1-2 seconds between actions for stability
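Tip 4 can be enforced mechanically with a small pacing decorator so every action automatically waits before the next one. This is a sketch, not part of the Orgo SDK; only `time.sleep` and `functools.wraps` from the standard library are assumed, and `run_task` is a hypothetical stand-in for any agent call:

```python theme={null}
import time
from functools import wraps

def paced(delay=1.5):
    """Decorator: sleep after each call so the desktop UI can settle."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            time.sleep(delay)  # 1-2 seconds between actions (Tip 4)
            return result
        return wrapper
    return decorator

@paced(delay=0.01)  # short delay for demonstration
def run_task(task):
    return f"ran: {task}"

print(run_task("Open browser"))  # → ran: Open browser
```

Wrapping the agent's `run` method (or a module-level helper like this) keeps the pacing in one place instead of scattering `time.sleep` calls through every script.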
## Next Steps
* Explore [Orgo's computer environments](https://docs.orgo.ai)
* Learn about [OpenAI Computer Use](https://platform.openai.com/docs/guides/tools-computer-use)
* Learn about [Mem0 agent memory](https://docs.mem0.ai)
# OpenAI Computer Use
Source: https://docs.orgo.ai/guides/openai-computer-use
Control a computer with GPT using Orgo
## Overview
OpenAI's Computer Use lets AI agents control computer interfaces through the Responses API. This guide shows how to use it with Orgo's virtual desktops.
## Quick Start
```bash theme={null}
pip install orgo openai python-dotenv
```
```bash theme={null}
export ORGO_API_KEY=your_orgo_api_key
export OPENAI_API_KEY=your_openai_api_key
```
```python theme={null}
from openai import OpenAI
from orgo import Computer

# Initialize
client = OpenAI()
computer = Computer()

# Create request with task
response = client.responses.create(
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "linux"
    }],
    input=[{
        "role": "user",
        "content": [{
            "type": "input_text",
            "text": "Open Firefox and search for OpenAI"
        }]
    }],
    truncation="auto"
)

# Execute the suggested action
actions = [item for item in response.output if item.type == "computer_call"]
if actions:
    action = actions[0].action
    if action.type == "click":
        computer.left_click(action.x, action.y)
    elif action.type == "type":
        computer.type(action.text)

# Clean up
computer.destroy()
```
## Complete Example
Here's a full working example that handles the complete agent loop:
```python example.py expandable icon="python" theme={null}
import time
from openai import OpenAI
from orgo import Computer
from dotenv import load_dotenv

load_dotenv()

def run_computer_task(task, computer_id=None):
    """Execute a task using OpenAI Computer Use with Orgo."""
    # Initialize OpenAI client and Orgo computer
    client = OpenAI()
    computer = Computer(computer_id=computer_id)
    print(f"🖥️ Computer ID: {computer.computer_id}")

    # Create initial request with the task
    response = client.responses.create(
        model="computer-use-preview",
        tools=[{
            "type": "computer_use_preview",
            "display_width": 1024,
            "display_height": 768,
            "environment": "linux"  # Orgo provides Linux desktops
        }],
        input=[{
            "role": "user",
            "content": [{
                "type": "input_text",
                "text": f"""IMPORTANT: You are controlling a Linux desktop.
- Always double-click desktop icons to open applications
- Use keyboard shortcuts as single commands (e.g., 'ctrl+c' not separate keys)

Task: {task}"""
            }]
        }],
        reasoning={"summary": "concise"},  # Show reasoning steps
        truncation="auto"  # Required for computer use
    )

    # Main agent loop
    while True:
        # Display progress
        for item in response.output:
            if item.type == "reasoning" and hasattr(item, "summary"):
                for summary in item.summary:
                    if hasattr(summary, "text"):
                        print(f"💭 {summary.text}")
            elif item.type == "text" and hasattr(item, "text"):
                print(f"💬 {item.text}")

        # Get computer actions from response
        actions = [item for item in response.output if item.type == "computer_call"]

        # If no actions, task is complete
        if not actions:
            print("✓ Task completed")
            break

        # Execute the action
        action = actions[0]
        print(f"→ {action.action.type}")
        execute_action(computer, action.action)
        time.sleep(1)  # Allow UI to update

        # Capture screenshot and continue
        screenshot = computer.screenshot_base64()
        response = client.responses.create(
            model="computer-use-preview",
            previous_response_id=response.id,  # Link to previous response
            tools=[{
                "type": "computer_use_preview",
                "display_width": 1024,
                "display_height": 768,
                "environment": "linux"
            }],
            input=[{
                "call_id": action.call_id,
                "type": "computer_call_output",
                "output": {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{screenshot}"
                }
            }],
            reasoning={"summary": "concise"},
            truncation="auto"
        )

    return computer

def execute_action(computer, action):
    """Execute computer actions using Orgo."""
    match action.type:
        case "click":
            # Handle left/right clicks
            if getattr(action, 'button', 'left') == "right":
                computer.right_click(action.x, action.y)
            else:
                computer.left_click(action.x, action.y)
        case "double_click":
            computer.double_click(action.x, action.y)
        case "type":
            computer.type(action.text)
        case "key" | "keypress":
            # Handle single keys or key combinations
            keys = getattr(action, 'keys', None) or [getattr(action, 'key', '')]
            if len(keys) > 1:
                # Multiple keys = keyboard shortcut
                computer.key('+'.join(keys).lower())
            else:
                # Single key press
                for key in keys:
                    computer.key(key)
        case "scroll":
            # Convert scroll amount to direction
            scroll_y = getattr(action, 'scroll_y', 0)
            direction = "down" if scroll_y > 0 else "up"
            computer.scroll(direction, abs(scroll_y) // 100)
        case "wait":
            computer.wait(getattr(action, 'seconds', 2))
        case "screenshot":
            # Screenshot is taken automatically in the loop
            pass

if __name__ == "__main__":
    # Example usage
    computer = run_computer_task("Open a terminal and list files")
    # Always clean up
    computer.destroy()
```
## Usage Examples
### Basic Tasks
```python theme={null}
# Open a browser
computer = run_computer_task("Open Firefox")
# Navigate to a website
computer = run_computer_task("Go to github.com and search for orgo")
# Fill out a form
computer = run_computer_task("Fill out the contact form with test data")
# Always clean up
computer.destroy()
```
### Complex Workflows
```python theme={null}
# Multi-step task
task = """
1. Open a text editor
2. Write a Python hello world program
3. Save it as hello.py
4. Open a terminal
5. Run the program
"""
computer = run_computer_task(task)
computer.destroy()
```
### Reusing Sessions
```python theme={null}
# First task
computer = run_computer_task("Open VS Code")
computer_id = computer.computer_id
# Continue in same session
computer = run_computer_task(
    "Create a new Python file",
    computer_id=computer_id
)
# Clean up when done
computer.destroy()
```
## Key Concepts
### The Agent Loop
OpenAI Computer Use works in a continuous loop:
1. **Request** → Send task to the model
2. **Action** → Model suggests an action (click, type, etc.)
3. **Execute** → Your code executes the action
4. **Screenshot** → Capture the result
5. **Repeat** → Continue until task is complete
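The five steps above map onto a short driver loop. The sketch below uses stand-in `model_step` and `do_action` functions (both hypothetical, with a scripted two-action "model") so the shape of the loop is visible without any API calls:

```python theme={null}
def model_step(history):
    """Stand-in for the Responses API call: returns the next action, or None when done."""
    script = [{"type": "click", "x": 100, "y": 200}, {"type": "type", "text": "orgo"}]
    return script[len(history)] if len(history) < len(script) else None

def do_action(action):
    """Stand-in for executing the action on the desktop and capturing a screenshot."""
    return f"screenshot after {action['type']}"

history = []
while True:
    action = model_step(history)          # 1. Request -> 2. Action
    if action is None:                    # 5. Repeat until no action is returned
        break
    screenshot = do_action(action)        # 3. Execute -> 4. Screenshot
    history.append((action, screenshot))  # feed the result back to the model

print(len(history))  # → 2
```

In the real loop, `model_step` is `client.responses.create(...)` with `previous_response_id`, and the screenshot is returned to the model as a `computer_call_output` item, exactly as in the complete example above.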
### Action Types
| Action | Description | Example |
| -------------- | -------------------- | -------------------------- |
| `click` | Click at coordinates | Click button at (100, 200) |
| `double_click` | Double-click | Open desktop icon |
| `type` | Type text | Enter username |
| `key` | Press key(s) | Press Enter, Ctrl+C |
| `scroll` | Scroll page | Scroll down 3 units |
| `wait` | Pause execution | Wait 2 seconds |
| `screenshot` | Take screenshot | Capture current state |
### Safety Features
OpenAI includes safety checks to prevent misuse:
```python theme={null}
# Handle safety checks if they occur
if hasattr(action, 'pending_safety_checks'):
    for check in action.pending_safety_checks:
        print(f"⚠️ Safety check: {check.message}")
        # Acknowledge in next request if proceeding
```
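When you do proceed, OpenAI's documented flow is to echo the pending checks back as `acknowledged_safety_checks` on the next `computer_call_output` item. The sketch below only builds that payload as plain data (no API call); the `call_id`, check contents, and screenshot string are placeholder sample values:

```python theme={null}
# Sample data standing in for a real computer_call item and screenshot
pending_checks = [{"id": "cu_sc_1", "code": "malicious_instructions", "message": "Review this action"}]
call_id = "call_abc123"
screenshot = "iVBORw0KGgo..."  # base64 PNG (truncated placeholder)

output_item = {
    "call_id": call_id,
    "type": "computer_call_output",
    # Echo the pending checks back to confirm they were reviewed
    "acknowledged_safety_checks": pending_checks,
    "output": {
        "type": "input_image",
        "image_url": f"data:image/png;base64,{screenshot}",
    },
}

print(output_item["acknowledged_safety_checks"][0]["code"])  # → malicious_instructions
```

Pass `output_item` in the `input` list of the follow-up `client.responses.create(...)` call; check OpenAI's computer use guide for the current field names before relying on this in production.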
## Best Practices
### 1. Clear Instructions
```python theme={null}
# ✅ Good - Specific and clear
task = "Open Firefox, go to github.com, and star the orgo repository"
# ❌ Avoid - Too vague
task = "Do some web stuff"
```
### 2. Error Handling
```python theme={null}
def safe_run_task(task):
    """Run task with error handling."""
    computer = None
    try:
        computer = run_computer_task(task)
        return computer
    except Exception as e:
        print(f"❌ Error: {e}")
        if computer:
            computer.destroy()
        raise
```
### 3. Session Management
```python theme={null}
# Use context manager pattern
class ComputerSession:
    def __init__(self, task):
        self.task = task
        self.computer = None

    def __enter__(self):
        self.computer = run_computer_task(self.task)
        return self.computer

    def __exit__(self, *args):
        if self.computer:
            self.computer.destroy()

# Usage
with ComputerSession("Open calculator") as computer:
    print(f"Session ID: {computer.computer_id}")
```
### 4. Timing Considerations
```python theme={null}
# Add delays for UI updates
time.sleep(1) # After clicks
time.sleep(2) # After opening applications
time.sleep(0.5) # After typing
```
## Comparison with Claude
| Feature | OpenAI Computer Use | Claude Computer Use |
| ----------- | ---------------------- | ------------------------- |
| API | Responses API | Messages API |
| Model | `computer-use-preview` | `claude-4-sonnet` |
| Beta Tag | Built-in | `computer-use-2025-01-24` |
| Reasoning | Optional summaries | Thinking blocks |
| Environment | Multiple (browser, OS) | Single tool definition |
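For comparison, the Claude column of the table corresponds to a request payload shaped roughly like the one below, built here as plain data rather than an SDK call. Field names follow Anthropic's computer-use beta documentation; treat the exact model id and beta string as assumptions that may change:

```python theme={null}
# Request parameters for Claude computer use (Messages API), as plain data
claude_request = {
    "model": "claude-4-sonnet",            # model id as listed in the table above
    "betas": ["computer-use-2025-01-24"],  # beta tag from the table above
    "tools": [{
        "type": "computer_20250124",       # single tool definition (vs OpenAI's environment field)
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Open Firefox and search for OpenAI"}],
}

print(claude_request["tools"][0]["type"])  # → computer_20250124
```

The structural difference is visible here: Claude takes one tool definition with pixel dimensions, while OpenAI's Responses API takes an `environment` plus `display_width`/`display_height` and links turns via `previous_response_id`.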
## Limitations
* **Beta Status**: Computer Use is in beta and may have unexpected behaviors
* **Rate Limits**: The model has constrained rate limits
* **Accuracy**: \~38% success rate on complex OS tasks
* **Environment**: Best suited for browser-based tasks
## Next Steps
* [Official OpenAI Computer Use documentation](https://platform.openai.com/docs/guides/tools-computer-use)
* Learn more about [Orgo's virtual desktops](https://docs.orgo.ai)
# Introduction
Source: https://docs.orgo.ai/introduction
Desktop infrastructure for AI agents
## Overview
Orgo is desktop infrastructure for AI agents. Launch headless cloud VMs that AI models can control and interact with.
* Start using Orgo in under 5 minutes
* Get your API key to start building
## What is computer use?
AI computer use is a new capability that enables AI to directly control computers by viewing screens and manipulating interfaces. Companies like Anthropic recently released their first generation of computer use agents (CUAs) that can observe and interact with digital environments like humans do.
Here are a few X posts that talk about computer use agents: