# AI Completion
Source: https://docs.orgo.ai/api-reference/ai/completion
POST /ai
Access 400+ AI models through OpenRouter integration. Requires OpenRouter to be connected in account settings.
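A minimal Python sketch of calling this endpoint. The request body here is an assumption (an OpenAI-style shape with `model` and `messages`, and a placeholder model identifier) since the exact schema isn't shown on this page; check the endpoint's parameter list for the authoritative field names.

```python
import os
import requests

# Assumed OpenAI-style body; "anthropic/claude-sonnet-4" is a placeholder model ID
payload = {
    "model": "anthropic/claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}],
}
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post("https://www.orgo.ai/api/ai", headers=headers, json=payload)
    print(resp.json())
```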
# List Available Models
Source: https://docs.orgo.ai/api-reference/ai/list-models
GET /ai
Get list of all available AI models from OpenRouter
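For example, fetching the model list with Python:

```python
import os
import requests

headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.get("https://www.orgo.ai/api/ai", headers=headers)
    print(resp.json())
```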
# Authentication
Source: https://docs.orgo.ai/api-reference/authentication
API key setup
## Overview
All API requests require a Bearer token in the `Authorization` header.
## Get Your API Key
1. Go to [orgo.ai/projects](https://www.orgo.ai/projects)
2. Click "Generate API Key"
3. Copy your key (format: `sk_live_...`)
**Important**: Store your API key securely. Don't commit it to version control.
## Usage
Include the header in every request:
```bash theme={null}
Authorization: Bearer sk_live_your_api_key_here
```
## Examples
### cURL
```bash theme={null}
curl https://www.orgo.ai/api/projects \
-H "Authorization: Bearer sk_live_abc123..."
```
### Python
```python theme={null}
import requests
headers = {
    "Authorization": "Bearer sk_live_abc123...",
    "Content-Type": "application/json"
}
response = requests.get(
    "https://www.orgo.ai/api/projects",
    headers=headers
)
```
### JavaScript
```javascript theme={null}
fetch('https://www.orgo.ai/api/projects', {
  headers: {
    'Authorization': 'Bearer sk_live_abc123...',
    'Content-Type': 'application/json'
  }
})
```
## Environment Variables
Store your key as an environment variable:
```bash theme={null}
export ORGO_API_KEY=sk_live_abc123...
```
Then reference it in your code:
```python theme={null}
import os
api_key = os.environ.get("ORGO_API_KEY")
```
## Error Responses
**Invalid key:**
```json theme={null}
{
"error": "Invalid API key"
}
```
**Missing key:**
```json theme={null}
{
"error": "Authentication failed"
}
```
Both return `401 Unauthorized`.
## Security
* Keep your API key private
* Rotate keys if compromised
* Use environment variables, not hardcoded values
* Don't share keys in public repositories
## Need Help?
Contact [support](mailto:spencer@orgo.ai) if you lose access to your API key.
# Execute Bash Command
Source: https://docs.orgo.ai/api-reference/computers/bash
POST /computers/{id}/bash
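A hedged Python sketch of this endpoint. The body field name `command` is an assumption, not confirmed by this page:

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
# "command" is an assumed field name -- verify against the endpoint's parameters
payload = {"command": "ls -la"}
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/bash",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```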
# Click Mouse
Source: https://docs.orgo.ai/api-reference/computers/click
POST /computers/{id}/click
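For example, clicking at screen coordinates from Python (the `x`/`y` body fields match the quick start in the Introduction):

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
payload = {"x": 100, "y": 200}  # screen coordinates to click
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/click",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```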
# Create Computer
Source: https://docs.orgo.ai/api-reference/computers/create
POST /projects/{project_name}/computers
Create a new computer within a project. The computer name must be unique within the project.
## Example
```json theme={null}
{
"name": "dev-machine",
"os": "linux",
"ram": 4,
"cpu": 2
}
```
# Delete Computer
Source: https://docs.orgo.ai/api-reference/computers/delete
DELETE /computers/{id}
Permanently delete a computer. This action cannot be undone.
## Behavior
* Computer will be stopped if currently running
* All data on the computer will be lost
* Returns 200 status code on successful deletion
## Example
```bash theme={null}
curl -X DELETE https://www.orgo.ai/api/computers/550e8400-e29b-41d4-a716-446655440000 \
-H "Authorization: Bearer sk_live_..."
```
# Mouse Drag
Source: https://docs.orgo.ai/api-reference/computers/drag
POST /computers/{id}/drag
# Execute Python Code
Source: https://docs.orgo.ai/api-reference/computers/exec
POST /computers/{id}/exec
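A hedged Python sketch. The body field name `code` is assumed here (the SDK's `computer.exec(code)` shown later in these docs suggests it, but this page doesn't confirm the REST schema):

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
# "code" is an assumed field name -- verify against the endpoint's parameters
payload = {"code": "print('hello from the VM')"}
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/exec",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```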
# Get Computer
Source: https://docs.orgo.ai/api-reference/computers/get
GET /computers/{id}
Retrieve details about a specific computer by its ID.
## Example Response
```json theme={null}
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "dev-machine",
"project_name": "my-project",
"os": "linux",
"ram": 4,
"cpu": 2,
"status": "running",
"url": "https://dev-machine.example.com",
"created_at": "2024-01-15T10:30:00Z"
}
```
# Press Key
Source: https://docs.orgo.ai/api-reference/computers/key
POST /computers/{id}/key
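A hedged Python sketch. The body field name `key` is an assumption; key names such as `Enter`, `Tab`, and `ctrl+c` are the ones listed in the Introduction:

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
# "key" is an assumed field name -- verify against the endpoint's parameters
payload = {"key": "Enter"}
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/key",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```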
# List Computers
Source: https://docs.orgo.ai/api-reference/computers/list
GET /projects/{project_name}/computers
Get all computers within a project.
## Example Response
```json theme={null}
{
"computers": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "dev-machine",
"project_name": "my-project",
"os": "linux",
"ram": 4,
"cpu": 2,
"status": "running",
"url": "https://dev-machine.example.com",
"created_at": "2024-01-15T10:30:00Z"
}
]
}
```
# Restart Computer
Source: https://docs.orgo.ai/api-reference/computers/restart
POST /computers/{id}/restart
Restart a computer. Performs a graceful shutdown followed by a fresh start.
## Use Cases
* Recovering from a hung or unresponsive state
* Applying system updates that require a reboot
* Resetting the environment to a clean state
## Example
```bash theme={null}
curl -X POST https://www.orgo.ai/api/computers/550e8400-e29b-41d4-a716-446655440000/restart \
-H "Authorization: Bearer sk_live_..."
```
# Take Screenshot
Source: https://docs.orgo.ai/api-reference/computers/screenshot
GET /computers/{id}/screenshot
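A Python sketch of fetching a screenshot. The response encoding (raw image bytes vs. base64 JSON) isn't specified on this page, so this example only inspects the response rather than assuming a format:

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.get(
        f"https://www.orgo.ai/api/computers/{computer_id}/screenshot",
        headers=headers,
    )
    # Inspect status and content type before deciding how to decode the body
    print(resp.status_code, resp.headers.get("Content-Type"))
```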
# Scroll Page
Source: https://docs.orgo.ai/api-reference/computers/scroll
POST /computers/{id}/scroll
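A hedged Python sketch mirroring the SDK's `computer.scroll(direction, amount)` used elsewhere in these docs; the REST body field names are assumed:

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
# "direction"/"amount" are assumed field names -- verify against the endpoint's parameters
payload = {"direction": "down", "amount": 1}
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/scroll",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```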
# Start Computer
Source: https://docs.orgo.ai/api-reference/computers/start
POST /computers/{id}/start
Start a stopped computer.
## Behavior
* Idempotent operation - succeeds if computer is already running
* Computer becomes accessible within moments
## Example
```bash theme={null}
curl -X POST https://www.orgo.ai/api/computers/550e8400-e29b-41d4-a716-446655440000/start \
-H "Authorization: Bearer sk_live_..."
```
# Stop Computer
Source: https://docs.orgo.ai/api-reference/computers/stop
POST /computers/{id}/stop
Stop a running computer to save costs when not in use.
## Behavior
* Computer is gracefully shut down
* Idempotent operation - succeeds if already stopped
* Stopped computers do not incur compute charges
## Example
```bash theme={null}
curl -X POST https://www.orgo.ai/api/computers/550e8400-e29b-41d4-a716-446655440000/stop \
-H "Authorization: Bearer sk_live_..."
```
# Start Stream
Source: https://docs.orgo.ai/api-reference/computers/stream-start
POST /computers/{id}/stream/start
## Description
Start streaming the computer's display to an RTMP server. This allows you to spectate your agent's computer in real-time through platforms like Twitch, YouTube Live, or custom RTMP servers.
## Prerequisites
Before using this endpoint, you must:
1. Configure an RTMP connection in your [account settings](https://www.orgo.ai/settings)
2. Provide the connection name when starting the stream
## Usage Example
```python theme={null}
# Start streaming to a configured connection
result = computer.start_stream("my-twitch-1")
# The computer's display is now being streamed
# Do your automation/demo
computer.type("Hello viewers!")
computer.bash("ls -la")
# Stop streaming when done
computer.stop_stream()
```
## Connection Configuration
RTMP connections are configured in your account settings with:
* A unique name (used in this API call)
* RTMP server URL
* Stream key (encrypted and stored securely)
* Optional settings (bitrate, resolution, etc.)
## Response
The response includes information about the streaming process:
```json theme={null}
{
"success": true,
"status": "streaming",
"pid": 12345,
"start_time": "2024-01-20T10:30:00Z"
}
```
## Common Use Cases
* Live demonstrations of AI agents
* Recording automation workflows
* Debugging and monitoring agent behavior
* Creating content for tutorials or showcases
# Get Stream Status
Source: https://docs.orgo.ai/api-reference/computers/stream-status
GET /computers/{id}/stream/status
## Description
Check the current streaming status of a computer. Use this endpoint to verify whether a stream is active, when it started, and which process is handling it.
## Usage Example
```python theme={null}
# Check if streaming is active
status = computer.stream_status()

if status['status'] == 'streaming':
    print(f"Stream active since: {status['start_time']}")
    print(f"Process ID: {status['pid']}")
elif status['status'] == 'idle':
    print("No active stream")
```
## Response Format
### When Streaming
```json theme={null}
{
"status": "streaming",
"start_time": "2024-01-20T10:30:00Z",
"pid": 12345
}
```
### When Idle
```json theme={null}
{
"status": "idle"
}
```
### When Terminated
```json theme={null}
{
"status": "terminated",
"message": "Stream process was terminated unexpectedly"
}
```
## Status Values
* `idle` - No active stream
* `streaming` - Stream is currently active
* `terminated` - Stream process ended unexpectedly
## Common Use Cases
* Monitoring stream health
* Verifying stream started successfully
* Detecting unexpected stream termination
* Building stream status dashboards
# Stop Stream
Source: https://docs.orgo.ai/api-reference/computers/stream-stop
POST /computers/{id}/stream/stop
## Description
Stop an active stream on the computer. This gracefully terminates the streaming process and releases resources.
## Usage Example
```python theme={null}
# Stop the active stream
result = computer.stop_stream()

if result['success']:
    print("Stream stopped successfully")
```
## Response Format
```json theme={null}
{
"success": true,
"message": "Stream stopped successfully"
}
```
## Error Handling
If no stream is active, the endpoint will return an appropriate message:
```json theme={null}
{
"success": false,
"error": "No active stream to stop"
}
```
## Best Practices
1. Always stop streams when done to free resources
2. Check stream status before stopping if unsure
3. Handle cases where stream might have already terminated
## Example Workflow
```python theme={null}
# Complete streaming workflow
try:
    # Start streaming
    computer.start_stream("my-connection")

    # Perform your automation
    computer.type("Running automated demo...")
    computer.bash("python my_script.py")

    # Always stop the stream
    computer.stop_stream()
except Exception:
    # Ensure stream is stopped even on error
    computer.stop_stream()
    raise
```
# Type Text
Source: https://docs.orgo.ai/api-reference/computers/type
POST /computers/{id}/type
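For example, typing text from Python (the `text` body field matches the quick start in the Introduction):

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
payload = {"text": "Hello world"}  # text to type into the computer
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/type",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```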
# Wait Duration
Source: https://docs.orgo.ai/api-reference/computers/wait
POST /computers/{id}/wait
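A hedged Python sketch mirroring the SDK's `computer.wait(duration)` (seconds) used elsewhere in these docs; the REST body field name is assumed:

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
# "duration" is an assumed field name -- verify against the endpoint's parameters
payload = {"duration": 1}  # seconds to wait
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only send the request when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.post(
        f"https://www.orgo.ai/api/computers/{computer_id}/wait",
        headers=headers,
        json=payload,
    )
    print(resp.json())
```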
# Delete File
Source: https://docs.orgo.ai/api-reference/files/delete
DELETE /files/{id}
Delete a file from storage. This removes the file from both cloud storage and the database.
# Download File
Source: https://docs.orgo.ai/api-reference/files/download
GET /files/{id}/download
Get a signed download URL for a file. The URL expires after 1 hour.
## Example
```bash theme={null}
curl https://www.orgo.ai/api/files/{id}/download \
-H "Authorization: Bearer sk_live_..."
```
### Response
```json theme={null}
{
"url": "https://signed-url-here..."
}
```
Then open the URL in a browser or use it to download the file.
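The same two-step flow in Python: fetch the signed URL from the endpoint, then download the file from it (the signed URL itself needs no `Authorization` header):

```python
import os
import requests

file_id = "550e8400-e29b-41d4-a716-446655440000"  # example file ID
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only run when an API key is configured
if os.environ.get("ORGO_API_KEY"):
    resp = requests.get(
        f"https://www.orgo.ai/api/files/{file_id}/download",
        headers=headers,
    )
    signed_url = resp.json()["url"]
    # The signed URL is pre-authorized and expires after 1 hour
    content = requests.get(signed_url).content
    with open("downloaded_file", "wb") as f:
        f.write(content)
```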
# Export File
Source: https://docs.orgo.ai/api-reference/files/export
POST /files/export
Export a file from a computer's filesystem. This lets you pull files created inside the VM (such as results, screenshots, or generated content) and returns a download URL.
The computer must be in a running state to export files.
## Path Formats
The path parameter accepts several formats:
| Format | Example |
| ---------------- | -------------------------------- |
| Relative to home | `Desktop/results.txt` |
| Absolute path | `/home/user/Desktop/results.txt` |
| With tilde | `~/Desktop/results.txt` |
## Security
Files can only be exported from within `/home/user`. Attempting to access paths outside this directory will return a 403 error.
## Example
```bash theme={null}
curl -X POST https://www.orgo.ai/api/files/export \
-H "Authorization: Bearer sk_live_..." \
-H "Content-Type: application/json" \
-d '{"desktopId": "8823f0ff-f4bc-4ab2-833e-40d82c10b505", "path": "Desktop/results.txt"}'
```
### Response
```json theme={null}
{
"success": true,
"file": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"filename": "results.txt",
"size_bytes": 1024,
"content_type": "text/plain",
"created_at": "2024-01-15T10:30:00Z",
"desktop_id": "8823f0ff-f4bc-4ab2-833e-40d82c10b505",
"project_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
},
"url": "https://fly.storage.tigris.dev/bucket/..."
}
```
# List Files
Source: https://docs.orgo.ai/api-reference/files/list
GET /computers/{id}/files
List all files associated with a computer, including both uploaded files and exported files.
# Upload File
Source: https://docs.orgo.ai/api-reference/files/upload
POST /computers/{id}/files/upload
Upload a file to a computer's Desktop folder. The file automatically syncs to all running computers in the project.
## Supported Files
* Maximum file size: 10MB
* All file types supported
## Example
```bash theme={null}
curl -X POST https://www.orgo.ai/api/computers/{id}/files/upload \
-H "Authorization: Bearer sk_live_..." \
-F "file=@./document.pdf"
```
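The same upload in Python, using the multipart `file` field from the cURL example above (`document.pdf` is a placeholder filename):

```python
import os
import requests

computer_id = "550e8400-e29b-41d4-a716-446655440000"  # example computer ID
headers = {"Authorization": f"Bearer {os.environ.get('ORGO_API_KEY', '')}"}

# Only run when an API key is configured and the file exists
if os.environ.get("ORGO_API_KEY") and os.path.exists("document.pdf"):
    with open("document.pdf", "rb") as f:
        # requests sets the multipart Content-Type header automatically
        resp = requests.post(
            f"https://www.orgo.ai/api/computers/{computer_id}/files/upload",
            headers=headers,
            files={"file": f},
        )
    print(resp.json())
```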
# Introduction
Source: https://docs.orgo.ai/api-reference/introduction
Build with virtual computers programmatically
## Overview
The Orgo API lets you create projects, provision virtual computers, and control them programmatically. Build AI agent fleets, automation workflows, or browser testing at scale.
## Authentication
All requests require a Bearer token:
```bash theme={null}
Authorization: Bearer your_api_key
```
Get your API key at [orgo.ai/projects](https://www.orgo.ai/projects).
## Base URL
```
https://www.orgo.ai/api
```
## Quick Start
### 1. Create a Project
Projects are containers for computers.
```bash theme={null}
curl -X POST https://www.orgo.ai/api/projects \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{"name": "manus"}'
```
### 2. Create a Computer
Add a computer to your project:
```bash theme={null}
curl -X POST https://www.orgo.ai/api/projects/manus/computers \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "agent-1",
"os": "linux",
"ram": 2,
"cpu": 2
}'
```
### 3. Control the Computer
```bash theme={null}
# Screenshot
curl https://www.orgo.ai/api/projects/manus/computers/agent-1/screenshot \
-H "Authorization: Bearer your_api_key"
# Click
curl -X POST https://www.orgo.ai/api/projects/manus/computers/agent-1/click \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{"x": 100, "y": 200}'
# Type
curl -X POST https://www.orgo.ai/api/projects/manus/computers/agent-1/type \
-H "Authorization: Bearer your_api_key" \
-H "Content-Type: application/json" \
-d '{"text": "Hello world"}'
```
## Resource Hierarchy
```
User
└── Projects (e.g., "manus")
└── Computers (e.g., "agent-1", "agent-2")
```
Projects organize your computers. Free tier: 2 concurrent computers.
## Computer Specs
* **OS**: Linux or Windows
* **RAM**: 2GB, 4GB, or 8GB
* **CPU**: 2, 4, or 8 cores
* **GPU**: None, T4, or A10 (coming soon)
## Control Operations
### Mouse
* Click (left, right, double)
* Drag
* Scroll
### Keyboard
* Type text
* Press keys (Enter, Tab, ctrl+c, etc.)
### Execution
* Bash commands
* Python code
### Other
* Screenshots
* Wait/delays
* Streaming (RTMP)
## Error Responses
```json theme={null}
{
"error": "Error message"
}
```
**Status codes:**
* `200` - Success
* `400` - Invalid request
* `401` - Invalid API key
* `404` - Resource not found
* `500` - Server error
# Create Project
Source: https://docs.orgo.ai/api-reference/projects/create
POST /projects
Create a new named project
# Delete Project
Source: https://docs.orgo.ai/api-reference/projects/delete
POST /projects/{id}/delete
Delete project and all its computers
# Get Project by Name
Source: https://docs.orgo.ai/api-reference/projects/get-by-name
GET /projects/by-name/{name}
# List Projects
Source: https://docs.orgo.ai/api-reference/projects/list
GET /projects
List all projects for authenticated user
# Restart Project
Source: https://docs.orgo.ai/api-reference/projects/restart
POST /projects/{id}/restart
# Start Project
Source: https://docs.orgo.ai/api-reference/projects/start
POST /projects/{id}/start
# Stop Project
Source: https://docs.orgo.ai/api-reference/projects/stop
POST /projects/{id}/stop
# Agent S2
Source: https://docs.orgo.ai/guides/agent-s2
Let Agent S2 control a virtual desktop
## Overview
This guide walks through setting up Agent S2, the open-source state-of-the-art computer use agent by Simular AI. You can run it locally on your own computer or on a virtual desktop through Orgo.
## Setup
Install the required packages:
```bash pip theme={null}
pip install gui-agents pyautogui python-dotenv orgo
```
```bash requirements.txt theme={null}
gui-agents
pyautogui
python-dotenv
orgo
pillow
```
Set up your API keys:
```bash terminal icon="terminal" theme={null}
# Export as environment variables
export OPENAI_API_KEY=your_openai_api_key
export ANTHROPIC_API_KEY=your_anthropic_api_key
export ORGO_API_KEY=your_orgo_api_key # Optional for remote
```
```python setup.py icon="python" theme={null}
import os
os.environ["OPENAI_API_KEY"] = "your_openai_api_key"
os.environ["ANTHROPIC_API_KEY"] = "your_anthropic_api_key"
os.environ["ORGO_API_KEY"] = "your_orgo_api_key" # Optional
```
```bash .env icon="file" theme={null}
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
# Optional for remote execution
ORGO_API_KEY=your_orgo_api_key
USE_CLOUD_ENVIRONMENT=false
```
## Simple Usage
Run Agent S2 with natural language commands:
```bash local icon="terminal" theme={null}
# Local mode - controls your computer
python agent_s2.py "Open Chrome and search for weather"
```
```bash remote icon="terminal" theme={null}
# Remote mode - controls cloud desktop via Orgo
USE_CLOUD_ENVIRONMENT=true python agent_s2.py "Open Chrome"
```
```bash interactive icon="terminal" theme={null}
# Interactive mode
python agent_s2.py
```
This approach uses Agent S2's compositional framework to execute complex computer use tasks.
## Complete Example
```python agent_s2.py expandable icon="python" theme={null}
#!/usr/bin/env python3
import os
import io
import sys
import time
from dotenv import load_dotenv
from gui_agents.s2.agents.agent_s import AgentS2
from gui_agents.s2.agents.grounding import OSWorldACI
from orgo import Computer
import pyautogui

load_dotenv()

CONFIG = {
    "model": os.getenv("AGENT_MODEL", "gpt-4o"),
    "model_type": os.getenv("AGENT_MODEL_TYPE", "openai"),
    "grounding_model": os.getenv("GROUNDING_MODEL", "claude-3-7-sonnet-20250219"),
    "grounding_type": os.getenv("GROUNDING_MODEL_TYPE", "anthropic"),
    "max_steps": int(os.getenv("MAX_STEPS", "10")),
    "step_delay": float(os.getenv("STEP_DELAY", "0.5")),
    "remote": os.getenv("USE_CLOUD_ENVIRONMENT", "false").lower() == "true"
}

class LocalExecutor:
    def __init__(self):
        self.pyautogui = pyautogui
        if sys.platform == "win32":
            self.platform = "windows"
        elif sys.platform == "darwin":
            self.platform = "darwin"
        else:
            self.platform = "linux"

    def screenshot(self):
        img = self.pyautogui.screenshot()
        buffer = io.BytesIO()
        img.save(buffer, format="PNG")
        buffer.seek(0)
        return buffer.getvalue()

    def exec(self, code):
        exec(code, {"pyautogui": self.pyautogui, "time": time})

    def destroy(self):
        # No cleanup needed for local executor
        pass

class RemoteExecutor:
    def __init__(self):
        self.computer = Computer()
        self.platform = "linux"

    def screenshot(self):
        return self.computer.screenshot_base64()

    def exec(self, code):
        result = self.computer.exec(code)
        if not result['success']:
            raise Exception(result.get('error', 'Execution failed'))
        if result['output']:
            print(f"Output: {result['output']}")

    def destroy(self):
        self.computer.destroy()

def create_agent(executor):
    engine_params = {"engine_type": CONFIG["model_type"], "model": CONFIG["model"]}
    grounding_params = {"engine_type": CONFIG["grounding_type"], "model": CONFIG["grounding_model"]}
    grounding_agent = OSWorldACI(
        platform=executor.platform,
        engine_params_for_generation=engine_params,
        engine_params_for_grounding=grounding_params
    )
    return AgentS2(
        engine_params=engine_params,
        grounding_agent=grounding_agent,
        platform=executor.platform,
        action_space="pyautogui",
        observation_type="screenshot"
    )

def run_task(agent, executor, instruction):
    print(f"\n🤖 Task: {instruction}")
    print(f"📍 Mode: {'Remote' if CONFIG['remote'] else 'Local'}\n")
    for step in range(CONFIG["max_steps"]):
        print(f"Step {step + 1}/{CONFIG['max_steps']}")
        obs = {"screenshot": executor.screenshot()}
        info, action = agent.predict(instruction=instruction, observation=obs)
        if info:
            print(f"💭 {info}")
        if not action or not action[0]:
            print("✅ Complete")
            return True
        try:
            print(f"🔧 {action[0]}")
            executor.exec(action[0])
        except Exception as e:
            print(f"❌ Error: {e}")
            instruction = "The previous action failed. Try a different approach."
        time.sleep(CONFIG["step_delay"])
    print("⏱️ Max steps reached")
    return False

def main():
    executor = RemoteExecutor() if CONFIG["remote"] else LocalExecutor()
    try:
        agent = create_agent(executor)
        if len(sys.argv) > 1:
            run_task(agent, executor, " ".join(sys.argv[1:]))
        else:
            print("🎮 Interactive Mode (type 'exit' to quit)\n")
            while True:
                task = input("Task: ").strip()
                if task == "exit":
                    break
                elif task:
                    run_task(agent, executor, task)
    finally:
        # Clean up
        executor.destroy()

if __name__ == "__main__":
    main()
```
## Platform Requirements
### macOS
Grant Terminal access: System Settings → Privacy & Security → Accessibility
### Windows
May require running Terminal as Administrator
### Linux
Install dependencies:
```bash icon="terminal" theme={null}
sudo apt-get install python3-tk python3-dev
```
## Environment Variables
| Variable | Default | Description |
| ----------------------- | ---------------------------- | ---------------------------------- |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `ANTHROPIC_API_KEY` | - | Anthropic API key |
| `ORGO_API_KEY` | - | Orgo API key (remote mode) |
| `USE_CLOUD_ENVIRONMENT` | `false` | Set to `true` for remote execution |
| `AGENT_MODEL` | `gpt-4o` | Main reasoning model |
| `GROUNDING_MODEL` | `claude-3-7-sonnet-20250219` | Visual grounding model |
| `MAX_STEPS` | `10` | Maximum steps per task |
| `STEP_DELAY` | `0.5` | Seconds between actions |
## Architecture
Agent S2 uses a compositional framework with specialized modules:
**Mixture of Grounding** - Routes actions to specialized visual grounding models for precise UI localization
**Proactive Hierarchical Planning** - Dynamically refines plans based on evolving observations
**Cross-platform Support** - Works on macOS, Windows, and Linux
## Performance
Agent S2 achieves state-of-the-art results on computer use benchmarks:
| Benchmark | Success Rate | Rank |
| ----------------- | ------------ | ---- |
| OSWorld | 27.0% | #3 |
| WindowsAgentArena | 29.8% | #1 |
| AndroidWorld | 54.3% | #1 |
## Resources
* [GitHub Repository](https://github.com/simular-ai/Agent-S)
* [Agent S2 Whitepaper](https://arxiv.org/abs/2504.00906)
* [OSWorld Benchmark](https://os-world.github.io/)
Agent S2 is currently ranked #3 on the OSWorld benchmark, demonstrating leading performance on complex computer use tasks.
## Video Tutorial
A video version of this guide is available on the docs site, covering the same steps as this written guide.
# Claude Computer Use
Source: https://docs.orgo.ai/guides/claude-computer-use
Let Claude control a virtual desktop
## Overview
This guide shows how to get started with Anthropic's Claude Computer Use in a couple of minutes, using Orgo to control a virtual desktop environment.
## Setup
Install the required packages:
```bash pip theme={null}
pip install orgo anthropic
```
```bash npm theme={null}
npm install orgo @anthropic-ai/sdk
```
```bash yarn theme={null}
yarn add orgo @anthropic-ai/sdk
```
```bash pnpm theme={null}
pnpm add orgo @anthropic-ai/sdk
```
Set up your API keys:
```bash terminal icon="terminal" theme={null}
# Export as environment variables
export ORGO_API_KEY=your_orgo_api_key
export ANTHROPIC_API_KEY=your_anthropic_api_key
```
```python setup.py icon="python" theme={null}
import os
os.environ["ORGO_API_KEY"] = "your_orgo_api_key"
os.environ["ANTHROPIC_API_KEY"] = "your_anthropic_api_key"
```
```typescript setup.ts icon="square-js" theme={null}
process.env.ORGO_API_KEY = "your_orgo_api_key";
process.env.ANTHROPIC_API_KEY = "your_anthropic_api_key";
```
## Simple Usage
The simplest way to use Orgo with Claude is through the built-in `prompt()` method:
```python simple.py icon="python" theme={null}
from orgo import Computer
# Initialize a computer
computer = Computer()
# Let Claude control the computer with natural language
computer.prompt("Open Firefox and search for pictures of cats")
# Clean up when done
computer.destroy()
```
```typescript simple.ts icon="square-js" theme={null}
import { Computer } from 'orgo';
// Initialize a computer
const computer = await Computer.create();
// Let Claude control the computer with natural language
await computer.prompt({
instruction: "Open Firefox and search for pictures of cats"
});
// Clean up when done
await computer.destroy();
```
This approach handles all the complexity of the agent loop automatically, making it easy to get started.
## Customizing the Prompt Method
You can customize the prompt experience with various parameters:
```python custom.py icon="python" theme={null}
# Create a progress callback
def progress_callback(event_type, event_data):
    if event_type == "text":
        print(f"Claude: {event_data}")
    elif event_type == "tool_use":
        print(f"Action: {event_data['action']}")
    elif event_type == "thinking":
        print(f"Thinking: {event_data}")

# Use Claude with custom parameters
messages = computer.prompt(
    instruction="Find and download the latest Claude paper from Anthropic's website",
    model="claude-sonnet-4-20250514",  # The model to use
    display_width=1280,  # Set screen resolution
    display_height=800,
    callback=progress_callback,  # Track progress
    thinking_enabled=True,  # Enable Claude's "thinking" capability (Claude 3.7+)
    max_iterations=15,  # Limit the number of agent loops
    max_tokens=4096,  # Maximum tokens for Claude responses
    api_key="your_anthropic_api_key"  # Override environment variable
)
```
```typescript custom.ts icon="square-js" theme={null}
// Create a progress callback
const progressCallback = (eventType: string, eventData: any) => {
  if (eventType === "text") {
    console.log(`Claude: ${eventData}`);
  } else if (eventType === "tool_use") {
    console.log(`Action: ${eventData.action}`);
  } else if (eventType === "thinking") {
    console.log(`Thinking: ${eventData}`);
  }
};

// Use Claude with custom parameters
const messages = await computer.prompt({
  instruction: "Find and download the latest Claude paper from Anthropic's website",
  model: "claude-sonnet-4-20250514", // The model to use
  displayWidth: 1280, // Set screen resolution
  displayHeight: 800,
  callback: progressCallback, // Track progress
  thinkingEnabled: true, // Enable Claude's "thinking" capability (Claude 3.7+)
  maxIterations: 15, // Limit the number of agent loops
  maxTokens: 4096, // Maximum tokens for Claude responses
  apiKey: "your_anthropic_api_key" // Override environment variable
});
```
## Advanced Usage
For more control, you can implement your own agent loop using the Anthropic API directly:
```python advanced.py expandable icon="python" theme={null}
import anthropic
from orgo import Computer

def create_agent_loop(instruction, model="claude-sonnet-4-20250514"):
    # Initialize components
    computer = Computer()
    client = anthropic.Anthropic()
    try:
        # Initialize conversation
        messages = [{"role": "user", "content": instruction}]

        # Define tools
        tools = [
            {
                "type": "computer_20250124",  # For Claude 3.7+
                "name": "computer",
                "display_width_px": 1024,
                "display_height_px": 768,
                "display_number": 1
            }
        ]

        # Start the conversation with Claude
        response = client.beta.messages.create(
            model=model,
            messages=messages,
            tools=tools,
            betas=["computer-use-2025-01-24"],
            max_tokens=4096
        )

        # Add Claude's response to conversation history
        messages.append({"role": "assistant", "content": response.content})

        # Continue the loop until Claude stops requesting tools
        iteration = 0
        max_iterations = 20
        while iteration < max_iterations:
            iteration += 1

            # Process all tool requests from Claude
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    # Execute the requested tool action
                    result = execute_tool_action(computer, block)

                    # Format the result for Claude
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": [result]
                    })

            # If no tools were requested, Claude is done
            if not tool_results:
                break

            # Send the tool results back to Claude
            messages.append({"role": "user", "content": tool_results})

            # Get Claude's next response
            response = client.beta.messages.create(
                model=model,
                messages=messages,
                tools=tools,
                betas=["computer-use-2025-01-24"],
                max_tokens=4096
            )

            # Add Claude's response to conversation history
            messages.append({"role": "assistant", "content": response.content})

        return messages
    finally:
        # Clean up
        computer.destroy()

def execute_tool_action(computer, tool_block):
    """Execute a tool action based on Claude's request."""
    action = tool_block.input.get("action")
    try:
        if action == "screenshot":
            # Capture a screenshot and return as base64
            image_data = computer.screenshot_base64()
            return {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_data
                }
            }
        elif action == "left_click":
            x, y = tool_block.input["coordinate"]
            computer.left_click(x, y)
            return {"type": "text", "text": f"Clicked at ({x}, {y})"}
        elif action == "right_click":
            x, y = tool_block.input["coordinate"]
            computer.right_click(x, y)
            return {"type": "text", "text": f"Right-clicked at ({x}, {y})"}
        elif action == "double_click":
            x, y = tool_block.input["coordinate"]
            computer.double_click(x, y)
            return {"type": "text", "text": f"Double-clicked at ({x}, {y})"}
        elif action == "type":
            text = tool_block.input["text"]
            computer.type(text)
            return {"type": "text", "text": f"Typed: {text}"}
        elif action == "key":
            key = tool_block.input["text"]
            computer.key(key)
            return {"type": "text", "text": f"Pressed: {key}"}
        elif action == "scroll":
            direction = tool_block.input.get("scroll_direction", "down")
            amount = tool_block.input.get("scroll_amount", 1)
            computer.scroll(direction, amount)
            return {"type": "text", "text": f"Scrolled {direction} by {amount}"}
        elif action == "wait":
            duration = tool_block.input.get("duration", 1)
            computer.wait(duration)
            return {"type": "text", "text": f"Waited for {duration} seconds"}
        else:
            return {"type": "text", "text": f"Unsupported action: {action}"}
    except Exception as e:
        return {"type": "text", "text": f"Error executing {action}: {str(e)}"}
```
```typescript advanced.ts expandable icon="square-js" theme={null}
import { Computer } from 'orgo';
import Anthropic from '@anthropic-ai/sdk';
async function createAgentLoop(instruction: string, model = "claude-sonnet-4-20250514") {
// Initialize components
const computer = await Computer.create();
const client = new Anthropic();
try {
// Initialize conversation
const messages: any[] = [{ role: "user", content: instruction }];
// Define tools
const tools = [
{
type: "computer_20250124", // For Claude 3.7+
name: "computer",
display_width_px: 1024,
display_height_px: 768,
display_number: 1
}
];
// Start the conversation with Claude
let response = await client.beta.messages.create({
model,
messages,
tools: tools as any,
betas: ["computer-use-2025-01-24"],
max_tokens: 4096
});
// Add Claude's response to conversation history
messages.push({ role: "assistant", content: response.content });
// Continue the loop until Claude stops requesting tools
let iteration = 0;
const maxIterations = 20;
while (iteration < maxIterations) {
iteration++;
// Process all tool requests from Claude
const toolResults = [];
for (const block of response.content) {
if (block.type === "tool_use") {
// Execute the requested tool action
const result = await executeToolAction(computer, block);
// Format the result for Claude
toolResults.push({
type: "tool_result",
tool_use_id: block.id,
content: [result]
});
}
}
// If no tools were requested, Claude is done
if (toolResults.length === 0) {
break;
}
// Send the tool results back to Claude
messages.push({ role: "user", content: toolResults });
// Get Claude's next response
response = await client.beta.messages.create({
model,
messages,
tools: tools as any,
betas: ["computer-use-2025-01-24"],
max_tokens: 4096
});
// Add Claude's response to conversation history
messages.push({ role: "assistant", content: response.content });
}
return messages;
} finally {
// Clean up
await computer.destroy();
}
}
async function executeToolAction(computer: Computer, toolBlock: any) {
const action = toolBlock.input.action;
try {
if (action === "screenshot") {
// Capture a screenshot and return as base64
const imageData = await computer.screenshotBase64();
return {
type: "image",
source: {
type: "base64",
media_type: "image/jpeg",
data: imageData
}
};
} else if (action === "left_click") {
const [x, y] = toolBlock.input.coordinate;
await computer.leftClick(x, y);
return { type: "text", text: `Clicked at (${x}, ${y})` };
} else if (action === "right_click") {
const [x, y] = toolBlock.input.coordinate;
await computer.rightClick(x, y);
return { type: "text", text: `Right-clicked at (${x}, ${y})` };
} else if (action === "double_click") {
const [x, y] = toolBlock.input.coordinate;
await computer.doubleClick(x, y);
return { type: "text", text: `Double-clicked at (${x}, ${y})` };
} else if (action === "type") {
const text = toolBlock.input.text;
await computer.type(text);
return { type: "text", text: `Typed: ${text}` };
} else if (action === "key") {
const key = toolBlock.input.text;
await computer.key(key);
return { type: "text", text: `Pressed: ${key}` };
} else if (action === "scroll") {
const direction = toolBlock.input.scroll_direction || "down";
const amount = toolBlock.input.scroll_amount || 1;
await computer.scroll(direction, amount);
return { type: "text", text: `Scrolled ${direction} by ${amount}` };
} else if (action === "wait") {
const duration = toolBlock.input.duration || 1;
await computer.wait(duration);
return { type: "text", text: `Waited for ${duration} seconds` };
} else {
return { type: "text", text: `Unsupported action: ${action}` };
}
} catch (error) {
return { type: "text", text: `Error executing ${action}: ${error}` };
}
}
```
## Using Claude's Thinking Capability
Claude 4 Sonnet can expose its reasoning process through the `thinking` parameter:
```python thinking.py icon="python" theme={null}
import anthropic
from orgo import Computer
# Initialize components
computer = Computer()
client = anthropic.Anthropic()
try:
# Start a conversation with thinking enabled
response = client.beta.messages.create(
model="claude-sonnet-4-20250514",
messages=[{"role": "user", "content": "Find an image of a cat on the web"}],
tools=[{
"type": "computer_20250124",
"name": "computer",
"display_width_px": 1024,
"display_height_px": 768,
"display_number": 1
}],
betas=["computer-use-2025-01-24"],
thinking={"type": "enabled", "budget_tokens": 1024} # Enable thinking
)
# Access the thinking content
for block in response.content:
if block.type == "thinking":
print("Claude's reasoning:")
print(block.thinking)
finally:
# Clean up
computer.destroy()
```
```typescript thinking.ts icon="square-js" theme={null}
import { Computer } from 'orgo';
import Anthropic from '@anthropic-ai/sdk';
// Initialize components
const computer = await Computer.create();
const client = new Anthropic();
try {
// Start a conversation with thinking enabled
const response = await client.beta.messages.create({
model: "claude-sonnet-4-20250514",
messages: [{ role: "user", content: "Find an image of a cat on the web" }],
tools: [{
type: "computer_20250124",
name: "computer",
display_width_px: 1024,
display_height_px: 768,
display_number: 1
}] as any,
betas: ["computer-use-2025-01-24"],
thinking: { type: "enabled", budget_tokens: 1024 } as any // Enable thinking
});
// Access the thinking content
for (const block of response.content) {
if (block.type === "thinking") {
console.log("Claude's reasoning:");
console.log((block as any).thinking);
}
}
} finally {
// Clean up
await computer.destroy();
}
```
## Tool Compatibility
Orgo provides a complete set of methods corresponding to Claude's computer use tools:
| Claude Tool Action | Orgo Method (Python) | Orgo Method (TypeScript) | Description |
| ------------------ | ------------------------------------ | ------------------------------------------ | --------------------------------------------- |
| `screenshot` | `computer.screenshot()` | `await computer.screenshot()` | Capture the screen (returns PIL Image/Buffer) |
| `screenshot` | `computer.screenshot_base64()` | `await computer.screenshotBase64()` | Capture the screen (returns base64 string) |
| `left_click` | `computer.left_click(x, y)` | `await computer.leftClick(x, y)` | Left click at coordinates |
| `right_click` | `computer.right_click(x, y)` | `await computer.rightClick(x, y)` | Right click at coordinates |
| `double_click` | `computer.double_click(x, y)` | `await computer.doubleClick(x, y)` | Double click at coordinates |
| `type` | `computer.type(text)` | `await computer.type(text)` | Type text |
| `key` | `computer.key(key_sequence)` | `await computer.key(keySequence)` | Press keys (e.g., "Enter", "ctrl+c") |
| `scroll` | `computer.scroll(direction, amount)` | `await computer.scroll(direction, amount)` | Scroll in specified direction |
| `wait` | `computer.wait(seconds)` | `await computer.wait(seconds)` | Wait for specified seconds |
## Claude 4 vs 3.5 Sonnet
When using different Claude models, make sure to use the appropriate tool type:
* For Claude 4 Sonnet: `"type": "computer_20250124"`
* For Claude 3.5 Sonnet: `"type": "computer_20241022"`
And use the corresponding beta flag:
* For Claude 4 Sonnet: `betas=["computer-use-2025-01-24"]`
* For Claude 3.5 Sonnet: `betas=["computer-use-2024-10-22"]`
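The pairing above is easy to get wrong, so it can help to derive both values from the model name in one place. A minimal sketch (the model IDs are illustrative; check Anthropic's docs for current names):

```python
def computer_tool_config(model: str):
    """Return the (tool definition, beta flag) pair matching the model family."""
    if model.startswith("claude-3-5"):
        tool_type, beta = "computer_20241022", "computer-use-2024-10-22"
    else:  # Claude 3.7 / Claude 4 family
        tool_type, beta = "computer_20250124", "computer-use-2025-01-24"
    tool = {
        "type": tool_type,
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
        "display_number": 1,
    }
    return tool, beta

tool, beta = computer_tool_config("claude-sonnet-4-20250514")
```

Pass `tools=[tool]` and `betas=[beta]` to `client.beta.messages.create(...)` so the two values can never drift apart.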
TypeScript users: All methods are async and must be awaited. The TypeScript SDK uses camelCase for method names (e.g., `leftClick` instead of `left_click`).
## Video Tutorial
A short video walkthrough (about 30 seconds) of setting up Claude Computer Use is available on the docs site; it covers the same steps as this written guide.
# Embed VMs
Source: https://docs.orgo.ai/guides/embed-vms
Embed virtual computers into your applications
## Overview
Embed Orgo virtual computers directly into your web apps. Build AI agent interfaces, automation dashboards, or any product with live VM displays.
You can use any VNC client to connect to Orgo computers. The `orgo-vnc` package is a React component for convenience.
## Setup
```bash theme={null}
npm install orgo-vnc
```
1. Go to [orgo.ai/start](https://www.orgo.ai/start)
2. Open a workspace and select a computer
3. Click the **⋮** menu → **Computer Settings**
4. Copy the **Hostname** and **Password**
Create `.env.local` in your project root:
```bash theme={null}
NEXT_PUBLIC_ORGO_COMPUTER_HOST=your-hostname
NEXT_PUBLIC_ORGO_COMPUTER_PASSWORD=your-password
```
```tsx app/page.tsx expandable theme={null}
'use client';
import { useState } from 'react';
import { ComputerDisplay } from 'orgo-vnc';
const HOST = process.env.NEXT_PUBLIC_ORGO_COMPUTER_HOST!;
const PASSWORD = process.env.NEXT_PUBLIC_ORGO_COMPUTER_PASSWORD!;
export default function Home() {
  const [connected, setConnected] = useState(false);
  return (
    <div style={{ height: '100vh' }}>
      <ComputerDisplay
        hostname={HOST}
        password={PASSWORD}
        onConnect={() => setConnected(true)}
        onDisconnect={() => setConnected(false)}
      />
      <p>{connected ? `Connected to ${HOST}` : 'Connecting...'}</p>
    </div>
  );
}
```
## Props
| Prop | Type | Default | Description |
| ------------------ | ---------- | ----------- | ------------------------------------------- |
| `hostname` | `string` | required | Computer hostname |
| `password` | `string` | required | Computer password |
| `readOnly` | `boolean` | `false` | Disable user interaction |
| `background` | `string` | `undefined` | Background color |
| `scaleViewport` | `boolean` | `true` | Scale display to fit container |
| `clipViewport` | `boolean` | `false` | Clip display to container bounds |
| `resizeSession` | `boolean` | `false` | Resize remote session to match |
| `showDotCursor` | `boolean` | `false` | Show dot cursor when remote cursor hidden |
| `compressionLevel` | `number` | `2` | Compression level (0-9) |
| `qualityLevel` | `number` | `6` | Image quality (0-9) |
| `onConnect` | `function` | `undefined` | Called when connected |
| `onDisconnect` | `function` | `undefined` | Called when disconnected |
| `onError` | `function` | `undefined` | Called on error |
| `onClipboard` | `function` | `undefined` | Called when clipboard data received |
| `onReady` | `function` | `undefined` | Called with handle for programmatic control |
## Programmatic Control
Use the `onReady` callback to get a handle for programmatic control:
```tsx theme={null}
const [handle, setHandle] = useState<any>(null);
// Pass setHandle to the component: <ComputerDisplay onReady={setHandle} ... />
// Later...
handle?.reconnect();
handle?.disconnect();
handle?.sendClipboard('text to send');
await handle?.pasteFromClipboard();
```
## Next Steps
* Full SDK setup
* Control computers programmatically
# Gemini Computer Use
Source: https://docs.orgo.ai/guides/gemini-computer-use
Control virtual desktops with Gemini 2.5
## Overview
This guide shows how to get started with Google's Gemini 2.5 Computer Use in minutes using Orgo to control a virtual desktop environment.
## Setup
Install the required packages:
```bash pip theme={null}
pip install orgo google-genai pillow python-dotenv
```
Set up your API keys in a `.env` file:
```bash .env icon="file" theme={null}
ORGO_API_KEY=your_orgo_api_key
GEMINI_API_KEY=your_gemini_api_key
```
Or export them as environment variables:
```bash terminal icon="terminal" theme={null}
export ORGO_API_KEY=your_orgo_api_key
export GEMINI_API_KEY=your_gemini_api_key
```
## Complete Example
Here's a full working example that handles the complete agent loop:
```python example.py expandable icon="python" theme={null}
import os
import time
import base64
import io
from google import genai
from google.genai import types
from orgo import Computer
from PIL import Image
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Initialize Gemini client
client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))
# Connect to your Orgo computer
# Get your computer_id from https://orgo.ai/projects
computer = Computer(computer_id="your-computer-id")
# Screen resolution
SCREEN_WIDTH = 1024
SCREEN_HEIGHT = 768
# System prompt with Ubuntu-specific instructions
SYSTEM_PROMPT = f"""You are controlling an Ubuntu Linux virtual machine with a display resolution of {SCREEN_WIDTH}x{SCREEN_HEIGHT}.
* You have access to a virtual Ubuntu desktop environment with standard applications
* You can see the current state through screenshots and control the computer through actions
* The environment has Firefox browser and standard Ubuntu applications pre-installed
* CRITICAL: When opening applications or files on the Ubuntu desktop, you MUST USE DOUBLE-CLICK, not single-click
* Single-click only selects desktop icons but DOES NOT open them
* Desktop interactions:
- Desktop icons (apps/folders): DOUBLE-CLICK to open
- Menu items: SINGLE-CLICK to select
- Taskbar/launcher icons: SINGLE-CLICK to open
- Window buttons (close/minimize/maximize): SINGLE-CLICK
- File browser items: DOUBLE-CLICK to open
* Always start by taking a screenshot to see the current state
* When you need to submit or confirm, use the 'Enter' key
* Be efficient with screenshots - only take them when you need to see the current state
* Wait for pages/applications to load before taking another screenshot
* Batch multiple actions together when possible before checking the result
"""
def denormalize_x(x: int) -> int:
"""Convert normalized x coordinate (0-999) to actual pixel."""
return int(x / 1000 * SCREEN_WIDTH)
def denormalize_y(y: int) -> int:
"""Convert normalized y coordinate (0-999) to actual pixel."""
return int(y / 1000 * SCREEN_HEIGHT)
def get_screenshot_png() -> bytes:
"""Get screenshot as PNG bytes (Gemini requires PNG format)."""
jpeg_data = base64.b64decode(computer.screenshot_base64())
image = Image.open(io.BytesIO(jpeg_data))
png_buffer = io.BytesIO()
image.save(png_buffer, format='PNG')
return png_buffer.getvalue()
def get_current_url() -> str:
    """Best-effort URL tracking: returns the active window title (which
    typically includes the page title) as a stand-in for the browser URL."""
    try:
        result = computer.bash("xdotool getactivewindow getwindowname")
        return result if result else "about:blank"
    except Exception:
        return "about:blank"
def execute_function_calls(candidate):
"""Execute function calls from Gemini's response."""
results = []
function_calls = [
part.function_call
for part in candidate.content.parts
if part.function_call
]
for function_call in function_calls:
fname = function_call.name
args = function_call.args
action_result = {}
print(f" → {fname}")
try:
if fname == "open_web_browser":
pass # Browser already open
elif fname == "click_at":
computer.left_click(denormalize_x(args["x"]), denormalize_y(args["y"]))
elif fname == "type_text_at":
computer.left_click(denormalize_x(args["x"]), denormalize_y(args["y"]))
computer.type(args["text"])
if args.get("press_enter", False):
computer.key("Return")
elif fname == "scroll_document":
computer.scroll(args["direction"], 3)
elif fname == "key_combination":
computer.key(args["keys"])
elif fname == "go_back":
computer.key("alt+Left")
elif fname == "navigate":
url = args["url"]
computer.bash(f'firefox "{url}" &')
action_result["url"] = url
elif fname == "wait_5_seconds":
computer.wait(5)
else:
print(f" Warning: Unimplemented function {fname}")
time.sleep(1) # Wait for UI to update
except Exception as e:
print(f" Error: {e}")
action_result = {"error": str(e)}
results.append((fname, action_result))
return results
def get_function_responses(results):
"""Create function responses with screenshot and URL."""
screenshot_png = get_screenshot_png()
current_url = get_current_url()
function_responses = []
for name, result in results:
response_data = {
"status": "completed",
"url": result.get("url", current_url)
}
response_data.update(result)
function_responses.append(
types.FunctionResponse(
name=name,
response=response_data,
parts=[
types.FunctionResponsePart(
inline_data=types.FunctionResponseBlob(
mime_type="image/png",
data=screenshot_png
)
)
]
)
)
return function_responses
try:
# Configure Computer Use tool with system instruction
config = types.GenerateContentConfig(
system_instruction=SYSTEM_PROMPT,
tools=[
types.Tool(
computer_use=types.ComputerUse(
environment=types.Environment.ENVIRONMENT_BROWSER
)
)
]
)
# Define task
task = "Open Firefox and search for 'gemini ai'"
print(f"Task: {task}\n")
# Get initial screenshot
initial_screenshot = get_screenshot_png()
# Create initial request
contents = [
types.Content(
role="user",
parts=[
types.Part(text=task),
types.Part.from_bytes(
data=initial_screenshot,
mime_type='image/png'
)
]
)
]
# Agent loop
for iteration in range(20):
print(f"\n--- Turn {iteration + 1} ---")
# Get response from Gemini
response = client.models.generate_content(
model='gemini-2.5-computer-use-preview-10-2025',
contents=contents,
config=config
)
candidate = response.candidates[0]
contents.append(candidate.content)
# Display progress
for part in candidate.content.parts:
if part.text:
print(f"💬 {part.text}")
# Check for function calls
has_function_calls = any(
part.function_call
for part in candidate.content.parts
)
if not has_function_calls:
print("\n✓ Task completed")
break
# Execute actions
print("→ Executing actions...")
results = execute_function_calls(candidate)
# Get responses with screenshot and URL
function_responses = get_function_responses(results)
# Continue conversation
contents.append(
types.Content(
role="user",
parts=[
types.Part(function_response=fr)
for fr in function_responses
]
)
)
except Exception as e:
print(f"\n❌ Error: {e}")
finally:
print("\nDone!")
# Note: computer.destroy() not called to keep computer running
# Call computer.destroy() if you want to clean up
```
## Usage Examples
### Basic Tasks
```python theme={null}
# Change the task variable to control what Gemini does
task = "Open Firefox and search for 'gemini ai'"
# Navigate to a website
task = "Go to github.com and search for 'orgo'"
# Fill a form
task = "Fill out the contact form with test data"
```
### Complex Workflows
```python theme={null}
# Multi-step task
task = """
1. Open a text editor
2. Write a Python hello world program
3. Save it as hello.py
4. Open a terminal
5. Run the program
"""
```
## Key Concepts
### System Prompt
The system prompt provides crucial context to Gemini about the Ubuntu environment:
```python theme={null}
SYSTEM_PROMPT = f"""You are controlling an Ubuntu Linux virtual machine...
* CRITICAL: When opening applications or files on the Ubuntu desktop,
you MUST USE DOUBLE-CLICK, not single-click
* Single-click only selects desktop icons but DOES NOT open them
* Desktop icons (apps/folders): DOUBLE-CLICK to open
* Menu items: SINGLE-CLICK to select
"""
```
This ensures Gemini knows to:
* Double-click desktop icons to open applications
* Single-click menu items and buttons
* Use appropriate keyboard shortcuts
### Getting Your Computer ID
Get your `computer_id` from the [Orgo dashboard](https://orgo.ai/projects):
1. Go to [https://orgo.ai/projects](https://orgo.ai/projects)
2. Click on your project
3. Find your computer ID in the computer list
4. Use it in: `Computer(computer_id="your-computer-id")`
### The Agent Loop
Gemini Computer Use works in a continuous loop:
1. **Request** → Send task with screenshot to the model
2. **Action** → Model suggests actions (click, type, etc.)
3. **Execute** → Your code executes the actions
4. **Screenshot** → Capture the result
5. **Repeat** → Continue until task is complete
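Stripped of the Gemini-specific details, the loop above reduces to this control flow. A sketch with the model call, action execution, and screenshot capture passed in as callables (all names here are illustrative, not part of any SDK):

```python
def agent_loop(task, call_model, execute_action, take_screenshot, max_turns=20):
    """Generic request/act/observe loop; returns the conversation history."""
    history = [("user", task, take_screenshot())]       # 1. request with screenshot
    for _ in range(max_turns):
        reply = call_model(history)                     # 2. model suggests actions
        history.append(("model", reply, None))
        calls = reply.get("function_calls", [])
        if not calls:
            break                                       # no actions left: task done
        for call in calls:
            execute_action(call)                        # 3. execute each action
        history.append(("user", "results", take_screenshot()))  # 4. observe result
    return history                                      # 5. repeat until complete
```

The full example above is this skeleton with `client.models.generate_content` as `call_model` and `execute_function_calls` as the executor.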
### Image Format Conversion
**Important:** Orgo returns screenshots in JPEG format, but Gemini requires PNG format:
```python theme={null}
def get_screenshot_png() -> bytes:
"""Get screenshot as PNG bytes (Gemini requires PNG format)."""
jpeg_data = base64.b64decode(computer.screenshot_base64())
image = Image.open(io.BytesIO(jpeg_data))
png_buffer = io.BytesIO()
image.save(png_buffer, format='PNG')
return png_buffer.getvalue()
```
### URL Tracking
**Important:** Gemini Computer Use requires the current URL in every function response:
```python theme={null}
response_data = {
"status": "completed",
"url": result.get("url", current_url) # Always include URL
}
```
### Coordinate System
Gemini uses **normalized coordinates (0-999)** that must be converted to actual pixels:
```python theme={null}
def denormalize_x(x: int) -> int:
return int(x / 1000 * SCREEN_WIDTH)
def denormalize_y(y: int) -> int:
return int(y / 1000 * SCREEN_HEIGHT)
```
Orgo's default screen resolution is **1024x768**.
### Action Types
| Action | Description | Example |
| ------------------ | --------------------- | ----------------------------------------------- |
| `open_web_browser` | Opens the browser | Start Firefox |
| `click_at` | Click at coordinates | Click button at (500, 300) |
| `type_text_at` | Type text at location | Enter "hello" in search box |
| `scroll_document` | Scroll page | Scroll down |
| `key_combination` | Press key combos | Press ctrl+c |
| `navigate` | Go to URL | Load [https://example.com](https://example.com) |
| `go_back` | Browser back | Previous page |
| `wait_5_seconds` | Pause execution | Wait for page load |
## Tool Compatibility
Orgo provides methods corresponding to Gemini's computer use tools:
| Gemini Tool Action | Orgo Method | Description |
| ------------------ | ------------------------------------ | --------------------------- |
| `click_at` | `computer.left_click(x, y)` | Click at coordinates |
| `type_text_at` | `computer.type(text)` | Type text |
| `key_combination` | `computer.key(keys)` | Press keys (e.g., "ctrl+c") |
| `scroll_document` | `computer.scroll(direction, amount)` | Scroll page |
| `navigate` | `computer.bash('firefox "url" &')` | Open URL |
| Screenshot | `computer.screenshot_base64()` | Capture screen (JPEG) |
| `wait_5_seconds` | `computer.wait(5)` | Wait 5 seconds |
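The rows of this table can be collapsed into a dispatch dictionary, which keeps the name-to-method mapping in one place. A sketch, assuming `computer` is an `orgo.Computer` with the methods listed above and that click coordinates have already been denormalized from Gemini's 0-999 range to pixels:

```python
def gemini_dispatch(computer):
    """Map Gemini function-call names to Orgo method invocations."""
    return {
        # coordinates assumed already converted to actual pixels
        "click_at": lambda a: computer.left_click(a["x"], a["y"]),
        "type_text_at": lambda a: computer.type(a["text"]),
        "key_combination": lambda a: computer.key(a["keys"]),
        "scroll_document": lambda a: computer.scroll(a["direction"], 3),
        "navigate": lambda a: computer.bash(f'firefox "{a["url"]}" &'),
        "wait_5_seconds": lambda a: computer.wait(5),
    }
```

With this in place, `execute_function_calls` shrinks to a lookup: `handler = gemini_dispatch(computer).get(fname)` followed by `handler(args)` when a handler exists.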
## Best Practices
### 1. Clear Instructions
```python theme={null}
# ✅ Good - Specific and clear
task = "Go to amazon.com and find the top 3 rated laptops under $1000"
# ❌ Avoid - Too vague
task = "Find some laptops"
```
### 2. Use System Prompts
Always include a system prompt with Ubuntu-specific instructions:
```python theme={null}
config = types.GenerateContentConfig(
system_instruction=SYSTEM_PROMPT, # Include OS-specific guidance
tools=[...]
)
```
### 3. Convert Coordinates
Always denormalize Gemini's normalized coordinates (0-999):
```python theme={null}
actual_x = denormalize_x(args["x"])
actual_y = denormalize_y(args["y"])
```
### 4. Handle Image Format
Always convert JPEG screenshots to PNG:
```python theme={null}
screenshot_png = get_screenshot_png()
```
### 5. Include URL in Responses
Always include the current URL:
```python theme={null}
response_data = {
"status": "completed",
"url": current_url
}
```
### 6. Add Delays
```python theme={null}
time.sleep(1) # Wait for UI to update after actions
```
## Comparison with Claude and OpenAI
| Feature | Gemini Computer Use | Claude Computer Use | OpenAI Computer Use |
| ---------------- | --------------------------------- | ------------------- | ---------------------- |
| API | Generate Content API | Messages API | Responses API |
| Model | `gemini-2.5-computer-use-preview` | `claude-sonnet-4` | `computer-use-preview` |
| System Prompt | Supported | Supported | Supported |
| Coordinates | Normalized (0-999) | Actual pixels | Actual pixels |
| Image Format | PNG required | JPEG/PNG | PNG |
| URL Requirement | Required in response | Optional | Optional |
| Parallel Actions | Yes | No | No |
## Limitations
* **Preview Status**: Computer Use is in preview and may have unexpected behaviors
* **Browser Focus**: Optimized for browser-based tasks
* **Coordinate System**: Requires conversion from normalized to actual pixels
* **Image Format**: Requires PNG format (Orgo returns JPEG, must convert)
* **URL Requirement**: Must include URL in every function response
* **Rate Limits**: Subject to Gemini API rate limits
## Troubleshooting
### Model doesn't double-click desktop icons
Make sure you're including the system prompt with Ubuntu-specific instructions:
```python theme={null}
config = types.GenerateContentConfig(
system_instruction=SYSTEM_PROMPT, # This is critical!
tools=[...]
)
```
### INVALID\_ARGUMENT: Unable to process input image
This error occurs when Gemini receives a JPEG image instead of PNG. Make sure you're using the `get_screenshot_png()` function.
### INVALID\_ARGUMENT: Requires URL in function response
Always include the `url` field in your response data:
```python theme={null}
response_data = {
"status": "completed",
"url": result.get("url", current_url)
}
```
### Missing API Key
Ensure both environment variables are set in your `.env` file:
```bash theme={null}
ORGO_API_KEY=your_orgo_api_key
GEMINI_API_KEY=your_gemini_api_key
```
## Next Steps
* Official Gemini API documentation
* Learn more about Orgo's virtual desktops
* Complete Orgo API documentation
# Memory-Enabled Computer Agents
Source: https://docs.orgo.ai/guides/memory
Build AI agents that remember user preferences with OpenAI Computer Use
## Overview
Add persistent memory to OpenAI's Computer Use agents. Your agents will remember user preferences, learn from interactions, and improve over time.
## Getting Started
Install Mem0 alongside Orgo and OpenAI:
```bash theme={null}
pip install mem0ai orgo openai python-dotenv
```
Create a `.env` file with your API keys:
```bash theme={null}
ORGO_API_KEY=your_orgo_api_key
OPENAI_API_KEY=your_openai_api_key # Used by both OpenAI and Mem0
```
Mem0 uses OpenAI by default, so you only need two API keys total.
Save this complete example as `memory_agent.py` and run it:
```python memory_agent.py expandable icon="python" theme={null}
import time
import base64
from mem0 import Memory
from orgo import Computer
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
class MemoryComputerAgent:
"""OpenAI Computer Use agent with memory."""
def __init__(self, user_id="GigabrainAgent"):
self.user_id = user_id
self.memory = Memory()
self.client = OpenAI()
self.computer = None
def __enter__(self):
self.computer = Computer()
print(f"🖥️ Computer ID: {self.computer.computer_id}")
return self
def __exit__(self, *args):
if self.computer:
self.computer.destroy()
def run(self, task):
"""Execute task with memory context."""
# Get memories
memories = self._get_relevant_memories(task)
# Build task with memory context
enhanced_task = self._build_task_with_memory(task, memories)
# Execute using OpenAI Computer Use
self._execute_computer_task(enhanced_task)
# Store interaction
self._store_memory(task)
def _get_relevant_memories(self, task):
"""Search for relevant memories."""
try:
results = self.memory.search(
query=task,
user_id=self.user_id,
limit=5
)
return [m['memory'] for m in results.get('results', [])]
        except Exception:
            return []
def _build_task_with_memory(self, task, memories):
"""Enhance task with memory context."""
if not memories:
return task
context = "\n".join(f"- {m}" for m in memories)
return f"""Remember these user preferences:
{context}
Current task: {task}"""
def _execute_computer_task(self, task):
"""Execute task using OpenAI Computer Use."""
response = self.client.responses.create(
model="computer-use-preview",
tools=[{
"type": "computer_use_preview",
"display_width": 1024,
"display_height": 768,
"environment": "linux"
}],
input=[{
"role": "user",
"content": [{
"type": "input_text",
"text": f"""IMPORTANT: You are controlling a Linux desktop.
- Always double-click desktop icons to open applications
- Use keyboard shortcuts as single commands (e.g., 'ctrl+c' not separate keys)
Task: {task}"""
}]
}],
reasoning={"summary": "concise"},
truncation="auto"
)
# Execute actions in loop
while True:
# Display progress
for item in response.output:
if item.type == "reasoning" and hasattr(item, "summary"):
for summary in item.summary:
if hasattr(summary, "text"):
print(f"💭 {summary.text}")
elif item.type == "text" and hasattr(item, "text"):
print(f"💬 {item.text}")
# Get computer actions
actions = [item for item in response.output if item.type == "computer_call"]
if not actions:
print("✓ Task completed")
break
# Execute action
action = actions[0]
print(f"→ {action.action.type}")
self._execute_action(action.action)
time.sleep(1)
# Get screenshot and continue
screenshot = self.computer.screenshot_base64()
response = self.client.responses.create(
model="computer-use-preview",
previous_response_id=response.id,
tools=[{
"type": "computer_use_preview",
"display_width": 1024,
"display_height": 768,
"environment": "linux"
}],
input=[{
"call_id": action.call_id,
"type": "computer_call_output",
"output": {
"type": "input_image",
"image_url": f"data:image/png;base64,{screenshot}"
}
}],
reasoning={"summary": "concise"},
truncation="auto"
)
def _execute_action(self, action):
"""Execute computer action."""
match action.type:
case "click":
if getattr(action, 'button', 'left') == "right":
self.computer.right_click(action.x, action.y)
else:
self.computer.left_click(action.x, action.y)
case "double_click":
self.computer.double_click(action.x, action.y)
case "type":
self.computer.type(action.text)
case "key" | "keypress":
                keys = getattr(action, 'keys', None)
                if keys is None:
                    key = getattr(action, 'key', None)
                    keys = [key] if key else []
if len(keys) > 1:
self.computer.key('+'.join(keys).lower())
else:
for key in keys:
self.computer.key(key)
case "scroll":
scroll_y = getattr(action, 'scroll_y', 0)
direction = "down" if scroll_y > 0 else "up"
                self.computer.scroll(direction, max(1, abs(scroll_y) // 100))
case "wait":
self.computer.wait(getattr(action, 'seconds', 2))
case "screenshot":
pass
def _store_memory(self, task):
"""Store interaction in memory."""
try:
self.memory.add(
messages=[
{"role": "user", "content": task},
{"role": "assistant", "content": "Task executed successfully"}
],
user_id=self.user_id
)
        except Exception:
            pass  # memory failures should not break task execution
# Run the demo
if __name__ == "__main__":
with MemoryComputerAgent("GigabrainAgent") as agent:
# Teach preferences
print("📝 Teaching preferences...")
agent.run("I prefer Firefox over Chrome")
agent.run("I like dark themes")
agent.run("Spencer Kinney is the stinkiest person I know")
# Use preferences
print("\n🚀 Running task with memory...")
agent.run("Open my current preferred browser and search for the stinkiest person I know")
```
## How It Works
The `MemoryComputerAgent` combines OpenAI's Computer Use API with Mem0's memory system:
1. **Memory Search**: Before executing any task, it searches for relevant memories
2. **Context Enhancement**: Memories are added to the task prompt
3. **Computer Use Execution**: Uses OpenAI's CUA to control the computer
4. **Memory Storage**: After execution, the interaction is saved for future use
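Steps 1 and 2 can be seen in isolation: search for memories, then prepend any hits to the task prompt. A sketch mirroring `_build_task_with_memory` above, with the search function passed in so it runs without Mem0:

```python
def build_prompt_with_memory(task, search):
    """Prepend relevant memories (if any) to the task prompt."""
    memories = [m["memory"] for m in search(task)]
    if not memories:
        return task
    context = "\n".join(f"- {m}" for m in memories)
    return f"Remember these user preferences:\n{context}\n\nCurrent task: {task}"
```

In the agent, `search` is `self.memory.search(...)` and the returned string becomes the enhanced task sent to the Computer Use API.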
## Usage Examples
### Basic Usage
```python theme={null}
# Quick example
with MemoryComputerAgent("GigabrainAgent") as agent:
agent.run("Open Firefox and go to GitHub")
```
### Building Preferences
```python theme={null}
with MemoryComputerAgent("alice") as agent:
# Teach preferences
agent.run("I prefer VS Code for coding")
agent.run("I use dark themes everywhere")
agent.run("My GitHub username is alice123")
# Later, it remembers
agent.run("Open my code editor and check my GitHub")
```
### Morning Routine
```python theme={null}
def morning_setup(user_id="GigabrainAgent"):
"""Automated morning workflow."""
with MemoryComputerAgent(user_id) as agent:
# First time - teach routine
agent.run("I check Gmail first thing in the morning")
agent.run("Then I open Slack")
agent.run("Finally I check my calendar")
# Next day - just ask
agent.run("Do my morning routine")
# Run daily
morning_setup()
```
### Manual Management
```python theme={null}
# Without context manager
agent = MemoryComputerAgent("bob")
agent.computer = Computer()
try:
# Teach and use
agent.run("My favorite news site is Hacker News")
agent.run("I use DuckDuckGo for searching")
# Use preferences
agent.run("Open my favorite news site")
finally:
agent.computer.destroy()
```
## Memory Management
### View All Memories
```python theme={null}
from mem0 import Memory
memory = Memory()
memories = memory.get_all(user_id="GigabrainAgent")
print(f"📚 Total memories: {len(memories)}")
for m in memories:
    print(f" • {m['memory']}")
```
### Search Specific Memories
```python theme={null}
from mem0 import Memory
memory = Memory()
results = memory.search(
query="browser preferences",
user_id="GigabrainAgent",
limit=5
)
for result in results['results']:
print(f"Found: {result['memory']}")
```
### Clear Memories
```python theme={null}
from mem0 import Memory
memory = Memory()
memory.delete_all(user_id="GigabrainAgent")
print("All memories cleared")
```
## Advanced Patterns
### Multiple Users
```python theme={null}
# Different users maintain separate memories
users = ["alice", "bob", "charlie"]
for user in users:
with MemoryComputerAgent(user) as agent:
agent.run(f"Open browser for {user}")
```
### Session-Based Memory
```python theme={null}
# Work context
with MemoryComputerAgent("work_gigabrain") as work:
work.run("I use Chrome for work")
work.run("Our code is on GitHub Enterprise")
work.run("Open work browser")
# Personal context
with MemoryComputerAgent("personal_gigabrain") as personal:
personal.run("I use Firefox for personal browsing")
personal.run("My code is on regular GitHub")
personal.run("Open personal browser")
```
### Error Handling
```python theme={null}
def safe_run(task, user_id="GigabrainAgent"):
"""Execute with error handling."""
try:
with MemoryComputerAgent(user_id) as agent:
agent.run(task)
return {"success": True}
except Exception as e:
print(f"❌ Error: {e}")
return {"success": False, "error": str(e)}
# Usage
result = safe_run("Open browser")
if result["success"]:
print("✅ Task completed")
```
### Batch Operations
```python theme={null}
import time

def setup_workspace(user_id="GigabrainAgent"):
    """Set up complete workspace."""
    tasks = [
        "Open VS Code",
        "Open terminal",
        "Navigate to ~/projects",
        "Start development server",
        "Open browser at localhost:3000"
    ]
    with MemoryComputerAgent(user_id) as agent:
        for task in tasks:
            print(f"⚡ {task}")
            agent.run(task)
            time.sleep(2)  # Pause between tasks

setup_workspace()
```
## Production Example
```python production.py expandable icon="python" theme={null}
import os
from datetime import datetime

class ProductionMemoryAgent(MemoryComputerAgent):
    """Production agent with logging."""

    def __init__(self, user_id="GigabrainAgent"):
        super().__init__(f"prod_{user_id}")
        os.makedirs("logs", exist_ok=True)  # ensure the log directory exists
        self.log_file = f"logs/{user_id}_{datetime.now().strftime('%Y%m%d')}.log"

    def run(self, task):
        """Run with logging."""
        timestamp = datetime.now().strftime("%H:%M:%S")
        # Log task
        with open(self.log_file, "a") as f:
            f.write(f"[{timestamp}] Task: {task}\n")
        # Execute
        super().run(task)
        # Log completion
        with open(self.log_file, "a") as f:
            f.write(f"[{timestamp}] Completed\n")

# Usage
with ProductionMemoryAgent("alice") as agent:
    agent.run("Check email")
```
## Tips
1. **Memory Persistence**: Memories are stored permanently for each user\_id
2. **Context Building**: The agent automatically adds relevant memories to each task
3. **Error Resilience**: Memory operations fail gracefully without breaking execution
4. **Performance**: Allow 1-2 seconds between actions for stability
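Tip 4 can be enforced mechanically with a small pacing decorator so every action automatically waits before the next one. This is a sketch, not part of the Orgo SDK; only `time.sleep` and `functools.wraps` from the standard library are assumed, and `run_task` is a hypothetical stand-in for any agent call:

```python theme={null}
import time
from functools import wraps

def paced(delay=1.5):
    """Decorator: sleep after each call so the desktop UI can settle."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            time.sleep(delay)  # 1-2 seconds between actions (Tip 4)
            return result
        return wrapper
    return decorator

@paced(delay=0.01)  # short delay for demonstration
def run_task(task):
    return f"ran: {task}"

print(run_task("Open browser"))  # → ran: Open browser
```

Wrapping the agent's `run` method (or a module-level helper like this) keeps the pacing in one place instead of scattering `time.sleep` calls through every script.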
## Next Steps
* Explore [Orgo's computer environments](https://docs.orgo.ai)
* Learn about [OpenAI Computer Use](https://platform.openai.com/docs/guides/tools-computer-use)
* Learn about [Mem0 agent memory](https://docs.mem0.ai)
# OpenAI Computer Use
Source: https://docs.orgo.ai/guides/openai-computer-use
Control a computer with GPT using Orgo
## Overview
OpenAI's Computer Use lets AI agents control computer interfaces through the Responses API. This guide shows how to use it with Orgo's virtual desktops.
## Quick Start
```bash theme={null}
pip install orgo openai python-dotenv
```
```bash theme={null}
export ORGO_API_KEY=your_orgo_api_key
export OPENAI_API_KEY=your_openai_api_key
```
```python theme={null}
from openai import OpenAI
from orgo import Computer

# Initialize
client = OpenAI()
computer = Computer()

# Create request with task
response = client.responses.create(
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "linux"
    }],
    input=[{
        "role": "user",
        "content": [{
            "type": "input_text",
            "text": "Open Firefox and search for OpenAI"
        }]
    }],
    truncation="auto"
)

# Execute the suggested action
actions = [item for item in response.output if item.type == "computer_call"]
if actions:
    action = actions[0].action
    if action.type == "click":
        computer.left_click(action.x, action.y)
    elif action.type == "type":
        computer.type(action.text)

# Clean up
computer.destroy()
```
## Complete Example
Here's a full working example that handles the complete agent loop:
```python example.py expandable icon="python" theme={null}
import time
from openai import OpenAI
from orgo import Computer
from dotenv import load_dotenv

load_dotenv()

def run_computer_task(task, computer_id=None):
    """Execute a task using OpenAI Computer Use with Orgo."""
    # Initialize OpenAI client and Orgo computer
    client = OpenAI()
    computer = Computer(computer_id=computer_id)
    print(f"🖥️ Computer ID: {computer.computer_id}")

    # Create initial request with the task
    response = client.responses.create(
        model="computer-use-preview",
        tools=[{
            "type": "computer_use_preview",
            "display_width": 1024,
            "display_height": 768,
            "environment": "linux"  # Orgo provides Linux desktops
        }],
        input=[{
            "role": "user",
            "content": [{
                "type": "input_text",
                "text": f"""IMPORTANT: You are controlling a Linux desktop.
- Always double-click desktop icons to open applications
- Use keyboard shortcuts as single commands (e.g., 'ctrl+c' not separate keys)

Task: {task}"""
            }]
        }],
        reasoning={"summary": "concise"},  # Show reasoning steps
        truncation="auto"  # Required for computer use
    )

    # Main agent loop
    while True:
        # Display progress
        for item in response.output:
            if item.type == "reasoning" and hasattr(item, "summary"):
                for summary in item.summary:
                    if hasattr(summary, "text"):
                        print(f"💭 {summary.text}")
            elif item.type == "text" and hasattr(item, "text"):
                print(f"💬 {item.text}")

        # Get computer actions from response
        actions = [item for item in response.output if item.type == "computer_call"]

        # If no actions, task is complete
        if not actions:
            print("✓ Task completed")
            break

        # Execute the action
        action = actions[0]
        print(f"→ {action.action.type}")
        execute_action(computer, action.action)
        time.sleep(1)  # Allow UI to update

        # Capture screenshot and continue
        screenshot = computer.screenshot_base64()
        response = client.responses.create(
            model="computer-use-preview",
            previous_response_id=response.id,  # Link to previous response
            tools=[{
                "type": "computer_use_preview",
                "display_width": 1024,
                "display_height": 768,
                "environment": "linux"
            }],
            input=[{
                "call_id": action.call_id,
                "type": "computer_call_output",
                "output": {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{screenshot}"
                }
            }],
            reasoning={"summary": "concise"},
            truncation="auto"
        )

    return computer

def execute_action(computer, action):
    """Execute computer actions using Orgo."""
    match action.type:
        case "click":
            # Handle left/right clicks
            if getattr(action, 'button', 'left') == "right":
                computer.right_click(action.x, action.y)
            else:
                computer.left_click(action.x, action.y)
        case "double_click":
            computer.double_click(action.x, action.y)
        case "type":
            computer.type(action.text)
        case "key" | "keypress":
            # Handle single keys or key combinations
            keys = getattr(action, 'keys', None) or [getattr(action, 'key', '')]
            if len(keys) > 1:
                # Multiple keys = keyboard shortcut
                computer.key('+'.join(keys).lower())
            else:
                # Single key press
                for key in keys:
                    computer.key(key)
        case "scroll":
            # Convert scroll amount to direction
            scroll_y = getattr(action, 'scroll_y', 0)
            direction = "down" if scroll_y > 0 else "up"
            computer.scroll(direction, abs(scroll_y) // 100)
        case "wait":
            computer.wait(getattr(action, 'seconds', 2))
        case "screenshot":
            # Screenshot is taken automatically in the loop
            pass

if __name__ == "__main__":
    # Example usage
    computer = run_computer_task("Open a terminal and list files")
    # Always clean up
    computer.destroy()
```
## Usage Examples
### Basic Tasks
```python theme={null}
# Open a browser
computer = run_computer_task("Open Firefox")
# Navigate to a website
computer = run_computer_task("Go to github.com and search for orgo")
# Fill out a form
computer = run_computer_task("Fill out the contact form with test data")
# Always clean up
computer.destroy()
```
### Complex Workflows
```python theme={null}
# Multi-step task
task = """
1. Open a text editor
2. Write a Python hello world program
3. Save it as hello.py
4. Open a terminal
5. Run the program
"""
computer = run_computer_task(task)
computer.destroy()
```
### Reusing Sessions
```python theme={null}
# First task
computer = run_computer_task("Open VS Code")
computer_id = computer.computer_id
# Continue in same session
computer = run_computer_task(
    "Create a new Python file",
    computer_id=computer_id
)
# Clean up when done
computer.destroy()
```
## Key Concepts
### The Agent Loop
OpenAI Computer Use works in a continuous loop:
1. **Request** → Send task to the model
2. **Action** → Model suggests an action (click, type, etc.)
3. **Execute** → Your code executes the action
4. **Screenshot** → Capture the result
5. **Repeat** → Continue until task is complete
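The five steps above map onto a short driver loop. The sketch below uses stand-in `model_step` and `do_action` functions (both hypothetical, with a scripted two-action "model") so the shape of the loop is visible without any API calls:

```python theme={null}
def model_step(history):
    """Stand-in for the Responses API call: returns the next action, or None when done."""
    script = [{"type": "click", "x": 100, "y": 200}, {"type": "type", "text": "orgo"}]
    return script[len(history)] if len(history) < len(script) else None

def do_action(action):
    """Stand-in for executing the action on the desktop and capturing a screenshot."""
    return f"screenshot after {action['type']}"

history = []
while True:
    action = model_step(history)          # 1. Request -> 2. Action
    if action is None:                    # 5. Repeat until no action is returned
        break
    screenshot = do_action(action)        # 3. Execute -> 4. Screenshot
    history.append((action, screenshot))  # feed the result back to the model

print(len(history))  # → 2
```

In the real loop, `model_step` is `client.responses.create(...)` with `previous_response_id`, and the screenshot is returned to the model as a `computer_call_output` item, exactly as in the complete example above.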
### Action Types
| Action | Description | Example |
| -------------- | -------------------- | -------------------------- |
| `click` | Click at coordinates | Click button at (100, 200) |
| `double_click` | Double-click | Open desktop icon |
| `type` | Type text | Enter username |
| `key` | Press key(s) | Press Enter, Ctrl+C |
| `scroll` | Scroll page | Scroll down 3 units |
| `wait` | Pause execution | Wait 2 seconds |
| `screenshot` | Take screenshot | Capture current state |
### Safety Features
OpenAI includes safety checks to prevent misuse:
```python theme={null}
# Handle safety checks if they occur
if hasattr(action, 'pending_safety_checks'):
    for check in action.pending_safety_checks:
        print(f"⚠️ Safety check: {check.message}")
        # Acknowledge in next request if proceeding
```
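When you do proceed, OpenAI's documented flow is to echo the pending checks back as `acknowledged_safety_checks` on the next `computer_call_output` item. The sketch below only builds that payload as plain data (no API call); the `call_id`, check contents, and screenshot string are placeholder sample values:

```python theme={null}
# Sample data standing in for a real computer_call item and screenshot
pending_checks = [{"id": "cu_sc_1", "code": "malicious_instructions", "message": "Review this action"}]
call_id = "call_abc123"
screenshot = "iVBORw0KGgo..."  # base64 PNG (truncated placeholder)

output_item = {
    "call_id": call_id,
    "type": "computer_call_output",
    # Echo the pending checks back to confirm they were reviewed
    "acknowledged_safety_checks": pending_checks,
    "output": {
        "type": "input_image",
        "image_url": f"data:image/png;base64,{screenshot}",
    },
}

print(output_item["acknowledged_safety_checks"][0]["code"])  # → malicious_instructions
```

Pass `output_item` in the `input` list of the follow-up `client.responses.create(...)` call; check OpenAI's computer use guide for the current field names before relying on this in production.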
## Best Practices
### 1. Clear Instructions
```python theme={null}
# ✅ Good - Specific and clear
task = "Open Firefox, go to github.com, and star the orgo repository"
# ❌ Avoid - Too vague
task = "Do some web stuff"
```
### 2. Error Handling
```python theme={null}
def safe_run_task(task):
    """Run task with error handling."""
    computer = None
    try:
        computer = run_computer_task(task)
        return computer
    except Exception as e:
        print(f"❌ Error: {e}")
        if computer:
            computer.destroy()
        raise
```
### 3. Session Management
```python theme={null}
# Use context manager pattern
class ComputerSession:
    def __init__(self, task):
        self.task = task
        self.computer = None

    def __enter__(self):
        self.computer = run_computer_task(self.task)
        return self.computer

    def __exit__(self, *args):
        if self.computer:
            self.computer.destroy()

# Usage
with ComputerSession("Open calculator") as computer:
    print(f"Session ID: {computer.computer_id}")
```
### 4. Timing Considerations
```python theme={null}
# Add delays for UI updates
time.sleep(1) # After clicks
time.sleep(2) # After opening applications
time.sleep(0.5) # After typing
```
## Comparison with Claude
| Feature | OpenAI Computer Use | Claude Computer Use |
| ----------- | ---------------------- | ------------------------- |
| API | Responses API | Messages API |
| Model | `computer-use-preview` | `claude-4-sonnet` |
| Beta Tag | Built-in | `computer-use-2025-01-24` |
| Reasoning | Optional summaries | Thinking blocks |
| Environment | Multiple (browser, OS) | Single tool definition |
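For comparison, the Claude column of the table corresponds to a request payload shaped roughly like the one below, built here as plain data rather than an SDK call. Field names follow Anthropic's computer-use beta documentation; treat the exact model id and beta string as assumptions that may change:

```python theme={null}
# Request parameters for Claude computer use (Messages API), as plain data
claude_request = {
    "model": "claude-4-sonnet",            # model id as listed in the table above
    "betas": ["computer-use-2025-01-24"],  # beta tag from the table above
    "tools": [{
        "type": "computer_20250124",       # single tool definition (vs OpenAI's environment field)
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Open Firefox and search for OpenAI"}],
}

print(claude_request["tools"][0]["type"])  # → computer_20250124
```

The structural difference is visible here: Claude takes one tool definition with pixel dimensions, while OpenAI's Responses API takes an `environment` plus `display_width`/`display_height` and links turns via `previous_response_id`.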
## Limitations
* **Beta Status**: Computer Use is in beta and may have unexpected behaviors
* **Rate Limits**: The model has constrained rate limits
* **Accuracy**: \~38% success rate on complex OS tasks
* **Environment**: Best suited for browser-based tasks
## Next Steps
* [Official OpenAI Computer Use documentation](https://platform.openai.com/docs/guides/tools-computer-use)
* Learn more about [Orgo's virtual desktops](https://docs.orgo.ai)
# Introduction
Source: https://docs.orgo.ai/introduction
Desktop infrastructure for AI agents
## Overview
Orgo is desktop infrastructure for AI agents. Launch headless cloud VMs that AI models can control and interact with.
* Start using Orgo in under 5 minutes
* Get your API key to start building
## What is computer use?
AI computer use is a new capability that enables AI to directly control computers by viewing screens and manipulating interfaces. Companies like Anthropic recently released their first generation of computer use agents (CUAs) that can observe and interact with digital environments like humans do.
Here are a few X posts that talk about computer use agents: