
How to create an AI agent powered by your screen & mic

Linda Hamilton
2025-01-22


Screenpipe: A CLI/App for 24/7 Screen and Mic Recording, OCR, Transcription, and AI Integration

Screenpipe is a command-line interface (CLI) application that continuously records your screen and microphone activity, extracts Optical Character Recognition (OCR) data, generates transcriptions, and simplifies the process of feeding this data into AI models. Its flexible pipe system allows you to create powerful plugins that interact with captured screen and audio information. This example demonstrates building a simple pipe that leverages Ollama to analyze screen activity.

Prerequisites:

  • Screenpipe installed and running.
  • Bun installed (npm install -g bun).
  • Ollama installed with a model (DeepSeek-r1:1.5b is used in this example).
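If the model isn't available locally yet, pull it with Ollama first and confirm it is listed (a quick sketch; substitute whatever model tag you plan to use):

<code class="language-bash">ollama pull deepseek-r1:1.5b
ollama list   # verify the model appears</code>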

1. Pipe Creation:

Create a new Screenpipe pipe using the CLI:

<code class="language-bash">bunx @screenpipe/create-pipe@latest</code>

Follow the prompts to name your pipe (e.g., "my-activity-analyzer") and choose a directory.

2. Project Setup:

Open the project in your preferred editor (e.g., Cursor, VS Code):

<code class="language-bash">cursor my-activity-analyzer</code>

The initial project structure will include several files. For this example, remove unnecessary files:

<code class="language-bash">rm -rf src/app/api/intelligence src/components/obsidian-settings.tsx src/components/file-suggest-textarea.tsx</code>

3. Implementing the Analysis Cron Job:

Create src/app/api/analyze/route.ts with the following code:

<code class="language-typescript">import { NextResponse } from "next/server";
import { pipe } from "@screenpipe/js";
import { streamText } from "ai";
import { ollama } from "ollama-ai-provider";

export async function POST(request: Request) {
  try {
    const { messages, model } = await request.json();
    console.log("model:", model);

    const fiveMinutesAgo = new Date(Date.now() - 5 * 60 * 1000).toISOString();
    const results = await pipe.queryScreenpipe({
      startTime: fiveMinutesAgo,
      limit: 10,
      contentType: "all",
    });

    const provider = ollama(model);
    const result = streamText({
      model: provider,
      messages: [
        ...messages,
        {
          role: "user",
          content: `Analyze this activity data and summarize what I've been doing: ${JSON.stringify(results)}`,
        },
      ],
    });

    return result.toDataStreamResponse();
  } catch (error) {
    console.error("error:", error);
    return NextResponse.json({ error: "Failed to analyze activity" }, { status: 500 });
  }
}</code>
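Once the dev server is running (see step 6), you can exercise this route directly instead of waiting for the cron trigger. A minimal sanity check with curl, assuming the default port and an empty chat history:

<code class="language-bash">curl -X POST http://localhost:3000/api/analyze \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1:1.5b", "messages": []}'</code>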

4. pipe.json Configuration for Scheduling:

Create or modify pipe.json to register the cron job. The */5 * * * * expression schedules the /api/analyze endpoint to run every 5 minutes:

<code class="language-json">{
  "crons": [
    {
      "path": "/api/analyze",
      "schedule": "*/5 * * * *"
    }
  ]
}</code>

5. Updating the Main Page (src/app/page.tsx):

<code class="language-typescript">"use client";

import { useState } from "react";
import { Button } from "@/components/ui/button";
import { OllamaModelsList } from "@/components/ollama-models-list";
import { Label } from "@/components/ui/label";
import { useChat } from "ai/react";

export default function Home() {
  const [selectedModel, setSelectedModel] = useState("deepseek-r1:1.5b");
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    body: { model: selectedModel },
    api: "/api/analyze",
  });

  return (
    <main className="p-4 max-w-2xl mx-auto space-y-4">
      <div className="space-y-2">
        <label htmlFor="model">Ollama Model</label>
        <OllamaModelsList defaultValue={selectedModel} onChange={setSelectedModel} />
      </div>

      <div>
        {messages.map((message) => (
          <div key={message.id}>
            <div>{message.role === "user" ? "User: " : "AI: "}</div>
            <div>{message.content}</div>
          </div>
        ))}
      </div>
    </main>
  );
}</code>

6. Local Testing:

Run the pipe locally:

<code class="language-bash">bun i  // or npm install
bun dev</code>

Access the application at http://localhost:3000.

7. Screenpipe Installation:

Install the pipe into Screenpipe:

  • UI: Open the Screenpipe app, navigate to the Pipes section, click the button to add a new pipe, and provide the local path to your pipe.
  • CLI:
    <code class="language-bash">screenpipe install /path/to/my-activity-analyzer
    screenpipe enable my-activity-analyzer</code>

How it Works:

  • Data Querying: pipe.queryScreenpipe() retrieves recent screen and audio data (see the standalone sketch after this list).
  • AI Processing: Ollama analyzes the data using a prompt.
  • UI: A simple interface displays the analysis results.
  • Scheduling: Screenpipe's cron job executes the analysis every 5 minutes.
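For experimenting with the data layer on its own, the same query the route performs can run as a standalone script. A minimal sketch using the @screenpipe/js call from step 3 (Screenpipe must be recording locally; run it with bun run query-test.ts, where the filename is arbitrary):

<code class="language-typescript">import { pipe } from "@screenpipe/js";

// Fetch the last five minutes of captured activity, mirroring the cron route.
const since = new Date(Date.now() - 5 * 60 * 1000).toISOString();
const results = await pipe.queryScreenpipe({
  startTime: since,   // ISO timestamp lower bound
  limit: 10,          // cap the number of returned items
  contentType: "all", // screen OCR and audio transcriptions
});

console.log(JSON.stringify(results, null, 2));</code>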

Next Steps:

  • Add configuration options.
  • Integrate with external services.
  • Implement more sophisticated UI components.

