Getting Started with Vector Search (Part 2)

Linda Hamilton
2024-11-10 02:07:02

In Part 1, we set up PostgreSQL with pgvector. Now, let's see how vector search actually works.

Contents

  • What are Embeddings?
  • Loading Sample Data
  • Exploring Vector Search
  • Understanding PostgreSQL Operators
  • Next Steps

What are Embeddings?

An embedding is a list of numbers (a vector) that captures the meaning of a piece of content. The distance between two embeddings indicates how similar the underlying content is: a small distance means the two items are closely related, and a large distance means they are less related.

Book A: Web Development  (Distance: 0.2) ⬅️ Very Similar!
Book B: JavaScript 101   (Distance: 0.3) ⬅️ Similar!
Book C: Cooking Recipes  (Distance: 0.9) ❌ Not Similar
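
To make the idea concrete, here is a minimal Python sketch that measures the distance between tiny made-up vectors. The three-dimensional values are invented for illustration only; real embeddings have far more dimensions, and the resulting distances will not match the figures above.

```python
import math

# Toy 3-dimensional "embeddings" (made-up values, for illustration only)
query = [0.9, 0.1, 0.0]      # stands in for a search about building websites
web_dev = [0.8, 0.2, 0.1]
cooking = [0.0, 0.1, 0.9]

# math.dist returns the Euclidean distance between two points
d_web = math.dist(query, web_dev)
d_cooking = math.dist(query, cooking)

print(f"web development: {d_web:.3f}")
print(f"cooking:         {d_cooking:.3f}")
```

The web development vector ends up much closer to the query than the cooking vector, which is the property vector search relies on.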

Loading Sample Data

Now, let's populate our database with some data. We'll use:

  • Open Library API for book data
  • OpenAI API to create embeddings
  • pgvector to store and search them

Project Structure

pgvector-setup/             # From Part 1
  ├── compose.yml
  ├── postgres/
  │   └── schema.sql
  ├── .env                  # New: for API keys
  └── scripts/              # New: for data loading
      ├── requirements.txt
      ├── Dockerfile
      └── load_data.py

Create a Script

Let's start with a script that loads data from external APIs. The full script is here.
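
As a rough idea of what such a loader might contain, the sketch below covers only the offline parts: turning a book record into text, formatting a vector in pgvector's text form, and preparing rows for insertion. It is not the author's script. The function names, the choice of fields to embed, and the `embed` callable (a stand-in for the OpenAI call) are all assumptions; the column names `name`, `metadata`, and `embedding` are taken from the queries later in this article.

```python
import json
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumed statement; column names come from the queries later in this article.
INSERT_SQL = "INSERT INTO items (name, metadata, embedding) VALUES (%s, %s, %s)"


def book_to_text(book: dict) -> str:
    """Join the fields the embedding should capture (a hypothetical choice)."""
    parts = [book.get("title", ""), ", ".join(book.get("authors", []))]
    return " | ".join(p for p in parts if p)


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


def build_rows(
    books: Iterable[dict],
    embed: Callable[[str], Sequence[float]],  # stand-in for the embedding API call
) -> List[Tuple[str, str, str]]:
    """Return (name, metadata JSON, vector literal) tuples ready for INSERT_SQL."""
    rows = []
    for book in books:
        vector = embed(book_to_text(book))
        rows.append((book["title"], json.dumps(book), to_pgvector_literal(vector)))
    return rows


# Demo with a fake embedding function, so no network access is needed
demo_books = [{"title": "Web Basics", "authors": ["A. Author"]}]
demo_rows = build_rows(demo_books, lambda text: [0.1, 0.2, 0.3])
print(demo_rows[0])
```

Passing the embedding function in as a parameter keeps the row-building logic testable without an API key; a real loader would pass a function that calls the embedding API and would execute `INSERT_SQL` once per row through a database driver.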

Setting Up Data Loading

  1. Create .env:
OPENAI_API_KEY=your_openai_api_key
  2. Update compose.yml to add the data loader:
services:
  # ... existing db service from Part 1

  data_loader:
    build:
      context: ./scripts
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/example_db
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - db
  3. Load the data:
docker compose up data_loader

You should see 10 programming books with their metadata.
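
Inside the container, the loader receives its configuration through the two environment variables set in compose.yml. One way it might read them is sketched below; the `load_settings` helper and its error message are hypothetical, and only the variable names come from the compose file.

```python
import os
from typing import Mapping

# Variable names as defined in compose.yml
REQUIRED = ("DATABASE_URL", "OPENAI_API_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, failing early if one is missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


# Example with an explicit mapping instead of the real environment
example = load_settings({
    "DATABASE_URL": "postgresql://postgres:password@db:5432/example_db",
    "OPENAI_API_KEY": "your_openai_api_key",
})
print(sorted(example))
```

Failing early is useful here because an empty OPENAI_API_KEY (for example, when .env is missing) would otherwise only surface as an API error partway through loading.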

Exploring Vector Search

Connect to your database:

docker exec -it pgvector-db psql -U postgres -d example_db

Understanding Vector Data

Let's peek at what embeddings actually look like:

-- View the first 5 dimensions of an embedding
-- (cast the vector to real[] so it can be sliced like an array)
SELECT
    name,
    (embedding::real[])[1:5] AS first_5_dimensions
FROM items
LIMIT 1;
  • Each embedding has 1536 dimensions (using OpenAI's model)
  • Values typically range from -1 to 1
  • These numbers represent semantic meaning
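
If you fetch an embedding as text from client code, you can inspect it the same way in Python. The sketch below assumes the value arrives in pgvector's text form, a bracketed comma-separated list; the sample string is made up and far shorter than a real 1536-dimension embedding, and `parse_vector` is a hypothetical helper.

```python
import json


def parse_vector(text: str) -> list:
    """Parse pgvector's text form, e.g. "[0.1,-0.2,0.3]", into a list of floats."""
    return [float(x) for x in json.loads(text)]


sample = "[0.12,-0.03,0.4,0.0,-0.25,0.07]"  # made-up, shortened vector
vec = parse_vector(sample)

print(len(vec))            # number of dimensions
print(vec[:5])             # first five values
print(min(vec), max(vec))  # range of the values
```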

Finding Similar Books

Try a simple similarity search:

-- Find 3 books similar to any book about Web
SELECT name, metadata
FROM items
ORDER BY embedding <-> (
    SELECT embedding
    FROM items
    WHERE metadata->>'title' LIKE '%Web%'
    LIMIT 1
)
LIMIT 3;

Here is what the query does:

  1. Find a book with "Web" in its title
  2. Get that book's embedding (its mathematical representation)
  3. Compare this embedding with every book's embedding, including its own
  4. Return the 3 closest books (smallest distances); the reference book itself comes first, at distance 0
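
The same steps can be sketched in plain Python over an in-memory list. The titles and three-dimensional vectors are invented for illustration; in the real setup, PostgreSQL does this work over the stored embeddings.

```python
import math

# Made-up books with toy 3-dimensional embeddings
books = [
    {"title": "Web Development Basics", "embedding": [0.9, 0.1, 0.0]},
    {"title": "JavaScript 101",         "embedding": [0.8, 0.3, 0.1]},
    {"title": "Python Crash Course",    "embedding": [0.5, 0.6, 0.2]},
    {"title": "Cooking Recipes",        "embedding": [0.0, 0.1, 0.9]},
]

# Steps 1-2: pick a book with "Web" in its title and take its embedding
reference = next(b for b in books if "Web" in b["title"])["embedding"]

# Step 3: rank every book by its Euclidean distance to the reference
ranked = sorted(books, key=lambda b: math.dist(b["embedding"], reference))

# Step 4: keep the 3 closest
top3 = [(b["title"], round(math.dist(b["embedding"], reference), 3)) for b in ranked[:3]]
for title, distance in top3:
    print(title, distance)
```

As in the SQL query, the reference book ranks first because its distance to itself is 0.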

Understanding PostgreSQL Operators

Let's break down the operators used in vector search queries:

JSON Text Operator: ->>

Extracts the value of a JSON field as text.

Example:

-- If metadata = {"title": "ABC"}, it returns "ABC"
SELECT metadata->>'title' FROM items;
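
For readers more at home in Python, the operator behaves roughly like the hypothetical `json_text` helper below. This is an analogy rather than PostgreSQL's implementation, and the exact spacing of non-string results can differ from what the database returns.

```python
import json


def json_text(document: str, key: str):
    """Rough Python analogue of `document ->> key` for a JSON object."""
    value = json.loads(document).get(key)
    if value is None:            # missing key or JSON null maps to SQL NULL
        return None
    if isinstance(value, str):   # strings come back without surrounding quotes
        return value
    return json.dumps(value)     # numbers, booleans, arrays, objects as JSON text


print(json_text('{"title": "ABC"}', "title"))
print(json_text('{"year": 2021}', "year"))
print(json_text('{"title": "ABC"}', "publisher"))
```

The key point is that the result is always text (or NULL), which is why the earlier query can apply LIKE to metadata->>'title' directly.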

Vector Distance Operator: <->

Computes the Euclidean (L2) distance between two vectors, which serves as a measure of similarity.

  • Smaller distance = More similar
  • Larger distance = Less similar

Example:

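-- Find the 3 books closest to a reference embedding.
-- Here the reference is simply the embedding of one stored row;
-- any vector with the same number of dimensions works.
SELECT name, embedding <-> (SELECT embedding FROM items LIMIT 1) AS distance
FROM items
ORDER BY distance
LIMIT 3;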
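
To show what the operator computes, the sketch below reimplements it in plain Python on toy vectors. For comparison it also includes cosine distance, which pgvector exposes as the <=> operator; that operator is not used in this article and is shown only for contrast.

```python
import math


def l2_distance(a, b):
    """What `a <-> b` computes: Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """What `a <=> b` computes: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 1.0]  # perpendicular to a

print(l2_distance(a, a), l2_distance(a, b))
print(cosine_distance(a, a), cosine_distance(a, b))
```

Both functions return 0 for identical vectors and grow as the vectors diverge, so in either case ordering by ascending distance puts the most similar rows first.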

Next Steps

Up next, we'll:

  • Build a FastAPI application
  • Create search endpoints
  • Make our vector search accessible via API

Stay tuned for Part 3: "Building a Vector Search API"!

Feel free to drop a comment below!
