In Part 1, we set up PostgreSQL with pgvector. Now, let's see how vector search actually works.
Contents
- What are Embeddings?
- Loading Sample Data
- Exploring Vector Search
- Understanding PostgreSQL Operators
- Next Steps
What are Embeddings?
An embedding is like a smart summary of content in numbers. The distance between two embeddings indicates their level of similarity. A small distance suggests that the vectors are quite similar, and a large distance indicates that they are less related.
? Book A: Web Development (Distance: 0.2) ⬅️ Very Similar! ? Book B: JavaScript 101 (Distance: 0.3) ⬅️ Similar! ? Book C: Cooking Recipes (Distance: 0.9) ❌ Not Similar
Loading Sample Data
Now, let's populate our database with some data. We'll use:
- Open Library API for book data
- OpenAI API to create embeddings
- pgvector to store and search them
Project Structure
pgvector-setup/ # From Part 1 ├── compose.yml ├── postgres/ │ └── schema.sql ├── .env # New: for API keys └── scripts/ # New: for data loading ├── requirements.txt ├── Dockerfile └── load_data.py
Create a Script
Let's start with a script to load data from external APIs. The full script is Here.
Setting Up Data Loading
- Create .env:
OPENAI_API_KEY=your_openai_api_key
- Update compose.yml to add the data loader:
services: # ... existing db service from Part 1 data_loader: build: context: ./scripts environment: - DATABASE_URL=postgresql://postgres:password@db:5432/example_db - OPENAI_API_KEY=${OPENAI_API_KEY} depends_on: - db
- Load the data:
docker compose up data_loader
You should see 10 programming books with their metadata.
Exploring Vector Search
Connect to your database:
docker exec -it pgvector-db psql -U postgres -d example_db
Understanding Vector Data
Let's peek at what embeddings actually look like:
-- View first 5 dimensions of an embedding SELECT name, (embedding::text::float[])[1:5] as first_5_dimensions FROM items LIMIT 1;
- Each embedding has 1536 dimensions (using OpenAI's model)
- Values typically range from -1 to 1
- These numbers represent semantic meaning
Finding Similar Books
Try a simple similarity search:
-- Find 3 books similar to any book about Web SELECT name, metadata FROM items ORDER BY embedding ( SELECT embedding FROM items WHERE metadata->>'title' LIKE '%Web%' LIMIT 1 ) LIMIT 3;
- Find a book with "Web" in its title
- Get that book's embedding (its mathematical representation)
- Compare this embedding with all other books' embeddings
- Get the 3 most similar books (smallest distances)
Understanding PostgreSQL Operators
Let's break down the operators used in vector search queries:
JSON Text Operator: ->>
Extracts text value from a JSON field.
Example:
-- If metadata = {"title": "ABC"}, it returns "ABC" SELECT metadata->>'title' FROM items;
Vector Distance Operator:
Measures similarity between two vectors.
- Smaller distance = More similar
- Larger distance = Less similar
Example:
-- Find similar books SELECT name, embedding query_embedding as distance FROM items ORDER BY distance LIMIT 3;
Next Steps
Up next, we'll:
- Build a FastAPI application
- Create search endpoints
- Make our vector search accessible via API
Stay tuned for Part 3: "Building a Vector Search API"! ?
Feel free to drop a comment below! ?
The above is the detailed content of Getting Started with Vector Search (Part 2). For more information, please follow other related articles on the PHP Chinese website!

Pythonusesahybridmodelofcompilationandinterpretation:1)ThePythoninterpretercompilessourcecodeintoplatform-independentbytecode.2)ThePythonVirtualMachine(PVM)thenexecutesthisbytecode,balancingeaseofusewithperformance.

Pythonisbothinterpretedandcompiled.1)It'scompiledtobytecodeforportabilityacrossplatforms.2)Thebytecodeistheninterpreted,allowingfordynamictypingandrapiddevelopment,thoughitmaybeslowerthanfullycompiledlanguages.

Forloopsareidealwhenyouknowthenumberofiterationsinadvance,whilewhileloopsarebetterforsituationswhereyouneedtoloopuntilaconditionismet.Forloopsaremoreefficientandreadable,suitableforiteratingoversequences,whereaswhileloopsoffermorecontrolandareusefulf

Forloopsareusedwhenthenumberofiterationsisknowninadvance,whilewhileloopsareusedwhentheiterationsdependonacondition.1)Forloopsareidealforiteratingoversequenceslikelistsorarrays.2)Whileloopsaresuitableforscenarioswheretheloopcontinuesuntilaspecificcond

Pythonisnotpurelyinterpreted;itusesahybridapproachofbytecodecompilationandruntimeinterpretation.1)Pythoncompilessourcecodeintobytecode,whichisthenexecutedbythePythonVirtualMachine(PVM).2)Thisprocessallowsforrapiddevelopmentbutcanimpactperformance,req

ToconcatenatelistsinPythonwiththesameelements,use:1)the operatortokeepduplicates,2)asettoremoveduplicates,or3)listcomprehensionforcontroloverduplicates,eachmethodhasdifferentperformanceandorderimplications.

Pythonisaninterpretedlanguage,offeringeaseofuseandflexibilitybutfacingperformancelimitationsincriticalapplications.1)InterpretedlanguageslikePythonexecuteline-by-line,allowingimmediatefeedbackandrapidprototyping.2)CompiledlanguageslikeC/C transformt

Useforloopswhenthenumberofiterationsisknowninadvance,andwhileloopswheniterationsdependonacondition.1)Forloopsareidealforsequenceslikelistsorranges.2)Whileloopssuitscenarioswheretheloopcontinuesuntilaspecificconditionismet,usefulforuserinputsoralgorit


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SublimeText3 Chinese version
Chinese version, very easy to use

mPDF
mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

MinGW - Minimalist GNU for Windows
This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.
