Search
Get the 5 nearest neighbors to a vector:
SET vectors.hnsw_ef_search = 64;
SELECT * FROM items ORDER BY embedding <-> '[3,2,1]' LIMIT 5;
Operators
Name | Description |
---|---|
<-> | squared Euclidean distance |
<#> | negative dot product |
<=> | cosine distance |
For the operator formulas, see the overview.
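As a quick sanity check of the table above, the three operators can be applied to a pair of literal vectors. This is a minimal sketch that assumes the extension is installed and that text literals can be cast to its vector type with `::vector`; the values in the comments are computed by hand from the definitions in the table, not copied from real output.

```sql
-- Assumption: the extension is loaded and provides the vector type.
SELECT
  '[1,2,3]'::vector <-> '[3,2,1]'::vector AS squared_l2,   -- (1-3)^2 + (2-2)^2 + (3-1)^2 = 8
  '[1,2,3]'::vector <#> '[3,2,1]'::vector AS neg_dot,      -- -(1*3 + 2*2 + 3*1) = -10
  '[1,2,3]'::vector <=> '[3,2,1]'::vector AS cosine_dist;  -- 1 - 10/14, about 0.286
```

For all three operators a smaller value means a closer match, which is why they can be used directly in ORDER BY.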
Filter
For a given category, get the 10 nearest neighbors to a vector:
SELECT * FROM items WHERE category_id = 1 ORDER BY embedding <#> '[0.5,0.5,0.5]' LIMIT 10;
Query options
Search options are set through PostgreSQL GUC (configuration) parameters.
Set ivf scan lists to 1 for the current session:
SET vectors.ivf_nprobe=1;
Set the hnsw search scope to 40 for the current transaction:
SET LOCAL vectors.hnsw_ef_search=40;
Set the search mode to vbase as the system default:
ALTER SYSTEM SET vectors.search_mode=vbase;
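The three scopes above behave differently once the statement has run. The sketch below uses only standard PostgreSQL commands (SHOW, RESET, SET LOCAL, pg_reload_conf) together with the options from this page. It assumes the items table from the earlier examples and assumes these options can be picked up by a configuration reload rather than a server restart.

```sql
-- Session scope: stays in effect until the session ends or the value is reset.
SET vectors.ivf_nprobe = 1;
SHOW vectors.ivf_nprobe;
RESET vectors.ivf_nprobe;

-- Transaction scope: SET LOCAL reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL vectors.hnsw_ef_search = 40;
SELECT * FROM items ORDER BY embedding <-> '[3,2,1]' LIMIT 5;
COMMIT;

-- System default: ALTER SYSTEM writes to postgresql.auto.conf (superuser only).
ALTER SYSTEM SET vectors.search_mode = vbase;
SELECT pg_reload_conf();  -- ask the server to re-read its configuration
```

SET LOCAL is usually the safest choice for a single query, because the change cannot leak into later statements on a pooled connection.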
Query options for ivf:
Option | Type | Range | Default | Description |
---|---|---|---|---|
vectors.ivf_nprobe | integer | [1, 1000000] | 10 | Number of lists to scan. |
Query options for hnsw:
Option | Type | Range | Default | Description |
---|---|---|---|---|
vectors.hnsw_ef_search | integer | [1, 65535] | 100 | Search scope of HNSW. |
Query options for general:
Option | Type | Range | Default | Description |
---|---|---|---|---|
vectors.enable_index | boolean | | on | Enables or disables the query planner. |
vectors.search_mode | enum | "basic", "vbase" | "basic" | Search mode. |
vectors.enable_prefilter | boolean | | on | Enables or disables the prefilter. |
Advanced usage
Sometimes you expect a search to return exactly LIMIT vectors, but it returns fewer:
SELECT COUNT(1) FROM (SELECT 1 FROM t WHERE (category_id = 1) ORDER BY val <-> '[1,1,1]' limit 10) t2;
--- returns 1, much less than 10
That is why we introduce search modes and the prefilter.
Search modes
There are two search modes: basic and vbase.
basic
basic is the default search mode.
In this mode, the filter is applied after the index returns vectors.hnsw_ef_search candidate vectors, so fewer than LIMIT rows may remain. You therefore need to increase vectors.hnsw_ef_search until enough vectors pass the filter.
The appropriate value depends on the input data distribution and the filtering rate. Setting vectors.hnsw_ef_search too high wastes memory.
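One way to tune this is to raise the search scope inside a transaction and count how many rows survive the filter. This is a sketch only: it reuses the items table and category_id column from the earlier examples, and the value 200 is an arbitrary illustration, not a recommendation.

```sql
BEGIN;
SET LOCAL vectors.search_mode = basic;
SET LOCAL vectors.hnsw_ef_search = 200;  -- illustrative value; tune it for your data
-- Count the rows that remain after filtering; the goal here is 10.
SELECT COUNT(*) FROM (
  SELECT 1 FROM items
  WHERE category_id = 1
  ORDER BY embedding <-> '[3,2,1]'
  LIMIT 10
) t;
COMMIT;
```

If the count is still below the LIMIT, repeat with a larger value, keeping within the documented range of 1 to 65535.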
It's recommended in these situations:
- Search without a filter or a transaction
- Returning insufficient vectors is acceptable
- The vbase search mode fails due to out-of-memory
vbase
vbase is the recommended search mode when any filter is enabled.
In this mode, the filter is applied after range vectors are returned. The value of range is chosen automatically by the vbase algorithm and is transparent to the user.
In most cases, vbase mode returns enough vectors for your filter. For how it works, see the paper VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity.
It's recommended in these situations:
- Search with a filter or a transaction
- Returning sufficient vectors is important
- Tired of tuning vectors.hnsw_ef_search in basic mode
You can enable vbase with the SQL statement SET vectors.search_mode = vbase;.
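To try vbase on a single filtered query without changing the session default, the setting can be scoped to one transaction. This sketch assumes the items table and category_id column used earlier on this page.

```sql
BEGIN;
SET LOCAL vectors.search_mode = vbase;  -- reverts when the transaction ends
SELECT * FROM items
WHERE category_id = 1
ORDER BY embedding <-> '[3,2,1]'
LIMIT 10;
COMMIT;
```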
Prefilter
prefilter is an enhancement strategy that can be set in any search mode. It is enabled by default.
If enabled, an additional filter is applied to select vectors before they are collected. This reduces expensive distance calculations, and all collected vectors will match the filter.
The acceleration ratio is inversely proportional to the filtering rate, that is, the fraction of rows that pass the filter. A filtering rate of 10% results in a 10x acceleration of distance calculations.
Example: 10% filtering rate
SELECT * FROM generate_series(1, 10) WHERE generate_series <= 1;
However, prefilter can have a negative impact on precision if:
- The filter is not relevant to the vector distance
- The filtering rate is too low, e.g. 1%.
Example: 1% filtering rate
SELECT * FROM generate_series(1, 100) WHERE generate_series <= 1;
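To see where your own data falls, you can estimate the filtering rate of a predicate with plain SQL. The sketch below assumes the items table and the category_id = 1 filter from the earlier examples; substitute your own table and condition.

```sql
-- Fraction of rows that pass the filter: 0.10 means 10%, 0.01 means 1%.
SELECT avg((category_id = 1)::int) AS filtering_rate
FROM items;
```

This scans the whole table, so on large tables it may be preferable to run it against a sample.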
If you need a high level of precision, please test your scenarios and consider turning it off:
ALTER SYSTEM SET vectors.enable_prefilter=off;
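Before changing the system default, it may be easier to compare results with the prefilter on and off in a single session. This sketch assumes vectors.enable_prefilter can be set at session level like the other options on this page, and reuses the items table from the earlier examples.

```sql
-- Run the query with the prefilter disabled for this session only.
SET vectors.enable_prefilter = off;
SELECT * FROM items
WHERE category_id = 1
ORDER BY embedding <-> '[3,2,1]'
LIMIT 10;

-- Restore the default and run the same query again to compare the results.
RESET vectors.enable_prefilter;
SELECT * FROM items
WHERE category_id = 1
ORDER BY embedding <-> '[3,2,1]'
LIMIT 10;
```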