Day 4: Vector Search — When Keywords Fail You

Teaching your search engine to understand meaning, not just words

Jan 19, 2026

The Problem Nobody Talks About

You built a product search. Someone types “laptop” and gets laptops. Perfect.

Then a customer searches for “portable computer for travel” and gets... nothing.

Or they search for “something to keep my coffee hot” and your database full of “thermal mugs” and “insulated tumblers” returns zero results.

Your search works. It just does not understand.

This is the vocabulary mismatch problem. Users describe things differently from how you wrote them in your database. Traditional keyword search cannot bridge that gap.

Vector search can.

Why Vector Search Exists

Traditional search (BM25, the algorithm behind most keyword search) is brilliant at one thing: finding exact words. If the query contains “laptop” and the document contains “laptop,” you get a match.

But language is messy. People say:

“cheap flights” when you wrote “budget airfare”
“heart doctor” when your data says “cardiologist”
“that movie with the boat that sinks” when they mean Titanic

Keyword search fails all of these.

Vector search takes a different approach. Instead of matching words, it matches meaning.

Here is how. A machine learning model converts your text into numbers: hundreds or thousands of them, arranged in a specific order. These numbers are called vector embeddings. Text with similar meaning ends up with similar numbers.

When someone searches for “wild west,” the vector for their query lands close to vectors for “cowboy movies” and “frontier adventures” in this mathematical space. Even though none of those words match exactly.

That is the magic. Similar meaning = similar vectors = relevant results.

The Building Blocks

Before we write code, let us understand what we are working with.

Vector embeddings are just arrays of numbers. A sentence like “OpenSearch is powerful” might become [0.23, -0.87, 0.45, ...] with hundreds of dimensions. The specific numbers encode the meaning.

k-NN (k-Nearest Neighbors) is the algorithm OpenSearch uses to find vectors closest to your query vector. You ask for the 5 nearest neighbors, it finds the 5 most similar items.

Distance metrics measure how “close” two vectors are:

L2 (Euclidean): Straight-line distance. Most common choice.
Cosine similarity: Measures the angle between vectors. Good when magnitude does not matter.
Inner product: Dot product of vectors. Fast, works well with normalized vectors.

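If you are unsure, check which metric your embedding model was trained with and use that. L2 is a reasonable default when the model documentation does not say.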
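To make the three metrics concrete, here is a minimal sketch in plain Python. The vectors are made up for illustration (real embeddings have hundreds of dimensions), and the function names are mine, not part of any OpenSearch API.

```python
import math

def l2_distance(a, b):
    """Straight-line (Euclidean) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (ignores magnitude)."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings", for illustration only.
query = [1.0, 0.0, 0.0]
same_direction = [3.0, 0.0, 0.0]  # points the same way, but is longer
different = [0.0, 1.0, 0.0]       # perpendicular to the query

l2_same = l2_distance(query, same_direction)
l2_diff = l2_distance(query, different)
cos_same = cosine_similarity(query, same_direction)
cos_diff = cosine_similarity(query, different)
ip_same = inner_product(query, same_direction)

print("L2:", l2_same, l2_diff)
print("cosine:", cos_same, cos_diff)
print("inner product:", ip_same)
```

Notice that the metrics can disagree. By L2, the perpendicular vector is closer to the query than the one pointing in the same direction, because L2 cares about magnitude. Cosine similarity ignores magnitude and rates the same-direction vector as a perfect match.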

Your First Vector Search

Let us start simple. We will create an index that stores hotel locations as 2D vectors and find the nearest hotels to a point.

Why 2D? Because you can visualize it. Real embeddings use 768 or 1536 dimensions, but the concept is identical.

Step 1: Start OpenSearch

If you do not have it running from Day 2:

docker run -d -p 9200:9200 -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "DISABLE_SECURITY_PLUGIN=true" \
  opensearchproject/opensearch:latest

Step 2: Create a Vector Index

PUT /hotels-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "location": {
        "type": "knn_vector",
        "dimension": 2,
        "space_type": "l2"
      }
    }
  }
}

Three things matter here:

index.knn: true — enables vector search on this index
type: knn_vector — this field stores vectors
dimension: 2 — our vectors have 2 numbers (x, y coordinates)

Step 3: Add Some Hotels

POST /_bulk
{ "index": { "_index": "hotels-index", "_id": "1" } }
{ "name": "Beach Resort", "location": [5.2, 4.4] }
{ "index": { "_index": "hotels-index", "_id": "2" } }
{ "name": "Mountain Lodge", "location": [5.2, 3.9] }
{ "index": { "_index": "hotels-index", "_id": "3" } }
{ "name": "City Center Hotel", "location": [4.9, 3.4] }
{ "index": { "_index": "hotels-index", "_id": "4" } }
{ "name": "Airport Inn", "location": [4.2, 4.6] }
{ "index": { "_index": "hotels-index", "_id": "5" } }
{ "name": "Riverside Suites", "location": [3.3, 4.5] }

Step 4: Find Nearest Hotels

Now find the 3 hotels closest to coordinates [5, 4]:

POST /hotels-index/_search
{
  "size": 3,
  "query": {
    "knn": {
      "location": {
        "vector": [5, 4],
        "k": 3
      }
    }
  }
}

OpenSearch ranks the hotels by L2 distance from [5, 4] and returns the three closest: Mountain Lodge, Beach Resort, and City Center Hotel, in that order. That is vector search in its simplest form.
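You can check the result by hand. The sketch below recomputes the ranking in plain Python with a brute-force scan over the five hotels. It is not how OpenSearch works internally (the default is an approximate index), and the score formula in the comment, 1 / (1 + squared distance), is my reading of how OpenSearch documents scoring for the l2 space type, so treat it as an assumption.

```python
# The same five hotels indexed in Step 3.
hotels = {
    "Beach Resort": [5.2, 4.4],
    "Mountain Lodge": [5.2, 3.9],
    "City Center Hotel": [4.9, 3.4],
    "Airport Inn": [4.2, 4.6],
    "Riverside Suites": [3.3, 4.5],
}

def squared_l2(a, b):
    """Squared Euclidean distance; sorting by it gives the same order as L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(query, items, k):
    """Brute-force k-NN: score every item, sort, keep the k closest."""
    ranked = sorted(items.items(), key=lambda kv: squared_l2(query, kv[1]))
    return [(name, squared_l2(query, vec)) for name, vec in ranked[:k]]

top3 = nearest([5, 4], hotels, 3)
for name, d2 in top3:
    # Assumed OpenSearch score for space_type l2: 1 / (1 + squared distance)
    print(f"{name}: squared distance {d2:.2f}, score {1 / (1 + d2):.3f}")
```

Mountain Lodge wins with a squared distance of 0.05, followed by Beach Resort (0.20) and City Center Hotel (0.37).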

From Toy Example to Real Search

Two-dimensional coordinates are nice for learning. But real semantic search needs text converted to vectors.

You have two options:

Option 1: Bring Your Own Vectors

Generate embeddings outside OpenSearch using whatever tool you want — OpenAI, Cohere, a local model, anything. Then store those vectors directly.

Good when you already have an embedding pipeline or need full control over the model.
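As a rough sketch of what Option 1 looks like in practice, the snippet below builds a _bulk request body from documents whose vectors were computed elsewhere. The index name my-vectors-index and the 3-number vectors are placeholders I made up; the text and embedding field names mirror the index used later in this post.

```python
import json

def build_bulk_body(index_name, docs):
    """Return an NDJSON string for the _bulk API.

    docs: iterable of (doc_id, text, vector) tuples. The vectors are assumed
    to come from your own embedding model, computed outside OpenSearch.
    """
    lines = []
    for doc_id, text, vector in docs:
        # Action line, then the document itself, one JSON object per line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps({"text": text, "embedding": vector}))
    return "\n".join(lines) + "\n"  # the bulk format ends with a newline

# Placeholder vectors for illustration; real ones would have e.g. 768 values.
docs = [
    ("1", "thermal mug", [0.12, -0.40, 0.88]),
    ("2", "insulated tumbler", [0.10, -0.35, 0.91]),
]
body = build_bulk_body("my-vectors-index", docs)
print(body)
```

You would send this body to POST /_bulk, exactly like the hotels example, against an index whose knn_vector dimension matches your model.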

Option 2: Let OpenSearch Generate Them

Configure a machine learning model inside OpenSearch. When you index text, OpenSearch automatically creates the embeddings. When you search, it converts your query to an embedding too.

Good when you want simplicity and do not want to manage embedding infrastructure.

We will use Option 2 because it keeps the whole flow inside OpenSearch, and honestly it is more interesting.

Building Semantic Search

Here is what we are building: Index some text, search by meaning.

Step 1: Configure the ML Model

First, tell OpenSearch to allow ML models on data nodes (for local development):

PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": "false",
    "plugins.ml_commons.native_memory_threshold": "99"
  }
}

Now register and deploy a model. We will use msmarco-distilbert-base-tas-b, a fast DistilBERT-based model that produces 768-dimensional embeddings:

POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b",
  "version": "1.0.3",
  "model_format": "TORCH_SCRIPT"
}

This takes a minute or two. The response contains a task_id. Use it to check the task status:

GET /_plugins/_ml/tasks/{task_id}

When it says COMPLETED, grab the model_id from the response. You will need it.
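If you would rather not refresh that request by hand, here is a small polling sketch. It assumes the local, security-disabled cluster from the Docker command above at http://localhost:9200, and that the task document has state and model_id fields as described. The function names, poll interval, and retry limit are my own choices.

```python
import json
import time
import urllib.request

def fetch_task(task_id, base_url="http://localhost:9200"):
    """GET the ML task document from a local cluster with security disabled."""
    with urllib.request.urlopen(f"{base_url}/_plugins/_ml/tasks/{task_id}") as resp:
        return json.load(resp)

def wait_for_model_id(task_id, fetch=fetch_task, poll_seconds=5, max_polls=60):
    """Poll the task until it is COMPLETED, then return its model_id."""
    for _ in range(max_polls):
        task = fetch(task_id)
        state = task.get("state")
        if state == "COMPLETED":
            return task["model_id"]
        if state == "FAILED":
            raise RuntimeError(f"model registration failed: {task.get('error')}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not complete after {max_polls} polls")
```

Usage would be model_id = wait_for_model_id("your-task-id"). The fetch parameter exists so you can swap in a different HTTP client, or a fake one for testing.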

Step 2: Create an Ingest Pipeline

An ingest pipeline processes documents before indexing. Ours will automatically convert text to embeddings:

PUT /_ingest/pipeline/text-embedding-pipeline
{
  "description": "Converts text to embeddings",
  "processors": [
    {
      "text_embedding": {
        "model_id": "YOUR_MODEL_ID_HERE",
        "field_map": {
          "text": "embedding"
        }
      }
    }
  ]
}

This says: take the text field, run it through the model, store the result in embedding.

Step 3: Create the Index

PUT /semantic-search-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "text-embedding-pipeline"
  },
  "mappings": {
    "properties": {
      "text": { "type": "text" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 768
      }
    }
  }
}

Note the dimension is 768 — matching what DistilBERT produces.

Step 4: Index Some Documents

POST /semantic-search-index/_doc/1
{ "text": "A cowboy riding through the desert at sunset" }

POST /semantic-search-index/_doc/2
{ "text": "Professional basketball players competing in a championship game" }

POST /semantic-search-index/_doc/3
{ "text": "A university women's basketball team practicing in an arena" }

POST /semantic-search-index/_doc/4
{ "text": "Wild horses running across the prairie" }

The pipeline automatically generates embeddings. You can verify:

GET /semantic-search-index/_doc/1

The response includes an embedding field with 768 numbers.

Step 5: Search by Meaning

GET /semantic-search-index/_search
{
  "_source": { "excludes": ["embedding"] },
  "query": {
    "neural": {
      "embedding": {
        "query_text": "wild west",
        "model_id": "YOUR_MODEL_ID_HERE",
        "k": 3
      }
    }
  }
}

The neural query converts “wild west” to an embedding and finds the closest documents. You should see the cowboy and horses documents ranked high, even though neither contains the phrase “wild west” and the cowboy document shares no words with the query at all.

That is semantic search. Meaning over keywords.

Hybrid Search: Best of Both Worlds

Here is a secret: pure semantic search is not always better than keyword search.

Keyword search excels when the user knows exactly what they want: “iPhone 15 Pro Max 256GB”. Semantic search might return other phones because they are “similar.”

The solution? Combine them.

Hybrid search runs both keyword and semantic search, then merges the results. You get exact matches when they exist, plus semantically similar results when they do not.

Step 1: Create a Search Pipeline

PUT /_search/pipeline/hybrid-search-pipeline
{
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": { "weights": [0.3, 0.7] }
        }
      }
    }
  ]
}

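The weights control how much each search type contributes, and they follow the order of the queries inside the hybrid query you send. In the query below, the keyword match comes first and the neural query second, so keyword gets 30% and semantic gets 70%. Tune this based on your data.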
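To see what min_max normalization and a weighted arithmetic_mean do to scores, here is a simplified model in plain Python. The raw scores are made up, and this is not OpenSearch's actual implementation, which handles edge cases (such as the lowest score in a list) in its own way. It only shows the arithmetic.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def combine(keyword_scores, semantic_scores, weights=(0.3, 0.7)):
    """Weighted mean of normalized scores, highest first."""
    kw, sem = min_max(keyword_scores), min_max(semantic_scores)
    w_kw, w_sem = weights
    combined = {}
    for doc in set(kw) | set(sem):
        # Simplification: a document missing from one list scores 0 there.
        total = w_kw * kw.get(doc, 0.0) + w_sem * sem.get(doc, 0.0)
        combined[doc] = total / (w_kw + w_sem)
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Made-up raw scores: BM25 scores are unbounded, these k-NN scores sit in 0..1.
keyword = {"doc2": 4.0, "doc3": 2.0}
semantic = {"doc2": 0.80, "doc3": 0.90, "doc1": 0.50}

ranking = combine(keyword, semantic)
for doc, score in ranking:
    print(doc, round(score, 3))
```

Normalization is what makes the two lists comparable. Without it, a BM25 score of 4.0 would swamp a k-NN score of 0.9 no matter what weights you pick. Here doc2 comes out on top because it does well in both lists.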

Step 2: Run a Hybrid Query

GET /semantic-search-index/_search?search_pipeline=hybrid-search-pipeline
{
  "_source": { "excludes": ["embedding"] },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text": "basketball game"
          }
        },
        {
          "neural": {
            "embedding": {
              "query_text": "basketball game",
              "model_id": "YOUR_MODEL_ID_HERE",
              "k": 5
            }
          }
        }
      ]
    }
  }
}

Now you get results that match the keywords AND results that match the meaning, ranked together.

Optimizing for Scale: Byte Vectors

Vector search has a cost problem. Each vector with 768 dimensions uses 3KB of memory (768 × 4 bytes for float32). A million documents = roughly 3GB just for the raw vectors, before any index overhead.

Byte quantization compresses vectors from 32-bit floats to 8-bit integers. Same 768 dimensions, but 768 bytes instead of 3KB. That is 75% savings.

The trade-off? Slight loss in precision. For most search applications, you will not notice.
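The arithmetic is easy to redo for your own numbers. This small sketch counts raw vector bytes only; the HNSW graph and other index structures come on top, so treat the result as a lower bound.

```python
def vector_memory_bytes(num_vectors, dimension, bytes_per_value):
    """Raw storage for the vector values alone, ignoring index overhead."""
    return num_vectors * dimension * bytes_per_value

float_bytes = vector_memory_bytes(1_000_000, 768, 4)  # float32: 4 bytes per value
byte_bytes = vector_memory_bytes(1_000_000, 768, 1)   # byte: 1 byte per value
savings = 1 - byte_bytes / float_bytes

print(f"float32: {float_bytes / 1e9:.2f} GB")
print(f"byte:    {byte_bytes / 1e9:.2f} GB")
print(f"savings: {savings:.0%}")
```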

To use byte vectors, configure your index:

PUT /compressed-vectors-index
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "data_type": "byte",
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "lucene"
        }
      }
    }
  }
}

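The key is data_type: byte. Every value must be an integer from -128 to 127, so your embedding model needs to output int8 values (Cohere Embed v3 supports this directly). The example uses dimension 1024 rather than 768 because that is what the full-size Cohere Embed v3 models produce. Set it to match your own model.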
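If your model only outputs floats, you have to quantize them yourself. The sketch below is a deliberately naive scheme that assumes every value lies between -1 and 1 (true for unit-normalized embeddings) and scales by a fixed factor of 127. It is not what Cohere or OpenSearch do internally, and a real pipeline should measure the effect on search quality.

```python
def quantize_to_int8(vector, scale=127):
    """Naive scalar quantization: scale, round, clamp to the byte range.

    Assumes input values lie in [-1, 1]. Using the same fixed scale for
    every vector keeps relative distances between vectors roughly intact.
    """
    return [max(-128, min(127, round(v * scale))) for v in vector]

# Made-up float embedding values for illustration.
sample = [0.23, -0.87, 0.45, 0.0]
quantized = quantize_to_int8(sample)
print(quantized)
```

Each value now fits in one byte instead of four. What you lose is the rounding: 0.23 becomes 29/127, which is about 0.228.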

Approximate vs Exact Search

One more thing you should know.

Approximate k-NN (ANN) uses algorithms like HNSW to find “probably the nearest” neighbors very fast. It might miss the absolute closest vector, but on large indexes it is typically orders of magnitude faster than comparing against every vector, and the results are good enough for most search use cases.

Exact k-NN compares your query to every single vector. Guaranteed correct, but slow. Only use this for small datasets or when you need perfect accuracy.

OpenSearch uses approximate search by default. That is the right choice for most applications.
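If you do need exact results, one route I am aware of is the k-NN plugin's scoring script, run through a script_score query. The helper below only builds the request body. The script name and parameter names are my reading of the k-NN plugin documentation, so verify them against your OpenSearch version before relying on them.

```python
import json

def exact_knn_query(field, query_vector, k, space_type="l2"):
    """Build a script_score body that scores every matching document exactly."""
    return {
        "size": k,
        "query": {
            "script_score": {
                # match_all means every document is scored; narrow this
                # filter to keep exact search affordable.
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",  # assumed k-NN plugin script name
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": query_vector,
                        "space_type": space_type,
                    },
                },
            }
        },
    }

body = exact_knn_query("location", [5, 4], k=3)
print(json.dumps(body, indent=2))
```

You would send this to POST /hotels-index/_search in place of the knn query from earlier. Because every matching document gets scored, it pairs best with a filter that keeps the candidate set small.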

AWS vs Alibaba Cloud

Amazon OpenSearch Service (AWS):

Fully managed, integrates with IAM
Built-in ML nodes for running embedding models
Supports SageMaker for custom models
Pricing based on instance hours + storage

Alibaba Cloud OpenSearch:

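Managed search service built on Alibaba's own engine, not on the open-source OpenSearch project
Strong ML integration with PAI
Competitive pricing for APAC workloads
Its own APIs and SDKs, so the requests in this post do not carry over as-is

Both support vector search. The core concepts are identical. The APIs and operational details differ.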

Today’s Mini-Project

Build a semantic product search:

Create an index with a text field and embedding field
Set up an ingest pipeline with a text embedding processor
Index 20+ products with descriptions
Implement three search types:
- Pure keyword search (match query)
- Pure semantic search (neural query)
- Hybrid search combining both
Compare the results for queries like:
- “laptop” (exact keyword)
- “something to write code on” (semantic)
- “portable computer” (hybrid)

Notice how each approach handles different query types.
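To get you started, here is one way to generate the three request bodies for each test query in Python. It assumes the text and embedding fields and the model ID placeholder from the steps above. The helper names are mine, and sending the requests is left to whatever HTTP client you prefer.

```python
MODEL_ID = "YOUR_MODEL_ID_HERE"  # placeholder: use the model_id from your cluster

def keyword_query(text):
    """Pure keyword search: a match query on the text field."""
    return {"match": {"text": text}}

def semantic_query(text, k=5):
    """Pure semantic search: a neural query on the embedding field."""
    return {"neural": {"embedding": {"query_text": text, "model_id": MODEL_ID, "k": k}}}

def hybrid_query(text, k=5):
    """Both at once. Keyword first, neural second, matching the weight order."""
    return {"hybrid": {"queries": [keyword_query(text), semantic_query(text, k)]}}

def wrap(query):
    """Full search body that hides the bulky embedding field from results."""
    return {"_source": {"excludes": ["embedding"]}, "query": query}

def search_bodies(text):
    return {
        "keyword": wrap(keyword_query(text)),
        "semantic": wrap(semantic_query(text)),
        "hybrid": wrap(hybrid_query(text)),
    }

test_queries = ["laptop", "something to write code on", "portable computer"]
all_bodies = {q: search_bodies(q) for q in test_queries}
print(sorted(all_bodies["laptop"]))
```

Send the keyword and semantic bodies to /your-index/_search, and the hybrid body to the same endpoint with ?search_pipeline=hybrid-search-pipeline appended. Then put the three result lists side by side for each query.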

Common Mistakes to Avoid

Dimension mismatch: Your index dimension must exactly match your model output. A 768-dimensional model needs a 768-dimensional field. No exceptions.

Forgetting the pipeline: If embeddings are not being generated, check that your index has default_pipeline set.

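Wrong space type: Use the metric your model was trained for. Many text embedding models are built for cosine similarity or dot product rather than L2. If your model outputs normalized vectors, all three metrics produce the same ranking, so the choice matters most for unnormalized embeddings.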

Not enough memory: Vector search is memory-intensive. Monitor your cluster. Consider byte quantization early.

Try It Yourself

Interactive Guide: opensearch.9cld.com/day/04-vector-search

Play with the concepts hands-on. Toggle between cURL, Python, and JavaScript. Copy code directly to your terminal.

Full 60-Day Journey: opensearch.9cld.com

Day 4 of 60 in the OpenSearch Visual Guide series.

Missed the earlier days?

Day 1: OpenSearch Fundamentals — Why OpenSearch exists
Day 2: Running with Docker — Local development setup
Day 3: Your First Conversation — REST APIs and data operations

What’s Next (Day 5)

Tomorrow we explore analyzers and mappings — the foundation of great text search. Understanding these makes both keyword and semantic search work better.

Building in public. Learning out loud. Follow along as I go from zero to OpenSearch ambassador in 60 days.

9Cloud
