<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[9Cloud]]></title><description><![CDATA[Multi-cloud tutorials, hands-on labs, and real-world insights across AWS, Azure, GCP, and Alibaba Cloud.]]></description><link>https://9cld.com</link><image><url>https://substackcdn.com/image/fetch/$s_!OduC!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5d48c5-0330-4021-a63f-4bcc8adcddbe_256x256.png</url><title>9Cloud</title><link>https://9cld.com</link></image><generator>Substack</generator><lastBuildDate>Sun, 12 Apr 2026 12:38:56 GMT</lastBuildDate><atom:link href="https://9cld.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Ankit Mehta]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[9cld@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[9cld@substack.com]]></itunes:email><itunes:name><![CDATA[Ankit Mehta]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ankit Mehta]]></itunes:author><googleplay:owner><![CDATA[9cld@substack.com]]></googleplay:owner><googleplay:email><![CDATA[9cld@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ankit Mehta]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Day 11: How OpenSearch Nodes Find Each Other]]></title><description><![CDATA[Understand how OpenSearch nodes discover each other through seed hosts, bootstrap a new cluster, elect a cluster manager using quorum-based voting, and prevent split-brain scenarios. 
Includes production configuration, fault detection tuning, cluster manager task throttling, and the most common mistakes that cause the "cluster manager not discovered" error.]]></description><link>https://9cld.com/p/opensearch-discovery-cluster-formation-split-brain-prevention</link><guid isPermaLink="false">https://9cld.com/p/opensearch-discovery-cluster-formation-split-brain-prevention</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Tue, 24 Feb 2026 00:25:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YAGW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YAGW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YAGW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png 424w, https://substackcdn.com/image/fetch/$s_!YAGW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png 848w, https://substackcdn.com/image/fetch/$s_!YAGW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YAGW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YAGW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png" width="1205" height="631" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:631,&quot;width&quot;:1205,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:606716,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/188966668?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YAGW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png 424w, https://substackcdn.com/image/fetch/$s_!YAGW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png 848w, https://substackcdn.com/image/fetch/$s_!YAGW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YAGW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a5219ef-3fd4-40c9-a303-9633ba4e0cc1_1205x631.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>You have been running OpenSearch for ten days. You created indexes. You ran queries. 
You built chatbots and migrated data.</p><p>But have you ever wondered what happens in the first few seconds after you start a cluster?</p><p>Before any index exists. Before any query runs. Before your data is safe. There is a process happening that most people never think about until it breaks.</p><p>That process is discovery and cluster formation.</p><p>Get it right and your cluster starts reliably every time, recovers from failures automatically, and scales without drama. Get it wrong and you wake up to a split-brain scenario where two nodes both think they are in charge, writing conflicting data to the same indexes.</p><p>This is not theoretical. This is the kind of failure that corrupts production data.</p><div><hr></div><h2>Why This Matters Before Anything Else</h2><p>Every OpenSearch cluster needs a leader. 
This leader is called the cluster manager (historically called the &#8220;master&#8221; node in Elasticsearch).</p><p>The cluster manager is responsible for:</p><ul><li><p>Maintaining the definitive cluster state, which includes node membership, index metadata, and shard allocation</p></li><li><p>Publishing state updates to all nodes</p></li><li><p>Coordinating shard allocation and rebalancing</p></li><li><p>Managing cluster-wide settings</p></li></ul><p>Without a cluster manager, nothing works. No index creation. No shard movement. No cluster state updates. Your cluster is frozen.</p><p>So the very first thing OpenSearch must do when nodes start is figure out who else is out there and agree on a leader. That is discovery.</p><div><hr></div><h2>Discovery: How Nodes Find Each Other</h2><p>When an OpenSearch node starts up, it has no idea what the rest of the cluster looks like. It does not know how many other nodes exist. It does not know who the current cluster manager is. It does not even know if a cluster already exists.</p><p>The node needs a starting point. That starting point is called seed hosts.</p><p>Seed hosts are a list of addresses where the node should look for other cluster-manager-eligible nodes. Think of it as a phone book. The node calls each address and asks: &#8220;Is there a cluster here? Who is in charge?&#8221;</p><p>You configure seed hosts in <code>opensearch.yml</code>:</p><pre><code><code>discovery.seed_hosts:
  - 10.0.1.10:9300
  - 10.0.1.11:9300
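  # hostnames are accepted too (resolved via DNS); if a port is omitted,
  # OpenSearch scans the default transport range 9300-9400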
  - 10.0.1.12:9300</code></code></pre><p>A few things to understand about this list:</p><ul><li><p>These do not need to be all the nodes in the cluster. They just need to be enough to find at least one active cluster-manager-eligible node.</p></li><li><p>If you omit the port, OpenSearch will scan the default range of 9300 to 9400.</p></li><li><p>You can use hostnames instead of IPs. OpenSearch will resolve them via DNS.</p></li><li><p>The seed hosts list is only used during discovery. Once a node joins the cluster, it gets the full picture from the cluster state.</p></li></ul><div><hr></div><h2>Seed Host Providers</h2><p>OpenSearch supports three ways to provide seed host information.</p><p><strong>Settings-based (default).</strong> The list in <code>discovery.seed_hosts</code> in your opensearch.yml file. Simple, static, works for most deployments.</p><p><strong>File-based.</strong> A file called <code>unicast_hosts.txt</code> in the OpenSearch config directory. Useful when your infrastructure updates the file dynamically, like a configuration management tool writing new IPs as nodes spin up.</p><pre><code><code>10.0.1.10:9300
10.0.1.11:9300
10.0.1.12</code></code></pre><p><strong>Plugin-based.</strong> Cloud providers offer plugins that discover nodes through APIs. On AWS, the EC2 discovery plugin can find nodes by tags. On Alibaba Cloud, similar mechanisms exist for ECS instances.</p><pre><code><code>discovery.seed_providers: ec2
discovery.ec2.tag.role: master
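# tag names and values here are illustrative - they must match the tags
# actually attached to your EC2 instances, and the discovery-ec2 plugin
# must be installed on every node that uses this provider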
discovery.ec2.tag.environment: production</code></code></pre><p>You can combine providers. OpenSearch will merge the results from all configured sources.</p><div><hr></div><h2>Bootstrapping: The First Time is Special</h2><p>There is a critical difference between starting a brand-new cluster and restarting an existing one.</p><p>When nodes have never been part of a cluster before, there is no cluster state. No leader. No history. OpenSearch needs to be told which nodes should participate in the very first election. This process is called bootstrapping.</p><p>You configure it with:</p><pre><code><code>cluster.initial_cluster_manager_nodes:
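  # each entry below must exactly match that node's node.name (case-sensitive)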
  - cluster-manager-1
  - cluster-manager-2
  - cluster-manager-3</code></code></pre><p>These values must exactly match the <code>node.name</code> setting on each node. Not the hostname. Not the IP. The node name. This is case-sensitive and character-exact.</p><p>If the name is <code>cluster-manager-1</code>, you cannot write <code>cluster-manager-1.example.com</code> in the bootstrap list. The log will tell you:</p><pre><code><code>cluster manager not discovered yet, this node has not previously
joined a bootstrapped cluster, and this node must discover
cluster-manager-eligible nodes [cluster-manager-1, cluster-manager-2]
to bootstrap a cluster</code></code></pre><p>This is the single most common error people hit when setting up a new cluster.</p><div><hr></div><h2>When Bootstrapping is Required</h2><p>Bootstrapping is only needed once, when the cluster forms for the very first time. After that initial formation, OpenSearch stores the cluster state and never needs the bootstrap configuration again.</p><p>Specifically:</p><ul><li><p><strong>Required:</strong> Starting a brand-new cluster where no node has ever joined a cluster before.</p></li><li><p><strong>Not required:</strong> Nodes joining an existing cluster. They get configuration from the current cluster manager.</p></li><li><p><strong>Not required:</strong> Cluster restarts, including full cluster restarts. The existing cluster state is preserved on disk and used for recovery.</p></li></ul><p>In development mode, if you do not configure any discovery settings, OpenSearch will automatically bootstrap a single-node cluster. This is convenient for local testing but dangerous in production because auto-bootstrapping can lead to split-brain if multiple nodes start independently.</p><pre><code><code># Development only - single node mode
discovery.type: single-node</code></code></pre><p>Never use <code>single-node</code> in production. It suppresses safety mechanisms that exist specifically to prevent data corruption.</p><div><hr></div><h2>Voting and Quorum: How Leaders Are Elected</h2><p>Once nodes have discovered each other, they need to agree on a leader. OpenSearch uses a quorum-based voting mechanism for this.</p><p>What is a quorum? It is the minimum number of cluster-manager-eligible nodes that must agree before an election can succeed. The formula is simple:</p><p><strong>Quorum = (number of voting nodes / 2) + 1</strong></p><p>With three cluster-manager-eligible nodes, the quorum is two. With five, the quorum is three. This majority requirement is what prevents split-brain.</p><p>Why does the majority rule matter? Imagine a network partition splits your three cluster-manager-eligible nodes into two groups: one group with two nodes and one group with one node. </p><p>The group with two nodes has a quorum and can elect a leader. The group with one node cannot. This means exactly one side of the partition can operate, preventing conflicting writes.</p><div><hr></div><h2>The Voting Configuration</h2><p>The voting configuration is the set of cluster-manager-eligible nodes that participate in elections. OpenSearch manages this automatically as nodes join and leave the cluster.</p><p>When a new cluster-manager-eligible node joins, it gets added to the voting configuration. When a node leaves, OpenSearch can automatically shrink the voting configuration to remove it, as long as the configuration still contains at least three nodes.</p><p>This automatic shrinking is controlled by:</p><pre><code><code>cluster.auto_shrink_voting_configuration: true  # default</code></code></pre><p>Why the minimum of three? Because with only two voting nodes, losing one node means losing quorum. The cluster would be stuck. 
Three is the minimum number that can tolerate a single node failure.</p><p>If you need to explicitly remove a node from the voting configuration (for example, during a planned decommission), you use the voting configuration exclusions API:</p><pre><code><code>POST /_cluster/voting_config_exclusions?node_names=node-to-remove</code></code></pre><p>This tells OpenSearch to remove the specified node from the voting configuration. The cluster will recalculate quorum based on the remaining nodes.</p><p>You can control how many exclusions are tracked with:</p><pre><code><code>cluster.max_voting_config_exclusions: 10  # default</code></code></pre><div><hr></div><h2>What Happens During a Leader Failure</h2><p>When the elected cluster manager goes down, the remaining cluster-manager-eligible nodes detect the failure through a fault detection mechanism. This triggers a new election.</p><p>The process follows a predictable sequence:</p><ul><li><p>Remaining nodes notice the leader has stopped responding.</p></li><li><p>Each node waits a randomized delay before starting an election. This prevents all nodes from trying to become leader simultaneously.</p></li><li><p>A node requests votes from all other voting nodes.</p></li><li><p>If the node receives votes from a quorum, it becomes the new leader.</p></li><li><p>The new leader publishes an updated cluster state to all nodes.</p></li></ul><p>The randomized delay is important. Without it, two nodes might start elections at the same time, split the votes, and force a retry. The randomization is controlled by:</p><pre><code><code>cluster.election.initial_timeout: 100ms    # base delay
cluster.election.back_off_time: 100ms      # linear backoff per failure
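# with linear backoff, the random delay's upper bound grows roughly as
# initial_timeout + failures x back_off_time, capped at max_timeout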
cluster.election.max_timeout: 10s          # maximum delay cap</code></code></pre><p>In most healthy clusters, leader failover happens in seconds.</p><div><hr></div><h2>Fault Detection Settings</h2><p>OpenSearch continuously monitors the health of nodes in the cluster. The cluster manager pings followers. Followers ping the cluster manager. These are called fault detection pings.</p><p>Key settings that control this behavior:</p><pre><code><code># How often the leader checks each follower
cluster.fault_detection.leader_check.interval: 1s
cluster.fault_detection.leader_check.timeout: 10s
cluster.fault_detection.leader_check.retry_count: 3
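# worst-case detection time is roughly timeout x retry_count
# (about 30 seconds with these defaults)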

# How often followers check the leader
cluster.fault_detection.follower_check.interval: 1s
cluster.fault_detection.follower_check.timeout: 10s
cluster.fault_detection.follower_check.retry_count: 3</code></code></pre><p>With default settings, a failed node is detected within about 30 seconds (10 second timeout multiplied by 3 retries). You can tighten these values for faster detection, but be careful. On loaded clusters, aggressive timeouts can cause false positives where healthy nodes are removed because they were too busy to respond in time.</p><div><hr></div><h2>Cluster State Publishing</h2><p>When the cluster manager makes a change (new index, shard movement, settings update), it needs to broadcast the new cluster state to all nodes. This happens through a two-phase commit process.</p><p>First, the cluster manager sends the update to all nodes and waits for acknowledgment. Then, once enough nodes have acknowledged, it sends a commit message. This ensures that either all nodes get the update or none do.</p><p>The timeout for this process is:</p><pre><code><code>cluster.publish.timeout: 30s</code></code></pre><p>If the cluster manager cannot publish within this timeout, it steps down and a new election begins. This prevents a situation where a network-isolated cluster manager keeps making decisions that no other node knows about.</p><div><hr></div><h2>Cluster Manager Task Throttling</h2><p>Every change to the cluster state goes through the cluster manager. Creating an index, updating a mapping, starting a shard. All of these generate tasks that enter the cluster manager&#8217;s task queue.</p><p>This queue is unbounded by default. That is a problem.</p><p>If your application suddenly sends thousands of create-index requests, or a bug in your ingestion pipeline floods the cluster manager with put-mapping tasks, the queue grows without limit. </p><p>The cluster manager becomes overloaded trying to process all these tasks. Its performance degrades. Fault detection pings time out because the cluster manager is too busy to respond. </p><p>Other nodes think the leader has failed. A new election starts. 
But the new leader inherits the same flood of tasks.</p><p>This is how a task flood can take down an entire cluster.</p><p>OpenSearch introduced cluster manager task throttling to prevent this. It works by setting limits on how many pending tasks of each type can sit in the queue. When the limit is reached, new tasks of that type are rejected.</p><p>The rejected node retries with exponential backoff. If retries fail within the timeout period, OpenSearch returns a cluster timeout error.</p><p>The key insight is that throttling works per task type. Rejecting put-mapping tasks does not block create-index tasks. This means one misbehaving workload cannot starve other critical operations.</p><div><hr></div><h2>Configuring Task Throttling</h2><p>You enable throttling through cluster settings:</p><pre><code><code>PUT _cluster/settings
{
  "persistent": {
    "cluster_manager.throttling.thresholds": {
      "put-mapping": {
        "value": 100
      },
      "create-index": {
        "value": 25
      }
    }
  }
}</code></code></pre><p>You can also configure retry behavior:</p><pre><code><code>PUT _cluster/settings
{
  "persistent": {
    "cluster_manager.throttling": {
      "retry": {
        "base.delay": "1s",
        "max.delay": "25s"
      },
      "thresholds": {
        "put-mapping": {
          "value": 100
        }
      }
    }
  }
}</code></code></pre><p>To disable throttling for a specific task type, set its value to -1.</p><p>Supported task types include:</p><ul><li><p><code>create-index</code></p></li><li><p><code>update-settings</code></p></li><li><p><code>cluster-update-settings</code></p></li><li><p><code>auto-create</code></p></li><li><p><code>delete-index</code></p></li><li><p><code>delete-dangling-index</code></p></li><li><p><code>create-data-stream</code></p></li><li><p><code>remove-data-stream</code></p></li><li><p><code>rollover-index</code></p></li><li><p><code>index-aliases</code></p></li><li><p><code>put-mapping</code></p></li><li><p><code>create-index-template</code></p></li><li><p><code>remove-index-template</code></p></li></ul><p>You can monitor throttling stats with:</p><pre><code><code>GET _cluster/stats</code></code></pre><p>Look for the <code>cluster_manager_throttling</code> section in the response, which shows total throttled tasks and a breakdown by task type.</p><div><hr></div><h2>AWS vs Alibaba Cloud: What Changes</h2><p>When you use a managed service, most of the discovery and cluster formation work is handled for you. But understanding the differences matters when things go wrong.</p><p><strong>AWS OpenSearch Service.</strong> Discovery and bootstrapping are fully managed. You do not configure seed hosts or bootstrap nodes. AWS handles cluster manager election, fault detection, and node replacement automatically. </p><p>If a cluster manager node fails, AWS replaces it and the new node joins the cluster without manual intervention. </p><p>Cluster manager task throttling is available starting from engine version 1.3, and AWS has fine-tuned the thresholds per cluster size.</p><p><strong>Alibaba Cloud OpenSearch.</strong> Also handles discovery automatically in its managed offering. Node replacement and leader election are managed by the platform. 
</p><p>The specifics of throttling configuration may differ from the open-source defaults, so check the Alibaba Cloud documentation for your engine version.</p><p><strong>Self-managed on either cloud.</strong> If you run OpenSearch on EC2 or ECS instances yourself, you handle all of this configuration. </p><p>Use the EC2 discovery plugin on AWS or equivalent discovery mechanisms on Alibaba Cloud to automate seed host discovery instead of hardcoding IPs that change with every instance replacement.</p><p>The critical difference: on managed services, you trade configuration control for operational simplicity. You cannot modify discovery settings, fault detection timeouts, or voting configuration. If you need that level of control, self-managed is your path.</p><div><hr></div><h2>A Complete Production Configuration</h2><p>Here is a complete <code>opensearch.yml</code> configuration for a production cluster with three dedicated cluster manager nodes:</p><pre><code><code># Cluster identification
cluster.name: production-cluster

# Node identity (different on each node)
node.name: cluster-manager-1
node.roles: [cluster_manager]
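# data nodes would use node.roles: [data] instead; keeping the roles on
# separate nodes prevents heavy workloads from starving the cluster manager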

# Network
network.host: 0.0.0.0
transport.port: 9300

# Discovery
discovery.seed_hosts:
  - cluster-manager-1.example.com:9300
  - cluster-manager-2.example.com:9300
  - cluster-manager-3.example.com:9300

# Bootstrap (ONLY for initial cluster formation, remove after)
cluster.initial_cluster_manager_nodes:
  - cluster-manager-1
  - cluster-manager-2
  - cluster-manager-3

# Voting configuration
cluster.auto_shrink_voting_configuration: true
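# with three voting nodes, quorum is two; auto-shrink never reduces
# the voting configuration below three nodes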
cluster.max_voting_config_exclusions: 3

# Cluster state publishing
cluster.publish.timeout: 60s
cluster.join.timeout: 120s

# Fault detection
cluster.fault_detection.leader_check.interval: 1s
cluster.fault_detection.leader_check.timeout: 10s
cluster.fault_detection.leader_check.retry_count: 3
cluster.fault_detection.follower_check.interval: 1s
cluster.fault_detection.follower_check.timeout: 10s
cluster.fault_detection.follower_check.retry_count: 3</code></code></pre><p>After the cluster forms successfully for the first time, remove the <code>cluster.initial_cluster_manager_nodes</code> line. Leaving it in place is not dangerous on an already-formed cluster, but removing it makes intent clear and prevents confusion during future troubleshooting.</p><div><hr></div><h2>Common Mistakes</h2><ul><li><p><strong>Mismatched node names in bootstrap.</strong> The most frequent error. The value in <code>cluster.initial_cluster_manager_nodes</code> must exactly match <code>node.name</code> on each node. Not the hostname. Not the FQDN. The node name.</p></li><li><p><strong>Leaving bootstrap config on existing clusters.</strong> It works, but it creates confusion. If someone later reads the config, they might think the cluster has not been bootstrapped yet.</p></li><li><p><strong>Too few cluster-manager-eligible nodes.</strong> One cluster manager means zero fault tolerance. Two cluster managers means you cannot survive even one failure because you lose quorum. Three is the practical minimum. Five gives you tolerance for two simultaneous failures but adds election overhead.</p></li><li><p><strong>Running cluster manager and data roles on the same node.</strong> In production, a heavy indexing or search workload can starve the cluster manager of CPU and memory. Dedicated cluster manager nodes are small (2 vCPU, 4 GB RAM is often enough) and their only job is managing the cluster state.</p></li><li><p><strong>Aggressive fault detection timeouts.</strong> Setting the timeout to 1 second sounds like faster recovery. In practice, it means healthy nodes get removed during garbage collection pauses or temporary network blips. The defaults exist for a reason.</p></li><li><p><strong>Ignoring task throttling.</strong> If your application can programmatically create indexes or update mappings, one bug can generate thousands of tasks in seconds. 
Set throttling thresholds before you need them, not after the cluster is already overwhelmed.</p></li></ul><div><hr></div><h2>What is Next</h2><p>On Day 12, we go deep into cluster performance tuning. You will learn how indexing speed, refresh intervals, merge policies, and shard allocation interact to determine how fast your cluster can ingest and search data. This is where theoretical knowledge becomes measurable performance gains.</p><div><hr></div><p>Interactive guide: <a href="https://opensearch.9cld.com/day/11-discovery-and-cluster-formation">opensearch.9cld.com/day/11-discovery-and-cluster-formation</a></p><p>All guides: opensearch.9cld.com</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading 9Cloud! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Day 10: Migrate and Upgrade OpenSearch ]]></title><description><![CDATA[Choose the wrong strategy and your data pays the price.]]></description><link>https://9cld.com/p/day-10-migrate-and-upgrade-opensearch</link><guid isPermaLink="false">https://9cld.com/p/day-10-migrate-and-upgrade-opensearch</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Fri, 13 Feb 2026 06:59:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_7BR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_7BR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_7BR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png 424w, https://substackcdn.com/image/fetch/$s_!_7BR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png 
848w, https://substackcdn.com/image/fetch/$s_!_7BR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png 1272w, https://substackcdn.com/image/fetch/$s_!_7BR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_7BR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png" width="1202" height="633" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:633,&quot;width&quot;:1202,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:598269,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/187829201?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_7BR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png 424w, https://substackcdn.com/image/fetch/$s_!_7BR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png 
848w, https://substackcdn.com/image/fetch/$s_!_7BR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png 1272w, https://substackcdn.com/image/fetch/$s_!_7BR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf71e33-02df-43ec-87d5-9f2882d3a7c3_1202x633.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You are running OpenSearch 1.3 in production. It works. Queries come back. 
Logs get ingested.</p><p>Then someone asks: &#8220;Should we not upgrade to the latest version?&#8221;</p><p>Suddenly, you are staring at documentation pages about rolling upgrades, snapshot compatibility, Lucene index versions, and something called Migration Assistant. Every page assumes you already know which method to pick.</p><p>The hardest part of migration is not the execution. It is choosing the right strategy before you start.</p><p>Pick wrong and you are looking at data loss, unexpected downtime, or a rollback that takes longer than the migration itself.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading 9Cloud! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>The One Rule That Changes Everything</h2><p>OpenSearch nodes cannot be downgraded.</p><p>Once you upgrade a node, there is no rolling back to the previous version. Your only recovery path is restoring from a snapshot you took before the upgrade.</p><p>Did you take that snapshot? If not, your options shrink dramatically.</p><p>This is why every migration starts the same way. Snapshot first. Decide second. Execute third.</p><div><hr></div><h2>Why Upgrades Matter More Than You Think</h2><p>OpenSearch follows semantic versioning. Major versions introduce breaking changes. Minor versions add features. 
Patch versions fix bugs.</p><p>That sounds clean in theory. In practice, it means:</p><ul><li><p>Skipping versions is not always possible</p></li><li><p>Plugin compatibility can break silently between minor versions</p></li><li><p>Lucene index format changes can make your existing data unreadable without reindexing</p></li></ul><p>Staying on an old version does not just mean missing features. It means accumulating security vulnerabilities, losing community support, and eventually hitting a wall where the upgrade path becomes exponentially harder because you waited too long.</p><div><hr></div><h2>The Four Migration Strategies</h2><p>OpenSearch supports four methods. Each makes different tradeoffs between downtime, complexity, infrastructure cost, and version compatibility.</p><p>Understanding these tradeoffs is the entire game.</p><div><hr></div><h3>Rolling Upgrade</h3><p>You upgrade one node at a time while keeping the cluster operational. Data continues to be ingested. Queries keep running. From the outside, the cluster appears normal.</p><p>The catch? Rolling upgrades only work between minor versions within the same major release, or from the last minor of one major to the first of the next.</p><p>You cannot jump from OpenSearch 1.0 to 2.12 in a single rolling upgrade. You need intermediate hops.</p><p>The process follows a precise sequence:</p><ul><li><p>Verify cluster health is green and all shards are allocated</p></li><li><p>Disable shard allocation so OpenSearch does not try to rebalance while nodes are offline</p></li><li><p>Flush the translog to commit recent operations to Lucene segments</p></li><li><p>Stop one node, upgrade it, start it again</p></li><li><p>Re-enable allocation and wait for green status</p></li><li><p>Repeat for each node &#8212; upgrade the active cluster manager node last</p></li></ul><p>Why last? Because an OpenSearch node cannot join a cluster if the cluster manager is running a newer version. 
If you accidentally upgrade the manager first, the remaining nodes on the old version cannot rejoin.</p><div><hr></div><h3>Snapshot and Restore</h3><p>This is the safest option for major version jumps or infrastructure changes.</p><p>You take a snapshot of your source cluster, stand up a fresh target cluster on the new version, and restore the snapshot. Complete isolation between source and target. If anything goes wrong, your source cluster is untouched.</p><p>Snapshots are forward-compatible by one major version. A snapshot taken on OpenSearch 1.x can be restored on 2.x. But a snapshot from 1.x cannot be restored directly on a hypothetical 3.x. For larger gaps, you need intermediate steps.</p><p>Snapshots are also incremental. After the first full snapshot, each subsequent one only captures what changed. This makes the process practical even for large clusters.</p><p>When restoring for migration, you have useful options:</p><ul><li><p>Choose specific indexes instead of restoring everything</p></li><li><p>Exclude system indexes like .opendistro_security that might conflict</p></li><li><p>Rename indexes during restore to avoid naming collisions</p></li><li><p>Adjust replica counts and shard allocation to match target capacity</p></li></ul><div><hr></div><h3>Remote Reindexing</h3><p>This pulls data directly from a source cluster into a target cluster using the Reindex API. No snapshots. No intermediate storage.</p><p>The source cluster serves as a live data source while the target indexes fresh. This supports large version jumps because the data is reindexed from scratch. Lucene format compatibility is not a concern.</p><p>The downside? Performance impact. The source cluster serves reindex requests on top of its normal workload. 
It is also slower than snapshot restore for very large datasets.</p><p>But if you need to restructure mappings, change analyzers, or modify shard counts during migration, reindexing is the only method that lets you transform data in flight.</p><div><hr></div><h3>Migration Assistant</h3><p>The most comprehensive option. An end-to-end solution that handles metadata migration, historical data backfill, and live traffic capture and replay.</p><p>How does it work?</p><ul><li><p>A Capture Proxy sits in front of your source cluster and records all incoming HTTP traffic to Kafka</p></li><li><p>A Reindex-from-Snapshot process backfills historical data into the target cluster</p></li><li><p>Once backfill completes, a Traffic Replayer reads from Kafka and replays captured requests against the target</p></li></ul><p>The result is near-zero downtime migration with the ability to compare source and target behavior before cutting over.</p><p>Migration Assistant is the right choice when you need zero-downtime migration from Elasticsearch to OpenSearch, when you want to validate behavior under real workload, or when you are making a major version jump that other methods do not support cleanly.</p><div><hr></div><h2>Which Strategy Should You Pick?</h2><p>Ask yourself three questions.</p><p><strong>Can you tolerate downtime?</strong> </p><ul><li><p>If not, use Rolling Upgrade for minor versions or Migration Assistant for major jumps. </p></li><li><p>If brief downtime is acceptable, Snapshot and Restore gives you the cleanest path.</p></li></ul><p><strong>How large is the version gap?</strong> </p><ul><li><p>Same major version &#8212; Rolling Upgrade. </p></li><li><p>Adjacent major versions &#8212; Snapshot and Restore. </p></li><li><p>Multiple majors or ES 7.11+ &#8212; Migration Assistant or a Logstash bridge.</p></li></ul><p><strong>Do you need to transform data?</strong> </p><ul><li><p>If no, snapshots preserve everything as-is. 
</p></li><li><p>If you need new mappings or shard counts, Remote Reindex lets you restructure in flight.</p></li></ul><div><hr></div><h2>The Pre-Migration Checklist</h2><p>Regardless of which strategy you choose, the preparation steps are the same. Skip any of these and you are gambling.</p><ul><li><p><strong>Review breaking changes</strong> between current and target version. These are not suggestions. They are things that will break.</p></li><li><p><strong>Check plugin compatibility.</strong> Plugin major, minor, AND patch versions must match OpenSearch exactly. Plugin 2.3.0.x works only with OpenSearch 2.3.0.</p></li><li><p><strong>Review tools compatibility.</strong> Logstash, Beats, Data Prepper &#8212; they may also need upgrading. An OpenSearch upgrade that breaks your ingestion pipeline is worse than no upgrade.</p></li><li><p><strong>Back up configuration files.</strong> opensearch.yml, plugin configs, TLS certificates. These do not migrate automatically.</p></li><li><p><strong>Take a snapshot.</strong> Non-negotiable. Store in external storage like S3, OSS, or NFS. This is your rollback plan.</p></li><li><p><strong>Stop nonessential indexing.</strong> Reduces in-flight data and simplifies recovery if something goes wrong.</p></li></ul><div><hr></div><h2>Version Compatibility</h2><p>OpenSearch nodes within the same major version are compatible regardless of minor version. OpenSearch 1.1.0 can coexist with 1.3.7 in the same cluster.</p><p>Nodes and indexes are backward-compatible with the previous major version. An OpenSearch 2.x cluster can read indexes created on 1.x. But indexes from Elasticsearch 6.x or earlier must be reindexed or deleted before upgrading to OpenSearch 2.x.</p><p>What really determines index compatibility? The underlying Lucene version. Each OpenSearch version ships with a specific Lucene version. When Lucene makes a breaking change to its index format, older indexes must be reindexed.</p><p>For snapshots, the rule is simple. 
Forward-compatible by one major version only:</p><ul><li><p>ES 7.x snapshot &#8594; restores on OpenSearch 1.x &#10003;</p></li><li><p>OpenSearch 1.x snapshot &#8594; restores on OpenSearch 2.x &#10003;</p></li><li><p>ES 6.x snapshot &#8594; restores on OpenSearch 2.x &#10007; (needs intermediate 1.x step)</p></li></ul><div><hr></div><h2>Migrating from Elasticsearch</h2><p>This is one of the most common scenarios. The path depends on which Elasticsearch version you are running.</p><ul><li><p><strong>ES OSS 7.10.2 or earlier</strong> &#8212; Smoothest path. Rolling upgrade or cluster restart directly to OpenSearch 1.x. The opensearch-upgrade tool automates config import, keystore migration, and core plugin installation.</p></li><li><p><strong>ES OSS 6.x</strong> &#8212; Upgrade to 6.8 first, then to 7.10.2, then migrate to OpenSearch 1.x. No shortcuts.</p></li><li><p><strong>ES 7.11+ (post-fork)</strong> &#8212; Direct migration is not supported. The codebase diverged. Your options are Logstash as a bridge (use version 7.13.4 or earlier) or Migration Assistant.</p></li><li><p><strong>Open Distro (ODFE)</strong> &#8212; Upgrade to ODFE 1.13, then migrate to OpenSearch 1.x. Cleanest path with the fewest surprises.</p></li></ul><div><hr></div><h2>AWS OpenSearch Service</h2><p>AWS offers in-place upgrades with automated pre-upgrade validation. You initiate from the console, CLI, or API. AWS runs checks first and only proceeds if everything passes.</p><p>AWS takes automated hourly snapshots and retains up to 336 of them for 14 days. But here is the detail that catches teams: automated snapshots can only restore within the same domain.</p><p>For cross-domain migration, you need manual snapshots stored in your own S3 bucket. You register an S3 repository with your domain first, then take manual snapshots.</p><p>Watch out for two things. If the target domain has Multi-AZ with Standby enabled, the restore operation will fail &#8212; disable it first. 
And exclude .opendistro_security indexes from restores to avoid overwriting the target domain security configuration.</p><div><hr></div><h2>Alibaba Cloud</h2><p>Alibaba Cloud uses standard snapshot and restore for migration. The cross-cloud workflow adds steps that AWS-only teams might not expect.</p><p>Migrating from AWS to Alibaba Cloud follows this path:</p><ul><li><p>Create snapshot repository on AWS backed by S3</p></li><li><p>Take full snapshot</p></li><li><p>Create OSS bucket on Alibaba Cloud and register it as a snapshot repository</p></li><li><p>Use ossimport or Data Online Migration service to transfer snapshot files from S3 to OSS</p></li><li><p>Restore on the Alibaba Cloud cluster</p></li></ul><p>For incremental migration, repeat the snapshot-transfer-restore cycle. The final cutover involves stopping writes, taking a final snapshot, transferring, restoring, then switching traffic.</p><p>Alibaba Cloud also provides Data Transmission Service (DTS) for real-time synchronization between clusters. Useful when you need to run source and target in parallel during migration with minimal data lag.</p><div><hr></div><h2>Mini Project: Snapshot Migration Lab</h2><p>Practice the complete workflow locally with two OpenSearch clusters running different versions.</p><p><strong>Docker Compose &#8212; Two Clusters:</strong></p><pre><code><code>version: '3'
services:
  opensearch-source:
    image: opensearchproject/opensearch:2.11.0
    container_name: os-source
    environment:
      - discovery.type=single-node
      # Lab only: disabling the security plugin removes auth and TLS — never do this in production
      - DISABLE_SECURITY_PLUGIN=true
      # path.repo must whitelist the directory used for "fs" snapshot repositories
      - path.repo=/snapshots
    volumes:
      - source-data:/usr/share/opensearch/data
      - snapshot-repo:/snapshots
    ports:
      - "9200:9200"

  # Newer version; mounts the same snapshot volume, so the repository written by the source is readable here
  opensearch-target:
    image: opensearchproject/opensearch:2.17.0
    container_name: os-target
    environment:
      - discovery.type=single-node
      - DISABLE_SECURITY_PLUGIN=true
      - path.repo=/snapshots
    volumes:
      - target-data:/usr/share/opensearch/data
      - snapshot-repo:/snapshots
    ports:
      - "9201:9200"

volumes:
  source-data:
  target-data:
  snapshot-repo:</code></code></pre><p><strong>Index sample data on the source cluster:</strong></p><pre><code><code>curl -X PUT "localhost:9200/products" -H "Content-Type: application/json" -d '{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "category": { "type": "keyword" },
      "price": { "type": "float" }
    }
  }
}'
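
# (Optional) sanity check — the new index should be listed before you load data
curl -X GET "localhost:9200/_cat/indices/products?v"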

# _bulk takes newline-delimited JSON, so the x-ndjson content type is the idiomatic choice
curl -X POST "localhost:9200/products/_bulk" -H "Content-Type: application/x-ndjson" -d '
{"index":{"_id":"1"}}
{"name":"Wireless Headphones","category":"electronics","price":79.99}
{"index":{"_id":"2"}}
{"name":"Running Shoes","category":"footwear","price":129.99}
{"index":{"_id":"3"}}
{"name":"Coffee Grinder","category":"kitchen","price":49.99}
'</code></code></pre><p><strong>Register a snapshot repository on the source:</strong></p><pre><code><code>curl -X PUT "localhost:9200/_snapshot/migration_repo" -H "Content-Type: application/json" -d '{
  "type": "fs",
  "settings": { "location": "/snapshots/migration" }
}'</code></code></pre><p><strong>Take a snapshot:</strong></p><pre><code><code>curl -X PUT "localhost:9200/_snapshot/migration_repo/snapshot_1?wait_for_completion=true" \
  -H "Content-Type: application/json" -d '{
  "indices": "products",
  "ignore_unavailable": true,
  "include_global_state": false
}'</code></code></pre><p><strong>Register the same repository on the target:</strong></p><pre><code><code>curl -X PUT "localhost:9201/_snapshot/migration_repo" -H "Content-Type: application/json" -d '{
  "type": "fs",
  "settings": { "location": "/snapshots/migration" }
}'</code></code></pre><p><strong>Verify the snapshot is visible from the target:</strong></p><pre><code><code>curl -X GET "localhost:9201/_snapshot/migration_repo/_all?pretty"</code></code></pre><p><strong>Restore on the target cluster:</strong></p><pre><code><code>curl -X POST "localhost:9201/_snapshot/migration_repo/snapshot_1/_restore" \
  -H "Content-Type: application/json" -d '{
  "indices": "products",
  "ignore_unavailable": true,
  "include_global_state": false
}'</code></code></pre><p><strong>Verify the migration:</strong></p><pre><code><code># Check index exists
curl -X GET "localhost:9201/_cat/indices?v"

# Verify document count
curl -X GET "localhost:9201/products/_count"
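
# Compare with the source cluster — both counts should report the same total
curl -X GET "localhost:9200/products/_count"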

# Confirm data integrity
curl -X GET "localhost:9201/products/_search?pretty" -H "Content-Type: application/json" -d '{
  "query": { "match_all": {} }
}'</code></code></pre><p>All three documents on the target with matching content? Migration succeeded.</p><p>In production, you would add the steps of stopping writes, taking a final incremental snapshot, and switching DNS or load balancer targets.</p><div><hr></div><h2>Common Mistakes</h2><ul><li><p><strong>Not snapshotting before upgrading.</strong> Nodes cannot be downgraded. Without a snapshot, your only recovery is rebuilding from source data.</p></li><li><p><strong>Upgrading the cluster manager first.</strong> Other nodes running older versions cannot rejoin. Always upgrade data nodes first, manager last.</p></li><li><p><strong>Restoring .opendistro_security indexes.</strong> This overwrites the target cluster security setup. Configure security separately.</p></li><li><p><strong>Ignoring plugin versions.</strong> A plugin for 2.3.0 will not load on 2.4.0. Check before upgrading, not after the cluster fails to start.</p></li><li><p><strong>Jumping multiple major versions.</strong> ES 6.x directly to OpenSearch 2.x is not supported. Follow the intermediate steps.</p></li></ul><div><hr></div><h2>What is Next</h2><p>On Day 11, we move into Query DSL mastery. You have been writing basic match queries and term filters. 
Now it is time to learn the full power of OpenSearch queries, bool compounds, function score, nested queries, scripted fields. This is where OpenSearch stops being a search box and becomes a precision instrument.</p><div><hr></div><p>Interactive guide: https://opensearch.9cld.com/day/10-migrate-upgrade</p><p>All interactive guides: https://opensearch.9cld.com</p>]]></content:encoded></item><item><title><![CDATA[Day 9: Building Chatbots with OpenSearch]]></title><description><![CDATA[Give Your Search Engine a Memory]]></description><link>https://9cld.com/p/day-9-building-chatbots-with-opensearch</link><guid isPermaLink="false">https://9cld.com/p/day-9-building-chatbots-with-opensearch</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Wed, 04 Feb 2026 06:21:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!cauf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cauf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cauf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png 424w, https://substackcdn.com/image/fetch/$s_!cauf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png 848w, 
https://substackcdn.com/image/fetch/$s_!cauf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png 1272w, https://substackcdn.com/image/fetch/$s_!cauf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cauf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png" width="1197" height="627" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:627,&quot;width&quot;:1197,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:551037,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/186828521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cauf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png 424w, https://substackcdn.com/image/fetch/$s_!cauf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png 848w, 
https://substackcdn.com/image/fetch/$s_!cauf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png 1272w, https://substackcdn.com/image/fetch/$s_!cauf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75593d5b-832a-42f0-b5d1-71256eec0a2a_1197x627.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The Problem With Stateless Search</h2><p>You built a RAG pipeline. It works. 
A user asks a question, OpenSearch retrieves relevant documents, and the LLM generates a grounded answer. Beautiful.</p><p>Then the user asks a follow-up.</p><p>&#8220;What is the population of Seattle?&#8221; Great answer. &#8220;How does that compare to Austin?&#8221; The system has no idea what &#8220;that&#8221; refers to. It forgot Seattle the moment it answered.</p><p>This is the wall every search application hits. Real users do not ask one question and leave. They have conversations. Each question builds on the last. &#8220;Which one is growing faster?&#8221; only makes sense if the system remembers we were talking about Seattle and Austin.</p><p>A stateless RAG pipeline cannot handle this. It treats every request like the first one.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading 9Cloud! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>What Makes a Chatbot Different from RAG?</h2><p>Two things. Conversation memory and intelligent tool routing.</p><p>On Day 7, we built RAG with OpenSearch. That gave us grounded answers from a single knowledge base. Today, we add memory so the chatbot remembers what was said before. And we add routing so it can pick the right knowledge base automatically when you have more than one.</p><p>The result is not just a search with a chat interface. 
It is a system that reasons about which data source to query, remembers what it already told you, and builds on previous answers.</p><div><hr></div><h2>How the Architecture Works</h2><p>What happens when a user sends a message to an OpenSearch chatbot?</p><p>The question arrives through the Execute Agent API. If the user includes a memory_id, the agent knows this is a continuation of a previous conversation. It loads the chat history from its memory index.</p><p>Then it decides which tools to run. For a RAG chatbot, that usually means a VectorDBTool performing a semantic search against a knowledge base. But if you have multiple knowledge bases, the agent reads each tool&#8217;s description and picks the one that matches the question.</p><p>The retrieved documents and conversation history get assembled into a prompt. That prompt goes to the LLM. The LLM generates a response grounded in real data, not its training memory. And finally, the question and answer get stored back into the conversation index so future questions have context.</p><p>All of this happens inside OpenSearch. No external orchestration service. No separate vector database. No custom middleware. One API call through the ML Commons plugin.</p><div><hr></div><h2>Three Agent Types, Three Decision Styles</h2><p>OpenSearch gives you three agent types. Choosing the right one determines how your chatbot thinks.</p><p>The conversational_flow agent runs tools in a fixed sequence. Think of it like a recipe. First search the knowledge base, then send results to the LLM. Always that order. </p><p>The output of one tool flows into the next through variable chaining. Predictable. Easy to debug. Perfect when you have one knowledge base and a straightforward search-then-answer pattern.</p><p>The conversational agent lets the LLM decide which tool to call. This is the dynamic option. 
</p><p>You give it two knowledge bases, one for population data and one for tech news, and it reads the tool descriptions to figure out which one matches the question. </p><p>Ask &#8220;What is Vision Pro?&#8221; and it picks the tech news tool. Ask &#8220;Population of Seattle?&#8221; and it picks the population tool. This is the one you want for multi-domain chatbots.</p><p>The flow agent is the stateless version. Runs tools sequentially but does not store conversation history. Good for one-shot queries where memory is unnecessary.</p><p>Which should you pick? If you have a single knowledge base, start with conversational_flow. If you have multiple data sources, use conversational. If you do not need follow-up questions, flow is enough.</p><div><hr></div><h2>Building a Multi-Knowledge-Base Chatbot</h2><p>Let us build the real thing. Two knowledge bases, one agent, dynamic routing.</p><p>The first knowledge base contains US city population data. We set that up in earlier days using a vector index with an ingest pipeline that generates embeddings automatically. </p><p>The second contains recent tech news about products like Apple Vision Pro, Meta&#8217;s LLaMA models, and Amazon Bedrock.</p><p>Setting up the tech news knowledge base follows the same pattern from previous days. </p><ul><li><p>Create an ingest pipeline that maps the passage field to an embedding field. </p></li><li><p>Create a knn index with cosine similarity. </p></li><li><p>Bulk ingest the articles. </p></li></ul><p>The pipeline handles vector embeddings at ingest time.</p><p>Now the critical part. Registering the conversational agent:</p><pre><code><code>POST _plugins/_ml/agents/_register
{
  "name": "Chat Agent with RAG",
  "type": "conversational",
  "llm": {
    "model_id": "your_llm_model_id",
    "parameters": {
      "max_iteration": 5,
      "response_filter": "$.completion"
    }
  },
  "memory": { "type": "conversation_index" },
  "tools": [
    {
      "type": "VectorDBTool",
      "name": "population_data_knowledge_base",
      "description": "This tool provides population data of US cities.",
      "parameters": {
        "input": "${parameters.question}",
        "index": "test_population_data",
        "source_field": ["population_description"],
        "model_id": "your_text_embedding_model_id",
        "embedding_field": "population_description_embedding",
        "doc_size": 3
      }
    },
    {
      "type": "VectorDBTool",
      "name": "tech_news_knowledge_base",
      "description": "This tool provides recent tech news.",
      "parameters": {
        "input": "${parameters.question}",
        "index": "test_tech_news",
        "source_field": ["passage"],
        "model_id": "your_text_embedding_model_id",
        "embedding_field": "passage_embedding",
        "doc_size": 2
      }
    }
  ],
  "app_type": "chat_with_rag"
}</code></code></pre><p>See the description field in each tool? That is the secret. The LLM reads &#8220;provides population data of US cities&#8221; and &#8220;provides recent tech news&#8221; and decides which tool fits the question. Clear, specific descriptions are the difference between a chatbot that routes correctly and one that searches the wrong knowledge base every time.</p><div><hr></div><h2>Conversation Memory Changes Everything</h2><p>Ask about Seattle&#8217;s population. Get an answer. Now ask &#8220;How does Austin compare?&#8221; with the same memory_id.</p><p>The agent searches the population knowledge base for Austin&#8217;s data. But it also has the previous Seattle answer in its conversation history. The LLM generates a comparison without needing to re-query Seattle. That is how real conversations work.</p><p>How does the memory system organize this? Two levels. </p><ul><li><p>A memory groups the entire conversation, identified by memory_id. </p></li><li><p>Within that memory, each question-answer pair is a message, identified by parent_message_id.</p></li></ul><p>You can inspect any conversation through the Memory APIs. Want to see what the agent did behind the scenes? 
Retrieve execution traces that show which tools ran, what results they returned, and how the LLM used them. This is invaluable for debugging.</p><div><hr></div><h2>When You Want Full Control: Output Chaining</h2><p>The conversational agent is smart. But sometimes you do not want the LLM deciding things. You want a fixed pipeline.</p><p>That is what the conversational_flow agent gives you. Define the exact tool sequence during registration. The agent follows it every time.</p><p>The key mechanism is output chaining. </p><p>When you write <code>${parameters.population_knowledge_base.output:-}</code> in the MLModelTool prompt, you inject the VectorDBTool output directly into the LLM context. </p><p>The <code>:-</code> suffix is a safe default. If the tool produces no output, it passes an empty string instead of breaking the prompt.</p><p>You can also skip tools at runtime. </p><p>If a user asks &#8220;Translate last answer into Chinese&#8221;, you do not need to search the knowledge base again. </p><p>Pass <code>"selected_tools": ["bedrock_claude_model"]</code> and only the LLM runs.</p><div><hr></div><h2>Beyond VectorDBTool: The Full Toolkit</h2><p>The build-your-own-chatbot tutorial introduces tools that go far beyond vector search.</p><ul><li><p>ListIndexTool returns metadata about all indexes in your cluster. </p></li><li><p>SearchIndexTool lets the agent run arbitrary OpenSearch queries, not just semantic searches. </p></li><li><p>CatIndexTool provides index statistics. </p></li><li><p>PPLTool converts natural language into Piped Processing Language queries and executes them.</p></li></ul><p>PPLTool is particularly interesting. A user asks &#8220;How many orders do I have in last week?&#8221; and the agent translates that into a PPL query, runs it against your eCommerce index, and has the LLM interpret the results in natural language. 
You just turned OpenSearch into a conversational analytics platform.</p><p>For production chatbots, consider combining multiple VectorDBTools in a single conversational_flow agent. </p><ul><li><p>A product recommendation bot might have one tool for product descriptions and another for product reviews. </p></li><li><p>The MLModelTool prompt references both outputs, giving the LLM comprehensive context to generate well-rounded recommendations.</p></li></ul><div><hr></div><h2>Cloud Platform Differences</h2><p>On AWS OpenSearch Service, the full ML Commons agent framework is available with native Bedrock connectors. You authenticate using Sigv4 signing, and the connector handles credential management. Your chatbot&#8217;s LLM backend runs on Bedrock while retrieval and orchestration happen on OpenSearch Service.</p><p>Alibaba Cloud&#8217;s managed OpenSearch service provides its own intelligent search capabilities that differ from the open-source agent framework. Model Studio offers Qwen models as the LLM backend. If you need the exact ML Commons agent APIs, run a self-managed OpenSearch cluster on Alibaba Cloud ECS instances. Full control over the agent framework, Alibaba Cloud infrastructure for compute and networking.</p><p>The connector configuration differs between platforms. </p><ul><li><p>AWS uses aws_sigv4 protocol with access keys and session tokens. </p></li><li><p>Alibaba Cloud uses AccessKey-based authentication with RAM permissions. </p></li></ul><p>Same underlying agent types, different endpoint URLs and request body formats.</p><div><hr></div><h2>What is Next</h2><p>Tomorrow, we continue building on these chatbot patterns. 
The agent framework you learned today is the foundation for more advanced workflows involving guardrails, agentic memory, and production deployment considerations.</p><p>The interactive guide: opensearch.9cld.com/day/09-chatbots <br>All previous guides are at opensearch.9cld.com</p>]]></content:encoded></item><item><title><![CDATA[Day 8: Agentic AI ]]></title><description><![CDATA[Teaching OpenSearch to Think, Plan, and Act]]></description><link>https://9cld.com/p/day-8-agentic-ai</link><guid isPermaLink="false">https://9cld.com/p/day-8-agentic-ai</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Fri, 30 Jan 2026 05:30:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8aaX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8aaX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8aaX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png 424w, https://substackcdn.com/image/fetch/$s_!8aaX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png 848w, 
https://substackcdn.com/image/fetch/$s_!8aaX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png 1272w, https://substackcdn.com/image/fetch/$s_!8aaX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8aaX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png" width="1202" height="631" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:631,&quot;width&quot;:1202,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:654119,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/186273742?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8aaX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png 424w, https://substackcdn.com/image/fetch/$s_!8aaX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png 848w, 
https://substackcdn.com/image/fetch/$s_!8aaX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png 1272w, https://substackcdn.com/image/fetch/$s_!8aaX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e2e255f-8e2d-4e8f-99f8-93b9a0147ddb_1202x631.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>You have built a RAG system. It retrieves documents, sends them to an LLM, and generates answers. 
But what happens when the question requires more than one step? What if your system needs to first understand the schema, then decide which index to query, then execute the search, then analyze the results, and finally synthesize an answer?</p><p>Traditional RAG falls apart. You end up writing custom orchestration logic for every possible scenario. And as complexity grows, so does your codebase.</p><p>OpenSearch 2.13 introduced something different. Agents. These are not just pipelines that run tools in sequence. They are coordinators that use LLMs to reason about problems, decide what actions to take, and adapt their approach based on intermediate results.</p><p>Let&#8217;s explore how OpenSearch turns your search cluster into an intelligent system that can plan, execute, and reflect.</p><div><hr></div><h2>What Is an Agent?</h2><p>An agent is a coordinator that uses a large language model to solve problems. But coordination is the keyword here. The LLM does the thinking. 
The agent handles everything else.</p><p>When you ask an agent a question, the following sequence unfolds:</p><ol><li><p>The LLM receives your question along with descriptions of available tools</p></li><li><p>The LLM reasons about which tool would help answer the question</p></li><li><p>The agent executes that tool and captures the output</p></li><li><p>The LLM receives the output and decides whether to use another tool or provide a final answer</p></li><li><p>This cycle continues until the LLM has enough information</p></li></ol><p>The agent manages this entire loop. It handles tool execution, captures outputs, formats prompts, and maintains conversation history. The LLM focuses purely on reasoning.</p><p>This separation is powerful. You do not need to embed tool execution logic into your prompts. You do not need to parse LLM responses to figure out which tool to call. The agent framework handles all of that.</p><div><hr></div><h2>The Four Agent Types</h2><p>OpenSearch supports four distinct agent types, each designed for different use cases.</p><h3>Flow Agent</h3><p>A flow agent runs tools sequentially in a fixed order. You define the sequence when you register the agent, and it never deviates.</p><p>Think of it as a simple pipeline. Tool A runs first, its output feeds into Tool B, and Tool B produces the final result.</p><p>This is perfect for RAG. The VectorDBTool retrieves relevant documents. The MLModelTool sends those documents plus the question to an LLM. The LLM generates an answer. Every question follows this exact path.</p><p>Flow agents are fast because there is no reasoning overhead. The LLM is not deciding which tool to use. It only generates the final answer based on retrieved context.</p><h3>Conversational Flow Agent</h3><p>This is a flow agent with memory. The execution pattern is identical. Tools run in sequence. 
The difference is that conversation history persists across interactions.</p><p>When a user asks a follow-up question, the agent can reference previous exchanges. This enables multi-turn conversations where context accumulates over time.</p><p>Chatbots built on RAG typically use conversational flow agents. The user asks a question, gets an answer, then asks a clarifying question. The second question makes sense only because the agent remembers the first.</p><h3>Conversational Agent</h3><p>This is where things get interesting. A conversational agent does not follow a fixed execution path. Instead, it reasons about which tools to use based on the question.</p><p>You configure the agent with an LLM and a set of available tools. When a question arrives, the LLM evaluates the question against tool descriptions and decides which tool would help. After executing that tool, the LLM evaluates again. Should it use another tool? Should it provide a final answer?</p><p>This iterative reasoning process is called Chain-of-Thought. The LLM thinks step by step, using tools as needed, until it reaches a conclusion.</p><p>Conversational agents are more flexible than flow agents. They can handle questions that require different tool combinations. The same agent might use VectorDBTool for one question and CatIndexTool for another.</p><p>But flexibility comes at a cost. Each reasoning step requires an LLM call. Complex questions might trigger five or ten LLM calls before reaching an answer. This adds latency and expense.</p><h3>Plan-Execute-Reflect Agent</h3><p>The plan-execute-reflect agent handles the most complex scenarios. It breaks down multi-step tasks into discrete steps, executes each step, and adapts its plan based on results.</p><p>The process works in three phases:</p><ul><li><p><strong>Planning</strong>: A planner LLM receives the task and generates a step-by-step plan. 
Each step describes what to do, which tool to use, and what information to gather.</p></li><li><p><strong>Execution</strong>: The agent executes each step using an internal conversational agent. One step might search an index. Another might analyze the schema. Each produces intermediate results.</p></li><li><p><strong>Reflection</strong>: After executing a step, the planner LLM receives the results and re-evaluates the plan. Should the next step change based on what we learned? Should we skip steps? Add new ones?</p></li></ul><p>This iterative refinement is what makes plan-execute-reflect agents so powerful. They adapt to what they discover during execution.</p><p>Consider a troubleshooting scenario. You ask the agent to identify why a service is failing. The initial plan might be: analyze logs, check traces, examine metrics. </p><p>But while analyzing logs, the agent discovers a specific error pattern. It updates the plan to focus on that pattern, skipping unrelated steps and adding new ones to investigate the root cause.</p><p>Plan-execute-reflect agents run asynchronously because they can take significant time. You submit the task, receive a task ID, and poll for completion.</p><div><hr></div><h2>Tools: What Agents Can Actually Do</h2><p>Tools are the actions agents can take. Each tool performs a specific task and returns results that the agent can use for further reasoning.</p><p>OpenSearch provides a comprehensive set of built-in tools:</p><ul><li><p><strong>VectorDBTool</strong> performs semantic search using vector embeddings. You configure it with an embedding model, target index, and embedding field. When executed, it converts the question into a vector and retrieves similar documents.</p></li><li><p><strong>MLModelTool</strong> sends prompts to an LLM and returns the response. This is how agents generate final answers or intermediate analysis. 
You configure it with a model ID and prompt template.</p></li><li><p><strong>SearchIndexTool</strong> executes DSL queries against OpenSearch indexes. Unlike VectorDBTool, this performs traditional keyword search with full query DSL support.</p></li><li><p><strong>ListIndexTool</strong> returns information about available indexes. Useful when an agent needs to understand what data exists before querying it.</p></li><li><p><strong>IndexMappingTool</strong> retrieves the schema of an index. This helps agents understand field types and structure before constructing queries.</p></li><li><p><strong>CatIndexTool</strong> executes the cat indices API, providing health status, document counts, and storage information for indexes.</p></li><li><p><strong>PPLTool</strong> executes Piped Processing Language queries. PPL is an alternative query syntax that some users find more intuitive than DSL.</p></li><li><p><strong>WebSearchTool</strong> searches the web for external information. This extends agent capabilities beyond your OpenSearch data.</p></li><li><p><strong>QueryPlanningTool</strong> is special. It converts natural language questions into DSL queries. This is the foundation of agentic search, where users ask questions in plain English and the system generates appropriate queries.</p></li></ul><p>You can also create custom tools using the AgentTool, which wraps another agent as a tool. This enables hierarchical agent architectures where a parent agent delegates subtasks to specialized child agents.</p><div><hr></div><h2>Agentic Search: Natural Language to DSL</h2><p>Agentic search is OpenSearch&#8217;s flagship agent application. It lets users ask questions in natural language and receive search results without writing DSL queries.</p><p>The system works like this:</p><ol><li><p>User submits a natural language question</p></li><li><p>QueryPlanningTool analyzes the question and index schema</p></li><li><p>The LLM generates an appropriate DSL query</p></li><li><p>OpenSearch executes the query</p></li><li><p>Results return to the user</p></li></ol><p>Behind the scenes, an agent with QueryPlanningTool orchestrates this flow. The agent can be a simple flow agent for straightforward queries, or a conversational agent for complex scenarios requiring multiple tools.</p><p>Agentic search in practice starts with registering a connector to your LLM:</p><pre><code><code>POST /_plugins/_ml/connectors/_create
{
  "name": "Claude Connector",
  "description": "Connector for Anthropic Claude",
  "version": "1.0",
  "protocol": "aws_sigv4",
  "credential": {
    "access_key": "your_access_key",
    "secret_key": "your_secret_key",
    "session_token": "your_session_token"
  },
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock",
    "model": "anthropic.claude-3-sonnet-20240229-v1:0"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-sonnet-20240229-v1:0/converse",
      "headers": {
        "Content-Type": "application/json"
      },
      "request_body": "{ \"messages\": [{\"role\": \"user\", \"content\": [{\"text\": \"${parameters.prompt}\"}]}] }"
    }
  ]
}</code></code></pre><p>Then register a model using that connector:</p><pre><code><code>POST /_plugins/_ml/models/_register
{
  "name": "Claude Model for Agentic Search",
  "function_name": "remote",
  "description": "Claude model for query planning",
  "connector_id": "your_connector_id"
}</code></code></pre><p>Deploy the model and register an agent:</p><pre><code><code>POST /_plugins/_ml/agents/_register
{
  "name": "Agentic Search Agent",
  "type": "conversational",
  "description": "Agent for natural language search",
  "llm": {
    "model_id": "your_model_id",
    "parameters": {
      "max_iteration": 10
    }
  },
  "memory": {
    "type": "conversation_index"
  },
  "parameters": {
    "_llm_interface": "bedrock/converse/claude"
  },
  "tools": [
    {
      "type": "QueryPlanningTool"
    }
  ],
  "app_type": "os_chat"
}</code></code></pre><p>Create a search pipeline that uses the agent:</p><pre><code><code>PUT _search/pipeline/agentic-pipeline
{
  "request_processors": [
    {
      "agentic_query_translator": {
        "agent_id": "your_agent_id"
      }
    }
  ]
}</code></code></pre><p>Now you can search with natural language:</p><pre><code><code>GET products/_search?search_pipeline=agentic-pipeline
{
  "query": {
    "agentic": {
      "query_text": "Find me blue shoes under 100 dollars",
      "query_fields": ["product_name", "color", "price"]
    }
  }
}</code></code></pre><p>The agent receives this question, analyzes the products index schema, and generates a DSL query that filters by color and price range. You receive search results without writing a single line of query DSL.</p><div><hr></div><h2>Building a RAG Agent Step by Step</h2><p>Let us build a complete RAG agent using a flow agent architecture. This agent will retrieve documents from a vector index and use an LLM to generate answers.</p><h3>Step 1: Configure Cluster Settings</h3><p>Enable ML Commons features:</p><pre><code><code>PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": "false",
    "plugins.ml_commons.memory_feature_enabled": "true"
  }
}</code></code></pre><h3>Step 2: Deploy an Embedding Model</h3><p>Register a text embedding model for vector search:</p><pre><code><code>POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.2",
  "model_format": "TORCH_SCRIPT"
}</code></code></pre><p>Note the model ID from the response.</p><h3>Step 3: Create a Vector Index</h3><p>Create an index with a k-NN vector field:</p><pre><code><code>PUT knowledge_base
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      },
      "embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene"
        }
      }
    }
  }
}</code></code></pre><h3>Step 4: Create an Ingest Pipeline</h3><p>Set up automatic embedding generation:</p><pre><code><code>PUT _ingest/pipeline/embedding-pipeline
{
  "processors": [
    {
      "text_embedding": {
        "model_id": "your_embedding_model_id",
        "field_map": {
          "content": "embedding"
        }
      }
    }
  ]
}</code></code></pre><h3>Step 5: Index Documents</h3><p>Add knowledge base content:</p><pre><code><code>POST knowledge_base/_doc?pipeline=embedding-pipeline
{
  "content": "OpenSearch is a community-driven, open source search and analytics suite derived from Apache 2.0 licensed Elasticsearch 7.10.2 and Kibana 7.10.2."
}

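# A third sample document (illustrative content, not from the series): the
# ingest pipeline embeds whatever text appears in the "content" field, so
# no manual vector is supplied here either.
POST knowledge_base/_doc?pipeline=embedding-pipeline
{
  "content": "OpenSearch flow agents run tools in a fixed sequence, which makes them a good fit for retrieval-augmented generation pipelines."
}
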
POST knowledge_base/_doc?pipeline=embedding-pipeline
{
  "content": "OpenSearch supports vector search through the k-NN plugin, enabling semantic search and similarity matching using dense vector embeddings."
}</code></code></pre><h3>Step 6: Set Up the LLM Connector</h3><p>Create a connector to your preferred LLM:</p><pre><code><code>POST /_plugins/_ml/connectors/_create
{
  "name": "Bedrock Claude Connector",
  "description": "Connector for Claude on Bedrock",
  "version": "1.0",
  "protocol": "aws_sigv4",
  "credential": {
    "access_key": "your_access_key",
    "secret_key": "your_secret_key"
  },
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-sonnet-20240229-v1:0/invoke",
      "headers": {
        "Content-Type": "application/json"
      },
      "request_body": "{ \"anthropic_version\": \"bedrock-2023-05-31\", \"max_tokens\": 1024, \"messages\": [{\"role\": \"user\", \"content\": \"${parameters.prompt}\"}] }",
      "post_process_function": "return params.content[0].text;"
    }
  ]
}</code></code></pre><h3>Step 7: Register and Deploy the LLM</h3><pre><code><code>POST /_plugins/_ml/models/_register
{
  "name": "Claude for RAG",
  "function_name": "remote",
  "description": "Claude model for generating answers",
  "connector_id": "your_connector_id"
}</code></code></pre><p>Deploy the model:</p><pre><code><code>POST /_plugins/_ml/models/your_llm_model_id/_deploy</code></code></pre><h3>Step 8: Register the Flow Agent</h3><p>Now create an agent that combines VectorDBTool and MLModelTool:</p><pre><code><code>POST /_plugins/_ml/agents/_register
{
  "name": "Knowledge Base RAG Agent",
  "type": "flow",
  "description": "RAG agent for knowledge base queries",
  "tools": [
    {
      "type": "VectorDBTool",
      "name": "knowledge_retriever",
      "parameters": {
        "model_id": "your_embedding_model_id",
        "index": "knowledge_base",
        "embedding_field": "embedding",
        "source_field": ["content"],
        "input": "${parameters.question}"
      }
    },
    {
      "type": "MLModelTool",
      "name": "answer_generator",
      "description": "Generates answers based on retrieved context",
      "parameters": {
        "model_id": "your_llm_model_id",
        "prompt": "Human: You are a helpful assistant. Answer the question based only on the provided context. If the context does not contain enough information, say so.\n\nContext:\n${parameters.knowledge_retriever.output}\n\nQuestion: ${parameters.question}\n\nAssistant:"
      }
    }
  ]
}</code></code></pre><h3>Step 9: Execute the Agent</h3><p>Ask the agent a question:</p><pre><code><code>POST /_plugins/_ml/agents/your_agent_id/_execute
{
  "parameters": {
    "question": "What is OpenSearch and does it support vector search?"
  }
}</code></code></pre><p>The agent retrieves relevant documents using vector similarity, passes them to Claude, and returns a contextual answer.</p><div><hr></div><h2>Memory: Enabling Multi-Turn Conversations</h2><p>Agents can maintain conversation history using memory. This enables follow-up questions that reference previous exchanges.</p><p>When registering an agent with memory:</p><pre><code><code>POST /_plugins/_ml/agents/_register
{
  "name": "Conversational Knowledge Agent",
  "type": "conversational_flow",
  "description": "Agent with conversation memory",
  "app_type": "rag",
  "memory": {
    "type": "conversation_index"
  },
  "tools": [
    {
      "type": "VectorDBTool",
      "parameters": {
        "model_id": "your_embedding_model_id",
        "index": "knowledge_base",
        "embedding_field": "embedding",
        "source_field": ["content"],
        "input": "${parameters.question}"
      }
    },
    {
      "type": "MLModelTool",
      "parameters": {
        "model_id": "your_llm_model_id",
        "prompt": "Previous conversation:\n${parameters.chat_history}\n\nContext:\n${parameters.VectorDBTool.output}\n\nQuestion: ${parameters.question}\n\nAnswer:"
      }
    }
  ]
}</code></code></pre><p>The first execution creates a memory ID:</p><pre><code><code>POST /_plugins/_ml/agents/your_agent_id/_execute
{
  "parameters": {
    "question": "What is OpenSearch?"
  }
}</code></code></pre><p>The response includes a <code>memory_id</code>. Use it for follow-up questions:</p><pre><code><code>POST /_plugins/_ml/agents/your_agent_id/_execute
{
  "parameters": {
    "question": "Does it support machine learning?",
    "memory_id": "memory_id_from_previous_response"
  }
}</code></code></pre><p>The agent now has context from the previous exchange. It understands that &#8220;it&#8221; refers to OpenSearch.</p><div><hr></div><h2>Plan-Execute-Reflect in Action</h2><p>For complex tasks requiring multi-step reasoning, use a plan-execute-reflect agent. Setting one up for troubleshooting scenarios looks like this:</p><pre><code><code>POST /_plugins/_ml/agents/_register
{
  "name": "Troubleshooting Agent",
  "type": "plan_execute_and_reflect",
  "description": "Agent for investigating system issues",
  "llm": {
    "model_id": "your_llm_model_id",
    "parameters": {
      "prompt": "${parameters.question}"
    }
  },
  "memory": {
    "type": "conversation_index"
  },
  "parameters": {
    "_llm_interface": "bedrock/converse"
  },
  "tools": [
    {
      "type": "ListIndexTool"
    },
    {
      "type": "IndexMappingTool"
    },
    {
      "type": "SearchIndexTool"
    },
    {
      "type": "VectorDBTool",
      "parameters": {
        "model_id": "your_embedding_model_id",
        "index": "logs",
        "embedding_field": "embedding",
        "source_field": ["message"],
        "input": "${parameters.input}"
      }
    }
  ],
  "app_type": "os_chat"
}</code></code></pre><p>Execute asynchronously:</p><pre><code><code>POST /_plugins/_ml/agents/your_agent_id/_execute?async=true
{
  "parameters": {
    "question": "Why is the checkout service failing? Analyze logs and traces to identify the root cause."
  }
}</code></code></pre><p>The agent will:</p><ol><li><p>Plan steps: list available indexes, understand schemas, search logs, and analyze patterns</p></li><li><p>Execute each step, gathering information</p></li><li><p>Reflect after each step, adjusting the plan based on findings</p></li><li><p>Provide a final analysis with root cause identification</p></li></ol><div><hr></div><h2>MCP: Extending Agent Capabilities</h2><p>Model Context Protocol (MCP) enables agents to connect to external tools and data sources. This is how you extend agent capabilities beyond OpenSearch&#8217;s built-in tools.</p><p>First, enable MCP in cluster settings:</p><pre><code><code>PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.mcp_connector_enabled": "true"
  }
}</code></code></pre><p>Create an MCP connector:</p><pre><code><code>POST /_plugins/_ml/connectors/_create
{
  "name": "Weather MCP Connector",
  "description": "Connects to weather service via MCP",
  "version": "1.0",
  "protocol": "mcp",
  "parameters": {
    "endpoint": "https://weather-service.example.com/mcp"
  }
}</code></code></pre><p>Register an agent that uses the MCP connector:</p><pre><code><code>POST /_plugins/_ml/agents/_register
{
  "name": "Weather-Aware Search Agent",
  "type": "conversational",
  "llm": {
    "model_id": "your_llm_model_id",
    "parameters": {
      "max_iteration": 5
    }
  },
  "memory": {
    "type": "conversation_index"
  },
  "parameters": {
    "_llm_interface": "bedrock/converse",
    "mcp_connectors": [
      {
        "mcp_connector_id": "your_mcp_connector_id",
        "tool_filters": ["^get_weather$", "^get_forecast$"]
      }
    ]
  },
  "tools": [
    {
      "type": "SearchIndexTool"
    }
  ],
  "app_type": "os_chat"
}</code></code></pre><p>Now the agent can fetch weather data from external services while also searching your OpenSearch indexes.</p><div><hr></div><h2>Agent Best Practices</h2><h3>Choose the Right Agent Type</h3><p>Use flow agents when the execution path is predictable. RAG is the classic example. Every question follows the same pattern: retrieve, then generate.</p><p>Use conversational agents when questions might require different tool combinations. If some questions need vector search while others need schema inspection, a conversational agent can choose the right approach.</p><p>Use plan-execute-reflect agents for complex, multi-step tasks. Root cause analysis, research queries, and tasks requiring iterative refinement benefit from this architecture.</p><h3>Optimize Tool Descriptions</h3><p>Conversational agents rely on tool descriptions to decide which tool to use. Vague descriptions lead to poor tool selection.</p><p>Bad description: &#8220;A tool for searching&#8221;</p><p>Good description: &#8220;Searches the products index using semantic similarity. Use this when questions involve finding products by description, features, or attributes. Returns the top 5 most similar products with their names, prices, and descriptions.&#8221;</p><h3>Limit Tools Per Agent</h3><p>Each tool adds complexity to the LLM&#8217;s reasoning process. An agent with twenty tools will make more mistakes than one with five.</p><p>Create specialized agents for different domains rather than one agent that does everything. A product search agent, a log analysis agent, and a customer support agent will each perform better than a single omniscient agent.</p><h3>Handle Failures Gracefully</h3><p>LLM calls can fail. Network issues, rate limits, and model errors all happen. Configure retry policies:</p><pre><code><code>PUT /_plugins/_ml/connectors/your_connector_id
{
  "client_config": {
    "max_retry_times": 3,
    "retry_backoff_millis": 500,
    "retry_backoff_policy": "exponential_full_jitter"
  }
}</code></code></pre><p>For plan-execute-reflect agents running long tasks, set <code>max_retry_times</code> to -1 for unlimited retries with backoff.</p><h3>Monitor Agent Performance</h3><p>Use the Get Message Traces API to understand what agents are doing:</p><pre><code><code>GET /_plugins/_ml/memory/message/your_message_id/traces</code></code></pre><p>This returns the sequence of tools used, intermediate outputs, and reasoning steps. It is essential for debugging when agents produce unexpected results.</p><div><hr></div><h2>Cloud Deployment Considerations</h2><h3>AWS OpenSearch Service</h3><p>Amazon OpenSearch Service supports agents starting with version 2.13. Key considerations:</p><ul><li><p>Use Amazon Bedrock for LLM integration via the built-in connector</p></li><li><p>IAM roles must include <code>bedrock:InvokeModel</code> permissions</p></li><li><p>ML nodes are recommended for production workloads</p></li><li><p>Agentic search with OpenSearch 3.3+ provides additional features</p></li></ul><h3>Alibaba Cloud OpenSearch</h3><p>Alibaba Cloud provides managed OpenSearch with ML capabilities:</p><ul><li><p>Use DashScope for LLM integration with Qwen models</p></li><li><p>RAM roles control access to ML services</p></li><li><p>Dedicated ML nodes available in enhanced editions</p></li><li><p>Consider using PAI-EAS for custom model deployment</p></li></ul><div><hr></div><p>Agents transform OpenSearch from a search engine into an intelligent system. 
They use LLMs to reason about problems, tools to take actions, and memory to maintain context.</p><p>Flow agents run tools in sequence for predictable workflows like RAG.</p><p>Conversational agents reason about which tools to use and adapt to different questions.</p><p>Plan-execute-reflect agents break down complex tasks, execute them step by step, and adapt based on results.</p><p>Agentic search converts natural language to DSL queries, enabling search without query expertise.</p><p>MCP extends agent capabilities with external tools and data sources.</p><p>Tomorrow we will explore analyzers and tokenizers to understand how OpenSearch processes text before indexing and searching.</p><div><hr></div><p>Interactive guide: <a href="https://opensearch.9cld.com/day/08-agentic-ai">https://opensearch.9cld.com/day/08-agentic-ai</a></p><p>All guides: <a href="https://opensearch.9cld.com/">https://opensearch.9cld.com/</a></p>]]></content:encoded></item><item><title><![CDATA[Day 7: RAG with OpenSearch]]></title><description><![CDATA[Your chatbot confidently told a customer the wrong return policy. Again. 
RAG fixes that.]]></description><link>https://9cld.com/p/rag-with-opensearch</link><guid isPermaLink="false">https://9cld.com/p/rag-with-opensearch</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Tue, 27 Jan 2026 07:49:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Fzj6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png" length="0" type="image/png"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fzj6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fzj6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png 424w, https://substackcdn.com/image/fetch/$s_!Fzj6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png 848w, https://substackcdn.com/image/fetch/$s_!Fzj6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Fzj6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Fzj6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png" width="1192" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1192,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:421503,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/185935968?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fzj6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png 424w, https://substackcdn.com/image/fetch/$s_!Fzj6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png 848w, https://substackcdn.com/image/fetch/$s_!Fzj6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png 1272w, https://substackcdn.com/image/fetch/$s_!Fzj6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07a18f4-e798-47f6-9ab2-715bc24f07b1_1192x628.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><h2>The Problem Nobody Talks About</h2><p>You built a chatbot. It sounds smart. It responds instantly. Users love it.</p><p>Then someone asks about your company&#8217;s refund policy.</p><p>The LLM confidently invents one. Wrong dates. Wrong percentages. Completely made up.</p><p>You trained it on your data, right? No. You fine-tuned it? Too expensive. You hoped it would just know? 
That is what we all did.</p><p>The gap between &#8220;AI that sounds smart&#8221; and &#8220;AI that is actually correct&#8221; is where most GenAI projects fail.</p><p>RAG bridges that gap.</p><div><hr></div><h2>What is RAG, Really?</h2><p>RAG stands for Retrieval-Augmented Generation. Fancy name. Simple idea.</p><p>Instead of asking the LLM to remember everything, you give it a search engine.</p><p>User asks a question. You search your knowledge base for relevant documents. You stuff those documents into the LLM prompt. The LLM generates an answer using your actual data.</p><p>The LLM does not hallucinate because it is not guessing. It is reading.</p><p>Think of it like this. You do not memorize every fact in your company handbook. You look things up when someone asks. RAG makes your AI do the same thing.</p><div><hr></div><h2>Why OpenSearch for RAG?</h2><p>You could use any vector database. Pinecone. Weaviate. Chroma. They all store embeddings and find similar vectors.</p><p>So why OpenSearch?</p><p>OpenSearch does more than vector search.</p><p>Most RAG tutorials show you the happy path. Embed query, find similar chunks, done. 
Real-world RAG is messier.</p><p>What if the user asks &#8220;What was our Q3 revenue?&#8221; A vector search finds documents about revenue, but you also need exact keyword matching to get Q3 and not Q2. </p><p>You need date filtering to get this year and not last year. </p><p>You need aggregations to sum the numbers, not just find them. </p><p>You need metadata filtering to restrict results to the finance department.</p><p>Vector-only databases cannot do this. You end up bolting on a second search system.</p><p>OpenSearch gives you hybrid search out of the box. Vectors plus keywords plus filters plus aggregations. One system.</p><p>OpenSearch also has a native RAG processor.</p><p>Most RAG implementations work like this. </p><ul><li><p>You search OpenSearch. </p></li><li><p>You pull results into your application. </p></li><li><p>You build the prompt in Python. </p></li><li><p>You send it to the LLM. </p></li><li><p>You return the response. </p></li></ul><p>Five steps, lots of code.</p><p>OpenSearch does all of this inside a search pipeline. </p><ul><li><p>Query hits OpenSearch. </p></li><li><p>RAG processor retrieves context, calls LLM, and returns grounded answer. </p></li></ul><p>One API call. Your application stays simple.</p><div><hr></div><h2>The Architecture</h2><p>Before we build, let us understand what we are building.</p><p>The flow starts with your application. User asks something like &#8220;What is our refund policy?&#8221; That query goes to OpenSearch.</p><p>Inside OpenSearch, there are two pipelines. The ingest pipeline handles documents when you add them. 
</p><p>It chunks long documents into smaller pieces, converts those chunks into vectors using an embedding model, and stores everything in the index.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TcWs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TcWs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png 424w, https://substackcdn.com/image/fetch/$s_!TcWs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png 848w, https://substackcdn.com/image/fetch/$s_!TcWs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!TcWs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TcWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png" width="1456" height="1088" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1088,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:144196,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/185935968?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TcWs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png 424w, https://substackcdn.com/image/fetch/$s_!TcWs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png 848w, https://substackcdn.com/image/fetch/$s_!TcWs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!TcWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d0c83c-86f1-477c-8dc6-bdef26082039_1598x1194.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The search pipeline handles queries. It takes the user&#8217;s question, converts it to a vector using the same embedding model, finds relevant chunks using hybrid search (both vector similarity and keyword matching), and then the RAG processor takes over. </p><p>It grabs the top chunks as context, builds a prompt, calls your LLM (Claude, GPT, DeepSeek, whatever you configured), and returns the grounded answer.</p><p>The key insight is that OpenSearch handles both retrieval and orchestration. The LLM only does generation.</p><div><hr></div><h2>The Components</h2><p>Let us break down each piece before we start building.</p><p><strong>Connectors</strong> are bridges between OpenSearch and external AI services. You need two of them. </p><p>An embedding connector that converts text to vectors, and an LLM connector that generates answers. 
</p><p>Think of connectors as API configurations. They store endpoints, credentials, and request templates.</p><p><strong>ML Models</strong> wrap connectors in a deployable unit. This sounds redundant, but it gives you versioning so you can roll back bad models, access control so you can restrict who uses what, and resource management for memory limits.</p><p><strong>Ingest Pipelines</strong> process documents before storing them. </p><p>For RAG, you need text chunking to split long documents into smaller pieces, and text embedding to convert those chunks into vectors. Why chunk? LLMs have context limits. </p><p>You cannot stuff a 50-page document into a prompt. Chunking creates bite-sized pieces that fit.</p><p><strong>Search Pipelines</strong> process queries and results. </p><p>For RAG, you need a neural query enricher to convert query text into a vector, and the RAG processor to retrieve context, call the LLM, and return the answer.</p><div><hr></div><h2>Building RAG: Step by Step</h2><p>Now we build. I will explain each step, then show the code.</p><h3>Step 1: Enable ML Commons</h3><p>OpenSearch&#8217;s ML features are disabled by default. You need to enable them.</p><pre><code><code>PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": false,
    "plugins.ml_commons.memory_feature_enabled": true,
    "plugins.ml_commons.rag_pipeline_feature_enabled": true,
    "plugins.ml_commons.connector_access_control_enabled": true,
    "plugins.ml_commons.model_access_control_enabled": true
  }
}</code></code></pre><p>Why these settings? The <code>only_run_on_ml_node</code> setting lets you run models on any node, not just dedicated ML nodes. Good for development, not production. </p><p>The <code>memory_feature_enabled</code> setting enables conversation memory for multi-turn chat. </p><p>The <code>rag_pipeline_feature_enabled</code> setting enables the RAG processor in search pipelines. </p><p>The two access control settings restrict who can create connectors and deploy models. Security matters.</p><h3>Step 2: Create a Model Group</h3><p>Model groups control access to models. Think of them as folders with permissions.</p><pre><code><code>POST _plugins/_ml/model_groups/_register
{
  "name": "rag-model-group",
  "description": "Models for RAG pipeline",
  "access_mode": "public"
}</code></code></pre><p>This returns a <code>model_group_id</code>. Save it. You will need it for every model you register.</p><h3>Step 3: Register the Embedding Connector</h3><p>Now we connect to an embedding service. This example uses Amazon Bedrock with Titan Embeddings.</p><pre><code><code>POST _plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Titan Embedding Connector",
  "description": "Connector for Titan Embeddings V2",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock",
    "model": "amazon.titan-embed-text-v2:0"
  },
  "credential": {
    "roleArn": "arn:aws:iam::YOUR_ACCOUNT:role/OpenSearchBedrockRole"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-embed-text-v2:0/invoke",
      "headers": {
        "content-type": "application/json"
      },
      "request_body": "{ \"inputText\": \"${parameters.inputText}\", \"dimensions\": 1024, \"normalize\": true }",
      "pre_process_function": "connector.pre_process.bedrock.embedding",
      "post_process_function": "connector.post_process.bedrock.embedding"
    }
  ]
}</code></code></pre><p>What is happening here? </p><ul><li><p>The <code>protocol: aws_sigv4</code> uses AWS IAM for authentication. No API keys stored in your cluster. </p></li><li><p>The <code>parameters</code> section defines the AWS region and Bedrock model. </p></li><li><p>The <code>credential.roleArn</code> is the IAM role OpenSearch assumes to call Bedrock. </p></li><li><p>The <code>actions</code> section defines how to call the API. </p></li><li><p>The <code>request_body</code> template injects your text. </p></li><li><p>The pre and post process functions are built-in helpers that format requests and parse responses.</p></li></ul><p>Save the returned <code>connector_id</code>.</p><h3>Step 4: Register and Deploy the Embedding Model</h3><p>Wrap the connector in a model.</p><pre><code><code>POST _plugins/_ml/models/_register
{
  "name": "Titan Embedding Model",
  "function_name": "remote",
  "model_group_id": "YOUR_MODEL_GROUP_ID",
  "description": "Titan Text Embeddings V2 for RAG",
  "connector_id": "YOUR_EMBEDDING_CONNECTOR_ID"
}</code></code></pre><p>This returns a <code>model_id</code> and a <code>task_id</code>. The model registers asynchronously. Check the status with <code>GET _plugins/_ml/tasks/YOUR_TASK_ID</code>.</p><p>Once complete, deploy the model.</p><pre><code><code>POST _plugins/_ml/models/YOUR_EMBEDDING_MODEL_ID/_deploy</code></code></pre><p>Now your embedding model is ready.</p><h3>Step 5: Register the LLM Connector</h3><p>Same process for the LLM. This example uses Claude 3 Sonnet on Bedrock.</p><pre><code><code>POST _plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Claude Connector",
  "description": "Connector for Claude 3 Sonnet",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock",
    "model": "anthropic.claude-3-sonnet-20240229-v1:0"
  },
  "credential": {
    "roleArn": "arn:aws:iam::YOUR_ACCOUNT:role/OpenSearchBedrockRole"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-sonnet-20240229-v1:0/invoke",
      "headers": {
        "content-type": "application/json"
      },
      "request_body": "{ \"anthropic_version\": \"bedrock-2023-05-31\", \"max_tokens\": 1024, \"messages\": [{\"role\": \"user\", \"content\": \"${parameters.prompt}\"}] }",
      "post_process_function": "\n  StringBuilder sb = new StringBuilder();\n  for (int i=0; i&lt;params.content.length(); i++) {\n    sb.append(params.content[i].text);\n  }\n  return sb.toString();\n  "
    }
  ]
}</code></code></pre><p>The <code>post_process_function</code> is a Painless script that extracts the text from Claude&#8217;s response format.</p><h3>Step 6: Register and Deploy the LLM Model</h3><pre><code><code>POST _plugins/_ml/models/_register
{
  "name": "Claude 3 Sonnet Model",
  "function_name": "remote",
  "model_group_id": "YOUR_MODEL_GROUP_ID",
  "description": "Claude 3 Sonnet for RAG generation",
  "connector_id": "YOUR_LLM_CONNECTOR_ID"
}</code></code></pre><p>Wait for registration, then deploy.</p><pre><code><code>POST _plugins/_ml/models/YOUR_LLM_MODEL_ID/_deploy</code></code></pre><h3>Step 7: Create the Ingest Pipeline</h3><p>Now we create the pipeline that processes documents.</p><pre><code><code>PUT _ingest/pipeline/rag-ingest-pipeline
{
  "description": "Pipeline for RAG document processing",
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 384,
            "overlap_rate": 0.2,
            "tokenizer": "standard"
          }
        },
        "field_map": {
          "content": "content_chunk"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "YOUR_EMBEDDING_MODEL_ID",
        "field_map": {
          "content_chunk": "content_embedding"
        }
      }
    }
  ]
}</code></code></pre><p>The text chunking processor splits documents. </p><p>Each chunk is at most 384 tokens, which is about 300 words. The 20% overlap prevents losing context at chunk boundaries. </p><p>The <code>field_map</code> takes the <code>content</code> field and outputs <code>content_chunk</code> as an array of chunks.</p><p>The text embedding processor takes each chunk, calls the embedding model, and stores vectors in <code>content_embedding</code>.</p><p>Why 384 tokens? It is a balance. Too small and you lose context. Too large and retrieval gets fuzzy. 256 to 512 is the sweet spot for most use cases.</p><h3>Step 8: Create the Knowledge Base Index</h3><p>Now create an index that stores your documents.</p><pre><code><code>PUT /knowledge-base
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "rag-ingest-pipeline",
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "content": { "type": "text" },
      "content_chunk": { "type": "text" },
      "content_embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 128,
            "m": 16
          }
        }
      },
      "source": { "type": "keyword" },
      "last_updated": { "type": "date" }
    }
  }
}</code></code></pre><p>The <code>index.knn: true</code> enables vector search. </p><p>The <code>default_pipeline</code> means every document goes through the ingest pipeline automatically. </p><p>The <code>dimension: 1024</code> must match your embedding model&#8217;s output. </p><p>The <code>method.name: hnsw</code> is the HNSW algorithm for fast approximate nearest neighbor search. </p><p>The <code>ef_construction</code> and <code>m</code> parameters tune the algorithm. Higher values mean better recall but slower indexing.</p><h3>Step 9: Create the Search Pipeline</h3><p>This is where RAG happens.</p><pre><code><code>PUT _search/pipeline/rag-search-pipeline
{
  "description": "RAG search pipeline",
  "request_processors": [
    {
      "neural_query_enricher": {
        "default_model_id": "YOUR_EMBEDDING_MODEL_ID",
        "neural_field_default_id": {
          "content_embedding": "YOUR_EMBEDDING_MODEL_ID"
        }
      }
    }
  ],
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "combination": { "technique": "arithmetic_mean" }
      }
    }
  ],
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "model_id": "YOUR_LLM_MODEL_ID",
        "context_field_list": ["content_chunk"],
        "system_prompt": "You are a helpful assistant that answers questions based on the provided context. If the context does not contain enough information to answer the question, say so. Do not make up information.",
        "user_instructions": "Answer the following question based on the context provided:"
      }
    }
  ]
}</code></code></pre><p>Request processors run before the search. The neural query enricher converts your text query into a vector automatically.</p><p>Response processors run after the search. </p><p>The retrieval_augmented_generation processor takes search results, builds a prompt, calls the LLM, and returns the answer. </p><p>The <code>context_field_list</code> specifies which fields to include as context. </p><p>The <code>system_prompt</code> is critical. It tells the LLM to stay grounded. Without it, the LLM might still hallucinate. </p><p>The <code>user_instructions</code> get prepended to the user&#8217;s question.</p><h3>Step 10: Index Your Documents</h3><p>Now add your knowledge base.</p><pre><code><code>POST /knowledge-base/_doc
{
  "title": "Refund Policy",
  "content": "Our refund policy allows customers to return items within 30 days of purchase. Items must be unused and in original packaging. A valid receipt is required for all returns. Refunds are processed within 5-7 business days after we receive the returned item. Shipping costs are non-refundable unless the return is due to our error.",
  "source": "policies/refund.md",
  "last_updated": "2024-01-15"
}</code></code></pre><p>The ingest pipeline automatically chunks the content, embeds each chunk, and stores everything. Add more documents the same way.</p><h3>Step 11: Query with RAG</h3><p>Finally, ask a question.</p><pre><code><code>POST /knowledge-base/_search?search_pipeline=rag-search-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "content_chunk": {
              "query": "What is the refund policy?"
            }
          }
        },
        {
          "neural": {
            "content_embedding": {
              "query_text": "What is the refund policy?",
              "k": 5
            }
          }
        }
      ]
    }
  },
  "size": 3,
  "ext": {
    "generative_qa_parameters": {
      "llm_question": "What is the refund policy?"
    }
  }
}</code></code></pre><p>The hybrid search runs both keyword match and vector search. The neural query enricher already converted text to vector. </p><p>The size of 3 returns the top 3 chunks as context. </p><p>The generative_qa_parameters tells the RAG processor what question to answer.</p><p>The response includes standard search hits (the chunks that matched) and an <code>ext.retrieval_augmented_generation.answer</code> field with the LLM&#8217;s grounded response.</p><div><hr></div><h2>AWS vs Alibaba Cloud</h2><p>If you are deploying to production, here is what changes.</p><p>On AWS OpenSearch Service, you use Amazon Bedrock for both embeddings and LLM. Claude, Titan, Cohere, and DeepSeek-R1 are all available. </p><p>Authentication works via IAM roles, which is recommended, or access keys. Native integration with SageMaker gives you custom models. Expect to pay around $300 per month for 2x r6g.large nodes plus about $15 per million tokens for Claude.</p><p>On Alibaba Cloud OpenSearch, you use Model Studio (DashScope) for LLM access and PAI (Platform for AI) for custom embeddings. </p><p>Qwen models replace Claude. RAM roles replace IAM. The connector API structure is different. Expect around $250 per month for comparable nodes plus about $10 per million tokens for Qwen.</p><p>The OpenSearch APIs are identical. Only the connectors change.</p><div><hr></div><h2>Production Checklist</h2><p>Before you ship, there are things to verify.</p><p>For search quality, use hybrid search, not just vectors. Keywords matter for precision. Include metadata in context like source and date. This helps the LLM cite sources. </p><p>Test with adversarial questions like &#8220;What color is the CEO&#8217;s car?&#8221; The system should say &#8220;I do not know.&#8221; Monitor hallucination rate by tracking questions where the answer is not in the context.</p><p>For performance, add reranking for better precision. Tune chunk size based on your content. 
Shorter chunks work better for FAQs, longer for documentation. Consider caching frequent queries. Set LLM timeout appropriately, usually 30 to 60 seconds.</p><p>For security, use IAM roles rather than access keys. Enable model access control. Audit who creates connectors. Do not expose RAG endpoints publicly without authentication.</p><p>For cost, remember that embedding calls happen on every document ingest, and LLM calls happen on every query. Monitor token usage. Consider smaller models for simple Q&amp;A.</p><div><hr></div><h2>Common Mistakes</h2><p><strong>Vector-only retrieval.</strong> Vector search finds semantically similar content. But &#8220;Q3&#8221; and &#8220;Q2&#8221; are semantically similar. Keyword search catches exact matches. Use hybrid.</p><p><strong>No system prompt.</strong> Without instructions, the LLM will make things up. Always tell it to stay grounded and admit when it does not know.</p><p><strong>Too many chunks in context.</strong> More context is not always better. Irrelevant chunks confuse the LLM. Start with 3 to 5, tune from there.</p><p><strong>No metadata.</strong> If the LLM cannot cite sources, users do not trust the answer. Include the source and date in every document.</p><div><hr></div><h2>What is Next?</h2><p>You now have a complete RAG pipeline running in OpenSearch. Your chatbot reads your actual data instead of making things up.</p><p>In Day 8, we go deeper into AI agents. Systems that can plan, reason, and take actions autonomously using OpenSearch as their knowledge backbone.</p><p>The foundation is set. 
Time to build intelligence on top.</p><div><hr></div><p>Interactive Guide: <a href="https://opensearch.9cld.com/day/07-rag">https://opensearch.9cld.com/day/07-rag</a></p><p>Full Series: https://opensearch.9cld.com/</p>]]></content:encoded></item><item><title><![CDATA[Talking to OpenSearch]]></title><description><![CDATA[Every search engine needs someone to talk to it. 
Today that someone is you.]]></description><link>https://9cld.com/p/talking-to-opensearch</link><guid isPermaLink="false">https://9cld.com/p/talking-to-opensearch</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Mon, 26 Jan 2026 09:33:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YDu-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YDu-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YDu-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png 424w, https://substackcdn.com/image/fetch/$s_!YDu-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png 848w, https://substackcdn.com/image/fetch/$s_!YDu-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png 1272w, https://substackcdn.com/image/fetch/$s_!YDu-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!YDu-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png" width="1198" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1198,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:514717,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/185821423?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YDu-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png 424w, https://substackcdn.com/image/fetch/$s_!YDu-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png 848w, https://substackcdn.com/image/fetch/$s_!YDu-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png 1272w, https://substackcdn.com/image/fetch/$s_!YDu-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04411f9e-1720-4db8-bf3a-109129c1c03e_1198x628.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><h2>The Conversation Nobody Teaches</h2><p>OpenSearch is sitting there. Running. Waiting. Green status. Ready.</p><p>But ready for what?</p><p>Most people get stuck here. They followed a tutorial to install OpenSearch. It worked. Now they stare at a terminal, wondering what to type next.</p><p>The documentation shows hundreds of API endpoints. The examples assume you already know what an index is. The queries look like alien hieroglyphics.</p><p>Here is what nobody tells you. OpenSearch is just a web server. It listens on port 9200. It accepts HTTP requests. 
It returns JSON responses.</p><p>That is it.</p><p>Every fancy feature. Every complex query. Every advanced operation. It's all just HTTP requests to a web server.</p><p>Once you understand this, everything else clicks into place.</p><div><hr></div><h2>The Shape of Every Request</h2><pre><code><code>curl -X METHOD 'https://localhost:9200/path' -d 'data'</code></code></pre><p>METHOD is what you want to do. GET to read. PUT to create. POST to add or update. DELETE to remove.</p><p>Path is where you want to do it. An index name. A document ID. A special endpoint like _search or _cluster.</p><p>Data is the details. The document you want to store. The query you want to run. The settings you want to apply.</p><p>Every single OpenSearch operation follows this pattern. Learn it once. Use it forever.</p><div><hr></div><h2>Before We Start</h2><p>Make sure OpenSearch is running.</p><pre><code><code>podman ps | grep opensearch</code></code></pre><p>If you see your container listed, you are good. If not, start it.</p><pre><code><code>podman run -d \
  --name opensearch-node \
  -p 9200:9200 \
  -p 9600:9600 \
  -e 'discovery.type=single-node' \
  -e 'OPENSEARCH_INITIAL_ADMIN_PASSWORD=OpenSearch@2024Secure' \
  -e 'OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m' \
opensearchproject/opensearch:latest</code></code></pre><p>Wait a minute for startup. Then verify.</p><pre><code><code>curl -k -u 'admin:OpenSearch@2024Secure' 'https://localhost:9200'</code></code></pre><p>You should see JSON with cluster information.</p><p>To save typing, create an alias.</p><pre><code><code>alias os='curl -s -k -u admin:OpenSearch@2024Secure'</code></code></pre><p>Now <code>os</code> replaces the long curl command. I will use this alias for the rest of the guide.</p><div><hr></div><h2>Checking Health</h2><p>First question to ask any cluster. Are you okay?</p><pre><code><code>os -X GET 'https://localhost:9200/_cluster/health?pretty'</code></code></pre><p>The underscore before cluster means this is a special system endpoint. The pretty parameter formats the JSON for humans.</p><p>Look at the status field in the response.</p><p>Green means everything is perfect. All data is available and replicated.</p><p>Yellow means data is available but not fully replicated. Normal for single node setups.</p><p>Red means some data is unavailable. Something is wrong.</p><p>For learning on a single node, yellow is expected and fine.</p><div><hr></div><h2>Creating a Place for Data</h2><p>OpenSearch stores data in indices. Think of an index as a container for related documents.</p><p>Before adding data, create an index.</p><pre><code><code>os -X PUT 'https://localhost:9200/products' -H 'Content-Type: application/json' -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'</code></code></pre><p>PUT because we are creating something with a specific name. The name is products. Settings configure how the index behaves.</p><p>One shard means all data lives in one place. Zero replicas means no backup copies. Fine for learning. Not for production.</p><p>The response confirms creation.</p><pre><code><code>{
  "acknowledged": true,
  "index": "products"
}</code></code></pre><div><hr></div><h2>Defining Structure</h2><p>OpenSearch can guess the structure of your data. But guessing leads to surprises. Better to define it explicitly.</p><p>Delete the index and recreate it with a mapping.</p><pre><code><code>os -X DELETE 'https://localhost:9200/products'
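# Why delete and recreate? The mapping of an existing field cannot be changed
# in place. On an index holding real data, you would instead create a new index
# with the desired mapping and copy documents across with the _reindex API,
# roughly like this (a sketch; products_v2 is a hypothetical target index):
#   os -X POST 'https://localhost:9200/_reindex' -H 'Content-Type: application/json' \
#     -d '{ "source": { "index": "products" }, "dest": { "index": "products_v2" } }'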

os -X PUT 'https://localhost:9200/products' -H 'Content-Type: application/json' -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "category": { "type": "keyword" },
      "price": { "type": "float" },
      "description": { "type": "text" },
      "in_stock": { "type": "boolean" }
    }
  }
}'</code></code></pre><p>The mappings section defines what each field contains.</p><p><strong>text</strong> fields are for full text search. OpenSearch breaks them into individual words. Searching for &#8220;wireless&#8221; finds &#8220;Wireless Headphones&#8221; even though the case differs and the word is part of a phrase.</p><p><strong>keyword</strong> fields are for exact values. Categories. Tags. IDs. Status codes. OpenSearch stores them exactly as provided. Searching for &#8220;Electronics&#8221; only finds &#8220;Electronics&#8221; not &#8220;Electronic&#8221; or &#8220;electronics&#8221;.</p><p><strong>float</strong> for decimal numbers. <strong>boolean</strong> for true or false. <strong>integer</strong> for whole numbers. <strong>date</strong> for timestamps.</p><p>Choosing wrong causes problems. A category stored as text matches partial words. A description stored as keyword requires exact phrase matching. Think about how you will search each field.</p><div><hr></div><h2>Adding Documents</h2><p>Now add some data.</p><pre><code><code>os -X POST 'https://localhost:9200/products/_doc/1' -H 'Content-Type: application/json' -d '{
  "name": "Wireless Headphones",
  "category": "Electronics",
  "price": 149.99,
  "description": "Premium noise-canceling wireless headphones with 30-hour battery life",
  "in_stock": true
}'</code></code></pre><p>POST to the _doc endpoint adds a document. The 1 at the end is the document ID. You choose it. If you leave it out, OpenSearch generates a random one.</p><p>Add a few more.</p><pre><code><code>os -X POST 'https://localhost:9200/products/_doc/2' -H 'Content-Type: application/json' -d '{
  "name": "Running Shoes",
  "category": "Sports",
  "price": 89.99,
  "description": "Lightweight running shoes for marathon training",
  "in_stock": true
}'
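# Tip (a sketch beyond this walkthrough): several documents can be indexed in
# one round trip with the _bulk API. The body is newline-delimited JSON where
# each action line is followed by the document source:
#   os -X POST 'https://localhost:9200/products/_bulk' -H 'Content-Type: application/x-ndjson' --data-binary '
#   { "index": { "_id": "10" } }
#   { "name": "Desk Lamp", "category": "Home", "price": 24.99, "in_stock": true }
#   '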

os -X POST 'https://localhost:9200/products/_doc/3' -H 'Content-Type: application/json' -d '{
  "name": "Coffee Maker",
  "category": "Kitchen",
  "price": 79.99,
  "description": "Automatic drip coffee maker with programmable timer",
  "in_stock": false
}'

os -X POST 'https://localhost:9200/products/_doc/4' -H 'Content-Type: application/json' -d '{
  "name": "Bluetooth Speaker",
  "category": "Electronics",
  "price": 59.99,
  "description": "Portable wireless speaker with deep bass",
  "in_stock": true
}'</code></code></pre><p>Four products. Two electronics. One sports. One kitchen. Different prices. Different stock status. Enough variety to make searches interesting.</p><div><hr></div><h2>Getting Documents Back</h2><p>Retrieve a document by its ID.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_doc/1?pretty'</code></code></pre><p>The response includes your document in the _source field plus metadata like version number and index name.</p><pre><code><code>{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "_source": {
    "name": "Wireless Headphones",
    "category": "Electronics",
    "price": 149.99,
    "description": "Premium noise-canceling wireless headphones with 30-hour battery life",
    "in_stock": true
  }
}</code></code></pre><p>This is direct retrieval. You know the ID. You get the document. Fast and simple.</p><p>But searching is different.</p><div><hr></div><h2>Searching</h2><p>Search does not require knowing IDs. You describe what you want. OpenSearch finds matches.</p><p>The simplest search returns everything.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "match_all": {}
  }
}'</code></code></pre><p>The _search endpoint accepts a query in the request body. The match_all query matches every document.</p><p>Results come back in the hits array. Each hit includes the document source and a score indicating relevance.</p><pre><code><code>{
  "hits": {
    "total": { "value": 4 },
    "hits": [
      { "_id": "1", "_score": 1.0, "_source": { ... } },
      { "_id": "2", "_score": 1.0, "_source": { ... } }
    ]
  }
}</code></code></pre><p>With match_all, every document scores the same because no criteria distinguish them.</p><div><hr></div><h2>Full Text Search</h2><p>This is why OpenSearch exists. Finding documents by meaning, not exact strings.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "match": {
      "description": "wireless"
    }
  }
}'</code></code></pre><p>The match query searches a text field for a word. This returns two products. The headphones because &#8220;wireless&#8221; appears in their description. The speaker because &#8220;wireless&#8221; also appears in its description.</p><p>Try searching for something not present.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "match": {
      "description": "waterproof"
    }
  }
}'</code></code></pre><p>No results. None of our products mention waterproof.</p><p>Search for multiple words.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "match": {
      "description": "wireless battery"
    }
  }
}'</code></code></pre><p>By default, match finds documents containing any of the words. Both wireless products match. The headphones score higher because their description contains both words.</p><div><hr></div><h2>Exact Matching</h2><p>For keyword fields, use term instead of match.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "term": {
      "category": "Electronics"
    }
  }
}'</code></code></pre><p>Term looks for the exact value. Electronics matches Electronics. Nothing else.</p><p>This distinction matters. If you use match on a keyword field, it usually works but behaves unexpectedly with multiple words. If you use term on a text field, it fails because the stored tokens do not match your exact input.</p><p>Match for text. Term for keywords. Remember this.</p><div><hr></div><h2>Filtering by Numbers</h2><p>Find products in a price range.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "range": {
      "price": {
        "gte": 50,
        "lte": 100
      }
    }
  }
}'</code></code></pre><p>The range query accepts boundaries. gte means greater than or equal. lte means less than or equal. gt and lt are the strict versions without the equal part.</p><p>This finds running shoes and coffee maker. Both priced between 50 and 100.</p><div><hr></div><h2>Combining Conditions</h2><p>Real searches have multiple requirements. Find electronics that are in stock. Find products matching a search term under a certain price.</p><p>The bool query combines conditions.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "must": [
        { "term": { "category": "Electronics" } }
      ],
      "filter": [
        { "term": { "in_stock": true } }
      ]
    }
  }
}'</code></code></pre><p>The must clause requires conditions to match and affects relevance scoring.</p><p>The filter clause requires conditions to match but does not affect scoring. Use filter for yes or no conditions like stock status or category membership where relevance does not matter.</p><p>A more complex example. Find products mentioning wireless that are in stock and cost less than 100.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "wireless" } }
      ],
      "filter": [
        { "term": { "in_stock": true } },
        { "range": { "price": { "lt": 100 } } }
      ]
    }
  }
}'</code></code></pre><p>Only the Bluetooth speaker matches. The headphones are wireless and in stock but cost more than 100.</p><div><hr></div><h2>Modifying Documents</h2><p>Update a document with new information.</p><pre><code><code>os -X POST 'https://localhost:9200/products/_update/1' -H 'Content-Type: application/json' -d '{
  "doc": {
    "price": 129.99,
    "on_sale": true
  }
}'</code></code></pre><p>The _update endpoint merges your changes with the existing document. Fields you specify are updated or added. Fields you omit remain unchanged.</p><p>Verify the update worked.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_doc/1?pretty'</code></code></pre><p>The headphones now cost 129.99 and have a new on_sale field.</p><div><hr></div><h2>Removing Documents</h2><p>Delete by ID.</p><pre><code><code>os -X DELETE 'https://localhost:9200/products/_doc/4'</code></code></pre><p>The speaker is gone. Permanently. No recycle bin. No undo.</p><p>Verify the count dropped.</p><pre><code><code>os -X GET 'https://localhost:9200/products/_count?pretty'</code></code></pre><p>Three documents remain.</p><div><hr></div><h2>Viewing Indices</h2><p>List all indices in the cluster.</p><pre><code><code>os -X GET 'https://localhost:9200/_cat/indices?v'</code></code></pre><p>The _cat API returns plain text formatted for humans. The v parameter adds column headers.</p><p>You see your products index along with any system indices OpenSearch created automatically.</p><div><hr></div><h2>Cleaning Up</h2><p>Delete an entire index.</p><pre><code><code>os -X DELETE 'https://localhost:9200/products'</code></code></pre><p>Everything gone. All documents. All mappings. All settings. Use carefully.</p><div><hr></div><h2>The Pattern Revealed</h2><p>Every operation we performed followed the same pattern.</p><pre><code>Check health. GET to _cluster/health.</code></pre><pre><code>Create an index. PUT to /indexname.</code></pre><pre><code>Add document. POST to /indexname/_doc/id.</code></pre><pre><code>Get the document. GET to /indexname/_doc/id.</code></pre><pre><code>Search. GET or POST to /indexname/_search with query body.</code></pre><pre><code>Update. POST to /indexname/_update/id.</code></pre><pre><code>Delete document. DELETE to /indexname/_doc/id.</code></pre><pre><code>Delete index. 
DELETE to /indexname.</code></pre><p>The endpoint tells OpenSearch what you want. The method tells it what to do. The body provides details.</p><p>This pattern scales to every feature OpenSearch offers. Aggregations for analytics. Vector search for AI applications. Semantic search for natural language. All of them use the same structure.</p><p>You now speak OpenSearch.</p><div><hr></div><p>Terminal recording at https://asciinema.org/a/772970</p><p>Interactive guide at <a href="https://opensearch.9cld.com/day/03-first-conversation">https://opensearch.9cld.com/day/03-first-conversation</a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Running OpenSearch with Podman]]></title><description><![CDATA[Docker without the daemon. Same commands. Better security.]]></description><link>https://9cld.com/p/running-opensearch-with-podman</link><guid isPermaLink="false">https://9cld.com/p/running-opensearch-with-podman</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Sun, 25 Jan 2026 10:20:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2X01!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2X01!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2X01!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png 424w, 
https://substackcdn.com/image/fetch/$s_!2X01!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png 848w, https://substackcdn.com/image/fetch/$s_!2X01!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png 1272w, https://substackcdn.com/image/fetch/$s_!2X01!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2X01!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png" width="1194" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1194,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:536233,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/185712949?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2X01!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png 424w, 
https://substackcdn.com/image/fetch/$s_!2X01!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png 848w, https://substackcdn.com/image/fetch/$s_!2X01!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png 1272w, https://substackcdn.com/image/fetch/$s_!2X01!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39067e43-9738-4b2f-804e-50825b7fd9ce_1194x628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Why This Guide Exists</h2><p>Most OpenSearch tutorials assume Docker. They give you commands that work. But what if you are on RHEL, Fedora, or CentOS? What if your organization banned Docker? What if you simply prefer rootless containers?</p><p>Podman is your answer.</p><p>This guide covers everything you need to run OpenSearch with Podman. Not just the commands. The reasoning behind them. The gotchas that will waste your afternoon. The password rules that reject seemingly strong passwords. The bash escaping that breaks your terminal.</p><p>All of it. From first principles.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading 9Cloud! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>What is Podman</h2><p>Podman is a container engine developed by Red Hat. It runs containers the same way Docker does but without a central daemon process.</p><p>Every Docker command works with Podman. Replace <code>docker</code> with <code>podman</code> and everything runs the same way.</p><p>The differences that matter for OpenSearch.</p><p><strong>No daemon means no single point of failure.</strong> Docker requires a background service running at all times. If dockerd crashes, all containers stop. 
Podman containers are regular processes. If one crashes, others keep running.</p><p><strong>Rootless by default.</strong> Docker traditionally required root privileges. Podman runs containers as your regular user. This is better for security and better for shared servers where you do not have sudo access.</p><p><strong>Systemd integration.</strong> Podman generates systemd unit files from containers. You can manage OpenSearch as a proper system service with start, stop, enable, and disable commands.</p><p><strong>Drop-in replacement.</strong> The podman command accepts the same flags as docker. You do not need to learn new syntax.</p><div><hr></div><h2>Prerequisites</h2><p>Before we start, verify Podman is installed.</p><pre><code><code>podman --version</code></code></pre><p>You should see output like <code>podman version 4.9.4</code> or similar. If not, install it.</p><p>For Fedora and RHEL.</p><pre><code><code>sudo dnf install podman</code></code></pre><p>For Ubuntu and Debian.</p><pre><code><code>sudo apt install podman</code></code></pre><p>For macOS.</p><pre><code><code>brew install podman
podman machine init
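# Optional: the default Podman machine VM can be too small for OpenSearch.
# You can raise its memory before the first start (value is in MiB;
# 4096 here is an assumption, size it to your hardware).
podman machine set --memory 4096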
podman machine start</code></code></pre><div><hr></div><h2>Starting OpenSearch</h2><p>Here is the command that works.</p><pre><code><code>podman run -d \
  --name opensearch-node \
  -p 9200:9200 \
  -p 9600:9600 \
  -e 'discovery.type=single-node' \
  -e 'OPENSEARCH_INITIAL_ADMIN_PASSWORD=OpenSearch@2024Secure' \
  -e 'OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m' \
  opensearchproject/opensearch:latest</code></code></pre><p>Let me explain every part.</p><p><strong>podman run</strong> creates and starts a new container.</p><p><strong>-d</strong> runs it in detached mode. The container runs in the background and returns control to your terminal.</p><p><strong>--name opensearch-node</strong> gives the container a name you can remember. Without this, Podman generates random names like <code>quirky_einstein</code> which are fun but not helpful when you have multiple containers.</p><p><strong>-p 9200:9200</strong> maps port 9200 from the container to your host machine. This is the REST API port. All your queries go through here.</p><p><strong>-p 9600:9600</strong> maps the performance analyzer port. Optional for learning but useful for monitoring.</p><p><strong>-e &#8216;discovery.type=single-node&#8217;</strong> tells OpenSearch to run as a single node cluster. Without this, OpenSearch waits forever looking for other nodes to form a cluster and never fully starts.</p><p><strong>-e &#8216;OPENSEARCH_INITIAL_ADMIN_PASSWORD=...&#8217;</strong> sets the admin password. This is mandatory since OpenSearch 2.12. The password must meet strict requirements which I will explain shortly.</p><p><strong>-e &#8216;OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m&#8217;</strong> controls JVM heap size. 512 megabytes is enough for learning. Production needs more.</p><p><strong>opensearchproject/opensearch:latest</strong> specifies the container image. The latest tag gives you the newest version.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;669265d3-f6f7-4ba7-8d74-d7cc97748ba3&quot;,&quot;duration&quot;:null}"></div><p>Live Link: https://asciinema.org/a/772348</p><div><hr></div><h2>The Password Problem</h2><p>OpenSearch 3.x introduced strict password validation. 
This is where tutorials written for older versions fail you.</p><p>Your password must have all of these.</p><p>At least 8 characters.</p><p>At least one uppercase letter.</p><p>At least one lowercase letter.</p><p>At least one digit.</p><p>At least one special character.</p><p>Passes the zxcvbn strength algorithm.</p><p>That last requirement trips up most people. The zxcvbn algorithm detects common patterns and rejects passwords that look strong but are actually predictable.</p><p>These passwords fail validation.</p><p><code>Admin123!</code> fails because Admin is a common word and 123 is a sequential pattern.</p><p><code>Password1@</code> fails because Password is literally a dictionary word.</p><p><code>Secure#2024</code> fails because secure is a dictionary word.</p><p><code>MyStr0ng!Pass</code> fails because it uses predictable letter to number substitutions.</p><p>These passwords work.</p><p><code>OpenSearch@2024Secure</code> works because the combination is random enough.</p><p><code>BlueTiger#Lamp42</code> works because random word combinations are strong.</p><p><code>Kx9$mPq2#vL8nR</code> works because it is truly random.</p><div><hr></div><h2>The Bash Escaping Problem</h2><p>Even with a valid password, bash can break your command. The exclamation mark is the culprit.</p><p>Try this command.</p><pre><code><code>podman run -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=MyStr0ng!Pass"</code></code></pre><p>Bash returns <code>event not found</code> and refuses to run anything.</p><p>Why? Bash interprets <code>!</code> as history expansion. When you type <code>!Pass</code>, bash tries to find a previous command starting with Pass and substitute it.</p><p>The fix is simple. Use single quotes instead of double quotes.</p><pre><code><code>podman run -e 'OPENSEARCH_INITIAL_ADMIN_PASSWORD=OpenSearch@2024Secure'</code></code></pre><p>Single quotes tell bash to treat everything inside literally. No expansion. No interpretation. 
Just the text.</p><p>This applies to any password containing these characters.</p><p><code>!</code> triggers history expansion.</p><p><code>$</code> triggers variable expansion.</p><p><code>`</code> triggers command substitution.</p><p><code>#</code> starts a comment if at certain positions.</p><p>Always use single quotes for passwords in shell commands.</p><div><hr></div><h2>Waiting for Startup</h2><p>After running the podman command, OpenSearch needs time to initialize. This is not instant.</p><p>OpenSearch 3.x takes 60 to 90 seconds to fully start. The security plugin configures itself. Internal indices are created. The cluster state stabilizes.</p><p>Wait for it.</p><pre><code><code>sleep 60</code></code></pre><p>Then check container status.</p><pre><code><code>podman ps</code></code></pre><p>You should see output showing your container is up and running with ports mapped.</p><pre><code><code>CONTAINER ID  IMAGE                                          STATUS        PORTS
85c47eeb262f  opensearchproject/opensearch:latest            Up 2 minutes  0.0.0.0:9200-&gt;9200/tcp</code></code></pre><p>If the STATUS column shows <code>Exited</code> instead of <code>Up</code>, something went wrong. Check the logs.</p><pre><code><code>podman logs opensearch-node</code></code></pre><div><hr></div><h2>Testing the Connection</h2><p>Now verify OpenSearch responds to requests.</p><pre><code><code>curl -k -u 'admin:OpenSearch@2024Secure' https://localhost:9200</code></code></pre><p>The flags explained.</p><p><strong>-k</strong> tells curl to skip SSL certificate verification. OpenSearch uses self signed demo certificates. Curl would reject them without this flag.</p><p><strong>-u &#8216;admin:password&#8217;</strong> provides HTTP Basic authentication credentials.</p><p>https://localhost:9200</p><p> is the endpoint. Note HTTPS not HTTP. OpenSearch enables SSL by default.</p><p>Expected response.</p><pre><code><code>{
  "name" : "opensearch-node",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "abc123...",
  "version" : {
    "distribution" : "opensearch",
    "number" : "3.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}</code></code></pre><p>Check cluster health.</p><pre><code><code>curl -k -u 'admin:OpenSearch@2024Secure' https://localhost:9200/_cluster/health?pretty</code></code></pre><p>Look for the status field. Green means healthy. Yellow means functional but replicas are unassigned. Red means problems.</p><p>Yellow is normal for single node setups. Replicas cannot be placed on the same node as primaries so they remain unassigned. This is expected behavior not an error.</p><div><hr></div><h2>Troubleshooting</h2><p>Things will go wrong. Here is how to fix the common issues.</p><p><strong>SSL_ERROR_SYSCALL when connecting</strong></p><p>This error means OpenSearch is not ready yet or crashed during startup.</p><pre><code><code>curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:9200</code></code></pre><p>Check if the container is running.</p><pre><code><code>podman ps -a</code></code></pre><p>If status shows Exited, check the logs.</p><pre><code><code>podman logs opensearch-node</code></code></pre><p>If status shows Up, wait longer. OpenSearch 3.x can take 90 seconds on slower machines.</p><pre><code><code>sleep 30
curl -k -u 'admin:OpenSearch@2024Secure' https://localhost:9200</code></code></pre><p><strong>Container exits immediately</strong></p><p>The container starts and stops within seconds. Check logs for the reason.</p><pre><code><code>podman logs opensearch-node</code></code></pre><p>Common causes.</p><p>Weak password. Look for &#8220;Weak password&#8221; in the logs. Choose a stronger password following the rules above.</p><p>Not enough memory. OpenSearch needs at least 2GB free RAM. Check with <code>free -h</code> and close other applications.</p><p>Port already in use. Another process is using port 9200. Check with <code>ss -tlnp | grep 9200</code> and stop the conflicting service.</p><p><strong>Permission denied errors</strong></p><p>Podman runs rootless by default. Some systems need adjustments.</p><pre><code><code>podman unshare chown -R 1000:1000 /path/to/data</code></code></pre><p>Or keep your host user ID mapped inside the container while testing.</p><pre><code><code>podman run --userns=keep-id ...</code></code></pre><div><hr></div><h2>Adding Dashboards</h2><p>OpenSearch Dashboards provides a web interface for visualization and management. Running it alongside OpenSearch requires a shared network.</p><p>Create a network.</p><pre><code><code>podman network create opensearch-net</code></code></pre><p>Start OpenSearch on this network.</p><pre><code><code>podman run -d \
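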
  --name opensearch-node \
  --network opensearch-net \
  -p 9200:9200 \
  -p 9600:9600 \
  -e 'discovery.type=single-node' \
  -e 'OPENSEARCH_INITIAL_ADMIN_PASSWORD=OpenSearch@2024Secure' \
  -e 'OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m' \
  opensearchproject/opensearch:latest</code></code></pre><p>Wait for OpenSearch to start.</p><pre><code><code>sleep 60</code></code></pre><p>Start Dashboards on the same network.</p><pre><code><code>podman run -d \
  --name opensearch-dashboards \
  --network opensearch-net \
  -p 5601:5601 \
  -e 'OPENSEARCH_HOSTS=["https://opensearch-node:9200"]' \
  opensearchproject/opensearch-dashboards:latest</code></code></pre><p>Wait for Dashboards to start.</p><pre><code><code>sleep 30</code></code></pre><p>Open http://localhost:5601 in your browser. Login with admin and your password.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading 9Cloud! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Using Podman Compose</h2><p>For repeatable setups, use Podman Compose. It reads the same YAML files as Docker Compose.</p><p>Install it.</p><pre><code><code>pip install podman-compose</code></code></pre><p>Create a directory for your project.</p><pre><code><code>mkdir ~/opensearch-tutorial
cd ~/opensearch-tutorial</code></code></pre><p>Create a file named <code>podman-compose.yml</code> with this content.</p><pre><code><code>version: '3'
services:
  opensearch-node:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node
    environment:
      - discovery.type=single-node
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=OpenSearch@2024Secure
      - OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m
      - bootstrap.memory_lock=true
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    ports:
      - 9200:9200
      - 9600:9600
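    # Optional (not part of the original setup): persist index data across
    # container recreation by adding
    #   volumes:
    #     - opensearch-data:/usr/share/opensearch/data
    # and declaring opensearch-data under a top-level volumes: key.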
    networks:
      - opensearch-net

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    container_name: opensearch-dashboards
    environment:
      - OPENSEARCH_HOSTS=["https://opensearch-node:9200"]
    ports:
      - 5601:5601
    networks:
      - opensearch-net
    depends_on:
      - opensearch-node

networks:
  opensearch-net:</code></code></pre><p>Start everything.</p><pre><code><code>podman-compose up -d</code></code></pre><p>Check status.</p><pre><code><code>podman-compose ps</code></code></pre><p>Stop everything.</p><pre><code><code>podman-compose down</code></code></pre><div><hr></div><h2>Performance Tuning</h2><p>The defaults work for learning. Production needs tuning.</p><p><strong>Heap size</strong> controls how much memory the JVM uses. Set it to 50% of available RAM but never more than 32GB. Always set minimum and maximum to the same value.</p><p>For 8GB RAM machine use 4GB heap.</p><pre><code><code>-e 'OPENSEARCH_JAVA_OPTS=-Xms4g -Xmx4g'</code></code></pre><p>For 16GB RAM machine use 8GB heap.</p><pre><code><code>-e 'OPENSEARCH_JAVA_OPTS=-Xms8g -Xmx8g'</code></code></pre><p><strong>Memory lock</strong> prevents swapping. Swapping destroys search performance. Enable it.</p><pre><code><code>-e 'bootstrap.memory_lock=true' \
--ulimit memlock=-1:-1</code></code></pre><p><strong>File descriptors</strong> limit how many files OpenSearch can open. The default is too low.</p><pre><code><code>--ulimit nofile=65536:65536</code></code></pre><p>Complete production ready command.</p><pre><code><code>podman run -d \
  --name opensearch-node \
  -p 9200:9200 \
  -p 9600:9600 \
  -e 'discovery.type=single-node' \
  -e 'OPENSEARCH_INITIAL_ADMIN_PASSWORD=OpenSearch@2024Secure' \
  -e 'OPENSEARCH_JAVA_OPTS=-Xms4g -Xmx4g' \
  -e 'bootstrap.memory_lock=true' \
  --ulimit memlock=-1:-1 \
  --ulimit nofile=65536:65536 \
  opensearchproject/opensearch:latest</code></code></pre><div><hr></div><h2>Cleanup</h2><p>When you are done experimenting.</p><p>Stop containers.</p><pre><code><code>podman stop opensearch-node opensearch-dashboards</code></code></pre><p>Remove containers.</p><pre><code><code>podman rm opensearch-node opensearch-dashboards</code></code></pre><p>Or with compose.</p><pre><code><code>podman-compose down</code></code></pre><p>Remove unused images to free disk space.</p><pre><code><code>podman image prune</code></code></pre><p>Remove unused volumes.</p><pre><code><code>podman volume prune</code></code></pre><div><hr></div><p>Interactive guide available at https://opensearch.9cld.com</p><p></p>]]></content:encoded></item><item><title><![CDATA[Fix Alibaba Cloud ECS Boot Problems Step by Step]]></title><description><![CDATA[Easy steps to repair the system disk and boot your server again]]></description><link>https://9cld.com/p/your-alibaba-cloud-ecs-wont-boot</link><guid isPermaLink="false">https://9cld.com/p/your-alibaba-cloud-ecs-wont-boot</guid><dc:creator><![CDATA[Bhavesh Panchal]]></dc:creator><pubDate>Fri, 23 Jan 2026 08:20:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!s-5t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s-5t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!s-5t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!s-5t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!s-5t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!s-5t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s-5t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1708684,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/185264452?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!s-5t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!s-5t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!s-5t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!s-5t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cdad2c3-91b8-4712-b1d6-f06d0ff67398_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When an Alibaba Cloud ECS instance fails to boot, do not panic. This problem is very common. In most cases, the server is not broken. Only the system startup failed.</p><p>You may see the instance as running in the console. SSH does not connect. The VNC screen is empty or stuck. This usually means the operating system could not start.</p><p>The most important thing to know is this. Your data is usually safe. We can fix the problem by repairing the system disk.</p><p>Follow the steps below slowly and carefully.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading 9Cloud! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>Step 1: Stop the ECS instance</h3><p>&#8226; Open the Alibaba Cloud console<br>&#8226; Stop the ECS instance<br>&#8226; Do not restart it again and again<br>&#8226; Wait until the status shows stopped</p><div><hr></div><h3>Step 2: Detach the system disk</h3><p>&#8226; Open the disk section of the instance<br>&#8226; Find the system disk<br>&#8226; Detach the disk<br>&#8226; Do not delete the disk</p><div><hr></div><h3>Step 3: Create a rescue ECS</h3><p>&#8226; Create a new ECS in the same region and zone<br>&#8226; Use a basic Linux image<br>&#8226; This server is only for fixing the disk</p><div><hr></div><h3>Step 4: Attach the broken disk</h3><p>&#8226; Attach the system disk to the rescue ECS<br>&#8226; Attach it as a data disk<br>&#8226; Start the rescue ECS<br>&#8226; Log in using SSH</p><div><hr></div><h3>Step 5: Find the disk</h3><p>Check the disks.</p><pre><code><code>lsblk
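# blkid prints each partition's filesystem type and UUID, which is useful
# when checking the fstab entries in a later step
blkid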
</code></code></pre><p>You will see a new disk. It is often named vdb. This is the broken system disk.</p><div><hr></div><h3>Step 6: Mount the disk</h3><p>Create a folder.</p><pre><code><code>mkdir /mnt/rescue
</code></code></pre><p>Mount the disk.</p><pre><code><code>mount /dev/vdb1 /mnt/rescue
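# Confirm the mount worked and the tree looks like a root filesystem
ls /mnt/rescue
# If nothing mounts, the image may use LVM: run vgchange -ay and mount
# the root logical volume from /dev/mapper instead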
</code></code></pre><p>If vdb1 does not work, check lsblk again and adjust the name.</p><div><hr></div><h3>Step 7: Check the fstab file</h3><p>This file is a very common cause of boot failure.</p><p>&#8226; Open the file<br>&#8226; Look for disks or UUID values that do not exist<br>&#8226; Comment out the broken lines</p><pre><code><code>cat /mnt/rescue/etc/fstab
vi /mnt/rescue/etc/fstab
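# To disable a suspect entry, prefix the whole line with '#'.
# Hypothetical example (device name is illustrative only):
#   #/dev/vdc1  /data  ext4  defaults  0 0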
</code></code></pre><p>If you are not sure about a line, comment it out and test later.</p><div><hr></div><h3>Step 8: Check boot files</h3><p>Make sure the boot folder is not empty.</p><pre><code><code>ls /mnt/rescue/boot
</code></code></pre><p>If this folder is empty or missing files, the system cannot boot.</p><div><hr></div><h3>Step 9: Fix the bootloader</h3><p>Prepare the environment.</p><pre><code><code>mount --bind /dev /mnt/rescue/dev
mount --bind /proc /mnt/rescue/proc
mount --bind /sys /mnt/rescue/sys
</code></code></pre><p>Enter the disk system.</p><pre><code><code>chroot /mnt/rescue
</code></code></pre><p>Reinstall the bootloader.</p><pre><code><code>grub-install /dev/vdb
grub-mkconfig -o /boot/grub/grub.cfg
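# On RHEL, CentOS, and Alibaba Cloud Linux the equivalents are:
#   grub2-install /dev/vdb
#   grub2-mkconfig -o /boot/grub2/grub.cfg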
</code></code></pre><p>Exit.</p><pre><code><code>exit
</code></code></pre><div><hr></div><h3>Step 10: Check logs if needed</h3><p>Logs can show what failed.</p><pre><code><code>cat /mnt/rescue/var/log/boot.log
cat /mnt/rescue/var/log/messages
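# On systemd-based images with a persistent journal, you can also read the
# broken disk's journal offline from the rescue system:
journalctl --directory /mnt/rescue/var/log/journal -p err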
</code></code></pre><p>If you see clear errors, fix them before continuing.</p><div><hr></div><h3>Step 11: Unmount the disk</h3><p>Unmount everything.</p><pre><code><code>umount /mnt/rescue/dev
umount /mnt/rescue/proc
umount /mnt/rescue/sys
umount /mnt/rescue
</code></code></pre><p>Detach the disk from the rescue ECS.</p><div><hr></div><h3>Step 12: Boot the original ECS</h3><p>&#8226; Attach the disk back as the system disk<br>&#8226; Start the ECS instance<br>&#8226; Try SSH again</p><p>If SSH works, the recovery is complete.</p><div><hr></div><h3>If it still does not work</h3><p>&#8226; Go back and recheck fstab<br>&#8226; Check boot files again<br>&#8226; If needed, rebuild the system and copy data from the disk</p><div><hr></div><h3>Simple rules to avoid this problem</h3><p>&#8226; Always take snapshots before changes<br>&#8226; Be very careful with fstab<br>&#8226; Do not rush disk or boot changes<br>&#8226; Test updates on a test server first</p><p>Alibaba Cloud ECS boot problems look scary, but they are usually easy to fix. If you stay calm and follow the steps, you can recover most servers without losing data.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading 9Cloud! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Day 6: Reranking Search Results ]]></title><description><![CDATA[The Secret Weapon for Search Relevance]]></description><link>https://9cld.com/p/day-6-reranking-search-results</link><guid isPermaLink="false">https://9cld.com/p/day-6-reranking-search-results</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Fri, 23 Jan 2026 04:10:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!p24C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p24C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p24C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png 424w, https://substackcdn.com/image/fetch/$s_!p24C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png 848w, 
https://substackcdn.com/image/fetch/$s_!p24C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png 1272w, https://substackcdn.com/image/fetch/$s_!p24C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p24C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png" width="1199" height="638" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:638,&quot;width&quot;:1199,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:551259,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/185497519?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p24C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png 424w, https://substackcdn.com/image/fetch/$s_!p24C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png 848w, 
https://substackcdn.com/image/fetch/$s_!p24C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png 1272w, https://substackcdn.com/image/fetch/$s_!p24C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7362ac82-590f-4981-b8e2-b66e8b1880f3_1199x638.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You have built a semantic search. Your vectors are flowing. 
Documents are being retrieved.</p><p>But here is the uncomfortable truth: the best answer to your user&#8217;s question might be sitting at position #7 in your results. Or #15. Or buried somewhere in the middle where nobody will ever find it.</p><p>This is the gap that separates &#8220;working search&#8221; from &#8220;great search.&#8221;</p><div><hr></div><h2>The Restaurant Problem</h2><p>Imagine you walk into a restaurant and ask the waiter, &#8220;What is good here?&#8221;</p><p>The waiter disappears into the kitchen and returns with 20 dishes. They are all technically from the menu. They all contain food. But some are appetizers when you wanted a main course. Some are vegetarian when you eat meat. Some are simply not what the chef would recommend for someone like you.</p><p>This is what standard search does. It retrieves documents that match your query. But &#8220;matching&#8221; is not the same as &#8220;best answer.&#8221;</p><p>Now imagine the head chef comes out. She looks at the 20 dishes, considers your preferences, and says: &#8220;For you, start with these five.&#8221;</p><p>That is reranking.</p><div><hr></div><h2>What Reranking Actually Does</h2><p>Reranking is a two-stage retrieval technique. 
Here is how it works:</p><p><strong>Stage 1 (Retrieval):</strong> A fast search method (keyword, vector, or hybrid) retrieves candidate documents. This stage prioritizes recall. It casts a wide net to ensure the best answer is somewhere in the results.</p><p><strong>Stage 2 (Reranking):</strong> A more powerful model evaluates each candidate for relevance to the specific query. This stage prioritizes precision. It surfaces the truly relevant documents to the top.</p><p>The key insight is that these two stages have different strengths. Fast retrieval methods like BM25 or vector search can process millions of documents in milliseconds, but they make ranking mistakes. Cross-encoder reranking models are much more accurate, but they are too slow to run on your entire document collection.</p><p>Combine them, and you get the best of both worlds.</p><div><hr></div><h2>The Real Difference: Bi-Encoders vs Cross-Encoders</h2><p>To understand why reranking works so well, you need to understand the difference between bi-encoders and cross-encoders.</p><p><strong>Bi-Encoders (What You Use for Semantic Search)</strong></p><p>A bi-encoder processes the query and each document separately. It creates an embedding for the query and an embedding for each document. Then it compares them using cosine similarity.</p><p>This is fast because you can pre-compute document embeddings. When a query comes in, you only need to compute one embedding and compare it against your stored vectors.</p><p>But here is the problem: by processing the query and document separately, the model loses context. It cannot see how specific words in the query relate to specific words in the document.</p><p><strong>Cross-Encoders (What Rerankers Use)</strong></p><p>A cross-encoder processes the query and document together as a single input. The model sees &#8220;[QUERY] What is the capital of the United States? [DOC] Washington D.C. 
is the capital of the United States.&#8221;</p><p>Because it processes both together, it can understand relationships and context that bi-encoders miss. It can recognize that &#8220;capital&#8221; in the query means &#8220;seat of government,&#8221; not &#8220;uppercase letter&#8221; or &#8220;financial assets.&#8221;</p><p>The tradeoff is speed. You cannot pre-compute anything. Every query requires running the model on every candidate document.</p><div><hr></div><h2>A Concrete Example</h2><p>Let us say you search for: &#8220;What is the capital of the United States?&#8221;</p><p><strong>Without Reranking (Standard Search Results):</strong></p><ul><li><p>#1: &#8220;Carson City is the capital city of the American state of Nevada.&#8221;</p></li><li><p>#2: &#8220;The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.&#8221;</p></li><li><p>#3: &#8220;Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States.&#8221;</p></li><li><p>#4: &#8220;Capital punishment (the death penalty) has existed in the United States since before the United States was a country.&#8221;</p></li></ul><p>The correct answer is at position #3. The search engine got confused by documents that contain both &#8220;capital&#8221; and state names or &#8220;United States.&#8221;</p><p><strong>With Reranking:</strong></p><ul><li><p>#1 (score: 0.98): &#8220;Washington, D.C... is the capital of the United States.&#8221;</p></li><li><p>#2 (score: 0.28): &#8220;Capital punishment has existed in the United States...&#8221;</p></li><li><p>#3 (score: 0.10): &#8220;Carson City is the capital city of Nevada.&#8221;</p></li><li><p>#4 (score: 0.07): &#8220;Northern Mariana Islands... capital is Saipan.&#8221;</p></li></ul><p>The cross-encoder understood that the query is asking about a specific capital (the country&#8217;s), not just any capital, and not capital punishment. 
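</p><p>The reordering mechanics are simple to sketch. Below is a toy two-stage pipeline in Python; the word-overlap retriever and the pluggable score function are illustrative stand-ins for a real first-stage query and a real cross-encoder such as Cohere Rerank, not actual OpenSearch APIs.</p>

```python
# Toy two-stage retrieval. Stage 1 favors recall (cheap word overlap);
# stage 2 favors precision (pairwise scoring of candidates only).
# score_fn stands in for a cross-encoder model call.

def retrieve(query, corpus, k=20):
    """Stage 1: cast a wide net over the whole corpus."""
    q_terms = set(query.lower().split())
    overlap = lambda doc: len(q_terms & set(doc.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def rerank(query, candidates, score_fn, top_n=3):
    """Stage 2: expensive pairwise scoring, but only on the candidates."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:top_n]
```

<p>Back to the example: 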
It pushed the correct answer to position #1 with 98% confidence.</p><div><hr></div><h2>Why This Matters for RAG</h2><p>If you are building retrieval-augmented generation (RAG) applications, reranking is not optional. It is essential.</p><p>When you send documents to an LLM as context, you are paying for tokens. More importantly, LLMs have limited attention. If you send 10 documents and the relevant one is at position #7, the LLM might miss it or give it less weight.</p><p>Reranking ensures that when you send 3-5 documents to your LLM, they are the RIGHT 3-5 documents. This improves answer quality and reduces costs.</p><div><hr></div><h2>The Reranking Providers</h2><p>OpenSearch supports multiple reranking approaches. Here is when to use each:</p><p><strong>Cohere Rerank</strong></p><p>Cohere offers one of the best reranking APIs available. It supports 100+ languages, handles structured data well, and integrates easily with OpenSearch through a connector.</p><p>Use Cohere when you want the best accuracy with minimal setup, and you are comfortable with external API calls. It is ideal for production systems where accuracy justifies the per-request cost.</p><p><strong>Amazon Bedrock</strong></p><p>Bedrock provides access to multiple reranking models, including Cohere Rerank, through AWS-native integration. Authentication uses IAM roles instead of API keys.</p><p>Use Bedrock when you are already in the AWS ecosystem and want native integration with IAM security. Regional availability varies, so check that your preferred model is available in your region.</p><p><strong>Amazon SageMaker</strong></p><p>SageMaker lets you deploy open-source reranking models like ms-marco-MiniLM or BGE-reranker-v2-m3 on your own infrastructure. 
You control the hardware, the model, and the data flow.</p><p>Use SageMaker when you have high query volumes (where per-request pricing becomes expensive), strict data residency requirements, or you want to use custom fine-tuned models.</p><p><strong>Cross-Encoder (Local)</strong></p><p>You can also deploy cross-encoder models directly within OpenSearch. This eliminates external API calls entirely but requires more infrastructure management.</p><p>Use local cross-encoders in air-gapped environments or when latency is absolutely critical.</p><div><hr></div><h2>Setting Up Reranking with Cohere</h2><p>Here is the complete setup flow for Cohere Rerank in OpenSearch:</p><p><strong>Step 1: Create the Connector</strong></p><pre><code><code>POST /_plugins/_ml/connectors/_create
{
  "name": "cohere-rerank",
  "description": "Connector to Cohere Rerank model",
  "version": "1",
  "protocol": "http",
  "credential": {
    "cohere_key": "your_cohere_api_key"
  },
  "parameters": {
    "model": "rerank-english-v3.0"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://api.cohere.ai/v1/rerank",
      "headers": {
        "Authorization": "Bearer ${credential.cohere_key}"
      },
      "request_body": "{ \"documents\": ${parameters.documents}, \"query\": \"${parameters.query}\", \"model\": \"${parameters.model}\", \"top_n\": ${parameters.top_n} }",
      "pre_process_function": "connector.pre_process.cohere.rerank",
      "post_process_function": "connector.post_process.cohere.rerank"
    }
  ]
}</code></code></pre><p>Save the connector_id from the response.</p><p><strong>Step 2: Register and Deploy the Model</strong></p><pre><code><code>POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "cohere rerank model",
  "function_name": "remote",
  "description": "Cohere Rerank for search relevance",
  "connector_id": "your_connector_id"
}</code></code></pre><p>Save the model_id from the response.</p><p><strong>Step 3: Create a Search Pipeline</strong></p><pre><code><code>PUT /_search/pipeline/rerank_pipeline
{
  "description": "Pipeline for reranking with Cohere",
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "your_model_id"
        },
        "context": {
          "document_fields": ["passage_text"]
        }
      }
    }
  ]
}</code></code></pre><p>The document_fields parameter tells the reranker which fields to consider when scoring relevance. If you specify multiple fields, their values are concatenated before reranking.</p><p><strong>Step 4: Search with Reranking</strong></p><pre><code><code>GET my-index/_search?search_pipeline=rerank_pipeline
{
  "query": {
    "match": {
      "passage_text": "What is the capital of the United States?"
    }
  },
  "size": 10,
  "ext": {
    "rerank": {
      "query_context": {
        "query_text": "What is the capital of the United States?"
      }
    }
  }
}</code></code></pre><p>The ext.rerank.query_context.query_text is what the reranker uses to score documents. In most cases, this matches your search query, but it can be different if needed.</p><div><hr></div><h2>Reranking by Field</h2><p>Sometimes you already have relevance scores in your documents. Maybe a previous ML model computed them during indexing, or you have user ratings, or business priority scores.</p><p>OpenSearch supports field-based reranking that reorders results by a document field without calling an external model:</p><pre><code><code>PUT /_search/pipeline/rerank_by_stars
{
  "response_processors": [
    {
      "rerank": {
        "by_field": {
          "target_field": "reviews.stars",
          "keep_previous_score": true
        }
      }
    }
  ]
}</code></code></pre><p>This is useful for scenarios where you want to combine search relevance with business logic, like boosting highly-rated products or recent content.</p><p>You can also chain ML inference with field-based reranking. First, an ML model writes a relevance score to each document, then the field-based reranker sorts by that score.</p><div><hr></div><h2>AWS vs Alibaba Cloud</h2><p>Both AWS and Alibaba Cloud support reranking in their managed OpenSearch services, but the implementations differ:</p><p><strong>AWS OpenSearch Service</strong></p><ul><li><p>Native Bedrock integration for Cohere Rerank and other models</p></li><li><p>SageMaker endpoints for custom rerankers</p></li><li><p>IAM role-based authentication with SigV4 signing</p></li><li><p>Strong regional availability in the Americas and Europe</p></li></ul><p><strong>Alibaba Cloud OpenSearch</strong></p><ul><li><p>Model Studio integration for Qwen-based reranking</p></li><li><p>PAI-EAS for custom model deployment</p></li><li><p>RAM role-based authentication</p></li><li><p>Strong regional availability in Asia-Pacific, especially China</p></li></ul><p>The connector patterns are similar, but authentication differs. AWS uses IAM roles with SigV4 request signing. Alibaba uses RAM roles with AccessKey/SecretKey pairs.</p><div><hr></div><h2>Performance Considerations</h2><p>Reranking adds latency to your search pipeline. A typical cross-encoder takes 10-50ms to score a batch of documents. Here is how to manage this:</p><p><strong>Limit candidate documents:</strong> Only rerank the top 100-200 results from your first-stage retrieval. Reranking 1000 documents is unnecessarily expensive.</p><p><strong>Set appropriate size:</strong> The size parameter controls how many documents you return to the user. 
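</p><p>Those two knobs can be sketched together in a few lines of Python. The window and page numbers below, and the stub scorer, are illustrative values rather than OpenSearch settings:</p>

```python
# Retrieve wide, rerank a capped slice, return a small page.
# RERANK_WINDOW and PAGE_SIZE are illustrative, not API parameters.

RERANK_WINDOW = 100   # candidates sent to the expensive reranker
PAGE_SIZE = 5         # results the user actually sees

def search(query, first_stage_hits, score_fn):
    candidates = first_stage_hits[:RERANK_WINDOW]   # cap reranker cost
    ranked = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:PAGE_SIZE]
```

<p>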
The reranker scores all candidates, but you only return the top N.</p><p><strong>Consider hybrid queries:</strong> If you use hybrid search (keyword + vector), reranking happens after the normalization processor. The combined scores are normalized first, then reranked.</p><p><strong>Monitor costs:</strong> External APIs like Cohere charge per request. High-volume applications should consider SageMaker or local deployment.</p><div><hr></div><h2>What is Next</h2><p>Tomorrow, we dive into RAG and conversational search. With reranking in your toolkit, you are ready to build AI applications that not only retrieve documents but also synthesize them into coherent answers.</p><div><hr></div><p><strong>Interactive Guide:</strong> opensearch.9cld.com/day/06-reranking</p><p><strong>Day 6 of 60 - From Zero to OpenSearch Hero</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading 9Cloud! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Day 5: Semantic Search - From Vectors to Meaning]]></title><description><![CDATA[From Zero to OpenSearch Hero - Day 5 of 60]]></description><link>https://9cld.com/p/day-5-semantic-search-from-vectors</link><guid isPermaLink="false">https://9cld.com/p/day-5-semantic-search-from-vectors</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Wed, 21 Jan 2026 06:18:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AKvu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AKvu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AKvu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png 424w, https://substackcdn.com/image/fetch/$s_!AKvu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png 848w, 
https://substackcdn.com/image/fetch/$s_!AKvu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png 1272w, https://substackcdn.com/image/fetch/$s_!AKvu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AKvu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png" width="1205" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1205,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:538084,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/185266434?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AKvu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png 424w, https://substackcdn.com/image/fetch/$s_!AKvu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png 848w, 
https://substackcdn.com/image/fetch/$s_!AKvu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png 1272w, https://substackcdn.com/image/fetch/$s_!AKvu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0dd18e-912c-4001-a188-45a8365a13b2_1205x628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The Missing Piece from Day 4</h2><p>Yesterday, we learned about vector search, the mechanism for finding similar vectors using 
k-NN algorithms. We indexed coordinates, computed distances, and retrieved nearest neighbors.</p><p>But there was a gap.</p><p>We talked about &#8220;embeddings&#8221; without explaining where they come from. We assumed vectors existed without showing how text becomes numbers. We skipped the most important question:</p><p><strong>How does &#8220;portable computer for travel&#8221; become a vector that sits near &#8220;laptop&#8221; in mathematical space?</strong></p><p>That is what Day 5 answers. Today, we connect the dots between raw text and the vectors that power semantic search.</p><div><hr></div><h2>What Is Semantic Search?</h2><p>Semantic search understands meaning, not just matching tokens.</p><p>When you search &#8220;portable computer for travel,&#8221; semantic search knows you probably mean laptops. 
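</p><p>That closeness is measurable: it is the cosine similarity between embedding vectors. Here is a toy illustration with invented 4-dimensional vectors; real models emit hundreds to thousands of dimensions, and these numbers are made up purely to show related meanings scoring higher than unrelated ones:</p>

```python
import math

# Invented 4-d "embeddings"; real models produce 384-3072 dimensions.
laptop            = [0.9, 0.8, 0.1, 0.0]
portable_computer = [0.8, 0.9, 0.2, 0.1]
banana            = [0.0, 0.1, 0.9, 0.8]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

print(cosine(laptop, portable_computer))  # high, about 0.99
print(cosine(laptop, banana))             # low, about 0.12
```

<p>The payoff: 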
It returns MacBook Airs, Dell XPS 13s, and ThinkPad X1 Carbons, even though none of those documents contain the exact phrase &#8220;portable computer.&#8221;</p><p>The magic happens in the <strong>embedding model</strong>, a neural network trained on billions of text examples that learned to place similar meanings close together in vector space.</p><div><hr></div><h2>What is the difference between Vector Search and Semantic Search?</h2><p>Before we go further, let me clarify something that confuses many people.</p><p><strong>Vector Search</strong> is the underlying mechanism, the &#8220;how.&#8221; It finds documents by comparing numerical vectors using distance metrics like cosine similarity or Euclidean distance. Vector search does not care what the vectors represent. They could be image embeddings, user preference vectors, GPS coordinates, or random numbers.</p><p><strong>Semantic Search</strong> is a specific application, the &#8220;what for.&#8221; It uses vector search to find documents based on meaning. The key ingredient is the embedding model that converts text into vectors, where similar meanings produce similar numbers.</p><p>Think of it this way:</p><ul><li><p><strong>Vector search</strong> = &#8220;Find me vectors closest to this vector.&#8221;</p></li><li><p><strong>Semantic search</strong> = &#8220;Find me documents with similar meaning to this text&#8221; (which uses vector search under the hood)</p></li></ul><p>The relationship:</p><pre><code><code>Vector Search (mechanism)
    &#9500;&#9472;&#9472; Semantic Search (text application)
    &#9500;&#9472;&#9472; Image Search (image application)  
    &#9500;&#9472;&#9472; Recommendation Systems (user/item application)
    &#9492;&#9472;&#9472; Any similarity-based retrieval</code></code></pre><p>Semantic search IS vector search, but vector search is not always semantic search.</p><p><strong>When to use which terminology:</strong></p><ul><li><p>Say &#8220;vector search&#8221; when working with pre-computed vectors, images, recommendations, or any non-text similarity matching</p></li><li><p>Say &#8220;semantic search&#8221; when the goal is matching text by meaning using embedding models</p></li><li><p>Day 4 covered vector search fundamentals (k-NN, HNSW, bringing your own vectors)</p></li><li><p>Day 5 (this post) covers semantic search - the text-meaning application</p></li></ul><div><hr></div><h2>How Semantic Search Works</h2><p>Here is how it works at a high level.</p><p><strong>Step 1: Text becomes numbers.</strong> An embedding model converts text into a dense vector, a list of floating-point numbers (typically 384 to 3072 values) that captures the semantic meaning. Similar meanings produce similar numbers.</p><p><strong>Step 2: Store in a vector index.</strong> OpenSearch stores these vectors in a k-NN index. During ingestion, a pipeline automatically generates embeddings from your text fields.</p><p><strong>Step 3: Search by meaning.</strong> Your query is also converted to a vector. OpenSearch finds documents with vectors closest to your query vector using approximate nearest neighbor algorithms.</p><p>The key insight: &#8220;portable computer&#8221; and &#8220;laptop&#8221; end up close together in vector space because the embedding model learned they mean similar things from training on billions of text examples.</p><div><hr></div><h2>Keyword vs Semantic: When to Use Which</h2><p>This is not an either/or decision. 
Both have their place.</p><p><strong>Keyword search (BM25) excels at:</strong></p><ul><li><p>Product SKUs and exact identifiers</p></li><li><p>Names, titles, and proper nouns</p></li><li><p>Technical terms users type precisely</p></li><li><p>Cases where an exact match is required</p></li></ul><p><strong>Semantic search excels at:</strong></p><ul><li><p>Natural language questions</p></li><li><p>Conceptual queries (&#8220;something for headaches&#8221;)</p></li><li><p>FAQ and knowledge base matching</p></li><li><p>RAG applications feeding LLMs</p></li><li><p>Cross-language understanding</p></li></ul><p><strong>Hybrid search combines both.</strong> You run keyword and semantic search in parallel, normalize the scores, and merge the results. This gives you exact matches when they exist, and semantically similar results when keywords don&#8217;t match.</p><p>Most production systems use hybrid search.</p><div><hr></div><h2>The Embedding Provider Decision</h2><p>Before you can do semantic search, you need an embedding model. OpenSearch connects to external models through connectors. Your options:</p><p><strong>OpenAI</strong> - The industry standard. Their text-embedding-ada-002 produces 1536-dimensional vectors. Easy to set up, well-documented, but your data leaves your network, and you pay per token.</p><p><strong>Cohere</strong> - Strong multilingual support and input type optimization (different handling for queries vs documents). Available directly via API or through Amazon Bedrock. Supports byte-quantized vectors for memory efficiency.</p><p><strong>Amazon Bedrock Titan</strong> - Runs entirely within AWS. Your data never leaves the AWS network. Uses IAM for authentication instead of API keys. Ideal for regulated industries and compliance requirements.</p><p><strong>Amazon SageMaker</strong> - Deploy any model you want. Use Hugging Face sentence transformers, fine-tuned models, or custom architectures. 
Maximum flexibility, but you manage the infrastructure.</p><p><strong>Local Models</strong> - Run sentence-transformer models directly on OpenSearch ML nodes. Zero external dependencies, complete data privacy. Best for air-gapped environments, but uses cluster resources and is limited to smaller models.</p><p>Here is a quick decision guide:</p><ul><li><p>Need multilingual? <strong>Cohere</strong></p></li><li><p>Data cannot leave AWS? <strong>Bedrock Titan</strong></p></li><li><p>Custom/fine-tuned model? <strong>SageMaker</strong></p></li><li><p>Air-gapped environment? <strong>Local Models</strong></p></li><li><p>Quick prototype? <strong>OpenAI</strong></p></li></ul><div><hr></div><h2>Setting Up Semantic Search: The Complete Flow</h2><p>Let me walk through the entire setup using Amazon Bedrock Titan as the embedding provider. The same pattern applies to other providers with minor connector changes.</p><h3>1. Create the Connector</h3><p>A connector tells OpenSearch how to communicate with the embedding model. For Bedrock, you need an IAM role that allows the connector to invoke the model.</p><pre><code><code>POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Connector: titan embedding v1",
  "description": "Connector to Bedrock Titan embedding model",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock"
  },
  "credential": {
    "roleArn": "arn:aws:iam::YOUR_ACCOUNT:role/bedrock-invoke-role"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/amazon.titan-embed-text-v1/invoke",
      "headers": {
        "content-type": "application/json",
        "x-amz-content-sha256": "required"
      },
      "request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
      "pre_process_function": "connector.pre_process.bedrock.embedding",
      "post_process_function": "connector.post_process.bedrock.embedding"
    }
  ]
}</code></code></pre><p>Save the connector_id from the response.</p><h3>2. Register and Deploy the Model</h3><pre><code><code>POST /_plugins/_ml/models/_register
{
  "name": "bedrock titan embedding model v1",
  "function_name": "remote",
  "description": "Bedrock Titan text embedding model",
  "connector_id": "YOUR_CONNECTOR_ID"
}</code></code></pre><p>Then deploy it:</p><pre><code><code>POST /_plugins/_ml/models/YOUR_MODEL_ID/_deploy</code></code></pre><p>Test that it works:</p><pre><code><code>POST /_plugins/_ml/models/YOUR_MODEL_ID/_predict
{
  "parameters": {
    "inputText": "hello world"
  }
}</code></code></pre><p>You should get back a vector with 1536 floating-point numbers.</p><h3>3. Create the Ingest Pipeline</h3><p>The ingest pipeline automatically generates embeddings when documents are indexed:</p><pre><code><code>PUT /_ingest/pipeline/my_embedding_pipeline
{
  "description": "Text embedding pipeline for semantic search",
  "processors": [
    {
      "text_embedding": {
        "model_id": "YOUR_MODEL_ID",
        "field_map": {
          "text": "text_embedding"
        }
      }
    }
  ]
}</code></code></pre><p>This says: take the <code>text</code> field, run it through the model, store the result in <code>text_embedding</code>.</p><h3>4. Create the Vector Index</h3><p>json</p><pre><code><code>PUT /semantic-search-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.space_type": "cosinesimil",
      "default_pipeline": "my_embedding_pipeline"
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "text_embedding": {
        "type": "knn_vector",
        "dimension": 1536
      }
    }
  }
}</code></code></pre><p>The dimension must match your embedding model output. Titan produces 1536. Cohere produces 1024. Local models vary.</p><h3>5. Index and Search</h3><p>Index a document (embeddings are generated automatically):</p><p>json</p><pre><code><code>POST /semantic-search-index/_doc
{
  "text": "OpenSearch is a distributed search and analytics engine"
}</code></code></pre><p>Run a semantic search:</p><p>json</p><pre><code><code>GET /semantic-search-index/_search
{
  "_source": {
    "excludes": ["text_embedding"]
  },
  "query": {
    "neural": {
      "text_embedding": {
        "query_text": "distributed database for logs",
        "model_id": "YOUR_MODEL_ID",
        "k": 10
      }
    }
  }
}</code></code></pre><p>Notice how the query &#8220;distributed database for logs&#8221; can match the document about &#8220;distributed search and analytics engine&#8221; even though the exact words differ. The embedding model understands the semantic similarity.</p><div><hr></div><h2>Asymmetric Search: Queries Are Not Documents</h2><p>Here is something most tutorials skip.</p><p>In real search, queries and documents are fundamentally different. A query is short: &#8220;best laptop for coding.&#8221; A document is long: a 500-word product review.</p><p><strong>Symmetric models</strong> treat both the same way. They work okay when query and document are similar length. They struggle when a 5-word query must match a 500-word document.</p><p><strong>Asymmetric models</strong> use different encoding strategies. Queries get expanded to capture intent. Documents get compressed to capture essence.</p><p>Models that support asymmetric search include E5 models (use &#8220;query:&#8221; and &#8220;passage:&#8221; prefixes), Cohere Embed v3 (use input_type: &#8220;search_query&#8221; vs &#8220;search_document&#8221;), and BGE models.</p><p>To implement asymmetric search in OpenSearch, you need separate pipelines for ingestion and search:</p><p><strong>Ingest pipeline (for documents):</strong></p><p>json</p><pre><code><code>PUT /_ingest/pipeline/asymmetric_embedding_ingest_pipeline
{
  "processors": [
    {
      "ml_inference": {
        "model_id": "YOUR_MODEL_ID",
        "input_map": [
          { "inputText": "passage: ${description}" }
        ],
        "output_map": [
          { "fact_embedding": "$.embedding" }
        ]
      }
    }
  ]
}</code></code></pre><p><strong>Search pipeline (for queries):</strong></p><p>json</p><pre><code><code>PUT /_search/pipeline/asymmetric_embedding_search_pipeline
{
  "request_processors": [
    {
      "ml_inference": {
        "model_id": "YOUR_MODEL_ID",
        "input_map": [
          { "inputText": "query: ${ext.ml_inference.params.query}" }
        ],
        "output_map": [
          { "ext.ml_inference.params.vector": "$.embedding" }
        ]
      }
    }
  ]
}</code></code></pre><p>The &#8220;passage:&#8221; and &#8220;query:&#8221; prefixes tell the model to encode the text differently based on its purpose.</p><div><hr></div><h2>Handling Long Documents: Text Chunking</h2><p>Most embedding models have token limits. Titan V2 supports 8,192 tokens maximum. A 50,000-word technical manual will not fit.</p><p>Even if truncation worked, a single embedding for a long document loses detail. The first section dominates the vector; everything else is barely represented.</p><p><strong>Text chunking</strong> splits documents into searchable passages. Each chunk gets its own embedding. When you search, you find the best-matching chunk, and return its parent document.</p><p>Here is a chunking pipeline:</p><p>json</p><pre><code><code>PUT _ingest/pipeline/chunking-embedding-pipeline
{
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 100,
            "overlap_rate": 0.2,
            "tokenizer": "standard"
          }
        },
        "field_map": {
          "passage_text": "passage_chunk"
        }
      }
    },
    {
      "foreach": {
        "field": "passage_chunk",
        "processor": {
          "ml_inference": {
            "model_id": "YOUR_MODEL_ID",
            "input_map": [
              { "inputText": "_ingest._value.text" }
            ],
            "output_map": [
              { "_ingest._value.embedding": "embedding" }
            ]
          }
        }
      }
    }
  ]
}</code></code></pre><p>Chunks are stored as nested objects. Each document has multiple chunk embeddings. When searching, use a nested query with <code>score_mode: "max"</code> so the parent document is scored by its best-matching chunk.</p><div><hr></div><h2>Semantic Highlighting</h2><p>Traditional highlighting finds keyword matches and wraps them in tags. Semantic highlighting uses ML models to find the most relevant sentences - even when exact keywords are absent.</p><p>Query: &#8220;heart treatment&#8221;</p><p>Traditional highlighting: &#8220;...<strong>heart</strong> disease requires <strong>treatment</strong>...&#8221;</p><p>Semantic highlighting: &#8220;<strong>Cardiovascular therapy options include medication and surgical procedures</strong>...&#8221;</p><p>The second result found a semantically relevant sentence even though the words &#8220;heart&#8221; and &#8220;treatment&#8221; do not appear.</p><p>To use semantic highlighting, your field must be of type <code>semantic</code> and you need a sentence highlighting model deployed separately from your embedding model:</p><p>json</p><pre><code><code>GET /my-index/_search
{
  "query": {
    "neural": {
      "text_embedding": {
        "query_text": "treatments for neurodegenerative diseases",
        "model_id": "YOUR_EMBEDDING_MODEL_ID",
        "k": 5
      }
    }
  },
  "highlight": {
    "fields": {
      "text": {
        "type": "semantic",
        "number_of_fragments": 2
      }
    },
    "options": {
      "model_id": "YOUR_SENTENCE_HIGHLIGHTING_MODEL_ID"
    }
  }
}</code></code></pre><div><hr></div><h2>AWS vs Alibaba Cloud: Semantic Search Comparison</h2><p>Both clouds offer managed OpenSearch with semantic search capabilities. The approaches differ.</p><p><strong>Amazon OpenSearch Service</strong> integrates with Bedrock (Titan, Cohere, Claude), SageMaker (any model), and external APIs. Uses IAM roles for authentication. CloudFormation templates automate the setup. Strong in Americas and Europe.</p><p><strong>Alibaba Cloud OpenSearch</strong> integrates with Model Studio (Qwen, custom models) and PAI-EAS for self-hosted models. Uses RAM roles instead of IAM. Resource Orchestration Service (ROS) for automation. Strong in APAC and China.</p><p>Key differences:</p><ul><li><p><strong>Authentication:</strong> AWS uses IAM roles; Alibaba uses RAM roles</p></li><li><p><strong>Foundation models:</strong> AWS has Bedrock; Alibaba has Model Studio</p></li><li><p><strong>Automation:</strong> AWS CloudFormation; Alibaba ROS templates</p></li><li><p><strong>Regional strength:</strong> AWS better in Americas/Europe; Alibaba better in APAC</p></li></ul><p>Choose based on where your users are, what ecosystem you are already using, and which foundation models matter to you.</p><div><hr></div><h2>What We Covered</h2><p>Day 5 was dense. Here is the recap:</p><ul><li><p><strong>Vector vs Semantic search</strong> - Vector search is the mechanism (comparing vectors). Semantic search is a text application that uses vector search with embedding models to match meaning.</p></li><li><p><strong>Semantic search</strong> matches meaning, not keywords. It uses embedding models to convert text to vectors and k-NN search to find similar documents.</p></li><li><p><strong>Multiple embedding providers</strong> exist - OpenAI, Cohere, Bedrock Titan, SageMaker, local models. 
Choose based on privacy requirements, cost, and infrastructure.</p></li><li><p><strong>The setup flow:</strong> Create connector &#8594; Register model &#8594; Create ingest pipeline &#8594; Create vector index &#8594; Index documents &#8594; Search.</p></li><li><p><strong>Asymmetric search</strong> handles the mismatch between short queries and long documents by encoding them differently.</p></li><li><p><strong>Text chunking</strong> splits long documents into searchable passages, each with its own embedding.</p></li><li><p><strong>Semantic highlighting</strong> finds relevant passages using meaning, not keyword matching.</p></li><li><p><strong>AWS and Alibaba Cloud</strong> both support semantic search with different foundation model and authentication approaches.</p></li></ul><p>Tomorrow we go deeper into hybrid search - combining keyword and semantic approaches for the best of both worlds.</p><div><hr></div><p><strong>Interactive guide:</strong> <a href="https://opensearch.9cld.com/day/05-semantic-search">https://opensearch.9cld.com/day/05-semantic-search</a></p><p><strong>All guides:</strong> </p><p>https://opensearch.9cld.com/</p><div><hr></div><p><em>Building in public. Learning out loud.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading 9Cloud! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Day 3: Your First Conversation with OpenSearch]]></title><description><![CDATA[Build your first searchable database with real data and real queries]]></description><link>https://9cld.com/p/day-3-your-first-conversation-with</link><guid isPermaLink="false">https://9cld.com/p/day-3-your-first-conversation-with</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Wed, 14 Jan 2026 05:08:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!N-3q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N-3q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N-3q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png 424w, 
https://substackcdn.com/image/fetch/$s_!N-3q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!N-3q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!N-3q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N-3q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:312512,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/184512203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N-3q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png 424w, 
https://substackcdn.com/image/fetch/$s_!N-3q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!N-3q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!N-3q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffee9c2c7-73c8-4d0c-a47f-9a51cd5e9e39_1200x630.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Yesterday, we got OpenSearch running in Docker. Today, we learn to actually talk to it.</p><div><hr></div><h2>Why This Matters</h2><p>Here&#8217;s what most tutorials get wrong, they throw API commands at you without explaining the conversation happening under the hood.</p><p>OpenSearch isn&#8217;t magic. It is a service that speaks REST. Every operation,  indexing a document, running a search, checking cluster health, is just an HTTP request.</p><p>Today you will understand:</p><ul><li><p><strong>How</strong> to communicate with OpenSearch (the mechanics)</p></li><li><p><strong>Why</strong> each method exists (the reasoning)</p></li><li><p><strong>What</strong> happens internally (the architecture)</p></li></ul><p>By the end, you won&#8217;t just know <em>what</em> commands to run. You will understand <em>why</em> they work.</p><div><hr></div><h2>The REST API: Your Language to OpenSearch</h2><p>OpenSearch speaks HTTP. That&#8217;s it. No proprietary protocol, no special client required.</p><p>Every operation follows this pattern:</p><pre><code><code>HTTP_METHOD host:port/index/operation</code></code></pre><h3>The Five HTTP Methods</h3><p><strong>GET</strong> - Retrieve information</p><pre><code><code>GET /_cluster/health          # Cluster status
GET /products/_search         # Search documents
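GET /products/_count          # Count matching documents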
GET /products/_doc/123        # Get specific document</code></code></pre><p><strong>PUT</strong> - Create or replace (idempotent)</p><pre><code><code>PUT /products                 # Create index
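PUT /products/_settings       # Update dynamic index settings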
PUT /products/_doc/123        # Create/replace document with ID</code></code></pre><p><strong>POST</strong> - Create or update (not idempotent)</p><pre><code><code>POST /products/_doc           # Create document (auto-generate ID)
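POST /products/_update/123    # Partially update document 123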
POST /products/_search        # Search (yes, POST for complex queries)</code></code></pre><p><strong>DELETE</strong> - Remove resources</p><pre><code><code>DELETE /products/_doc/123     # Delete document
DELETE /products              # Delete entire index</code></code></pre><p><strong>HEAD</strong> - Check existence (no body returned)</p><pre><code><code>HEAD /products                # Does index exist?</code></code></pre><h3>The Anatomy of a Request</h3><p>Let&#8217;s dissect a real request:</p><p>bash</p><pre><code><code>curl -X POST "https://localhost:9200/products/_doc" \
  -H "Content-Type: application/json" \
  -u admin:YourPassword \
  -k -d '{
    "name": "Wireless Headphones",
    "price": 149.99
  }'</code></code></pre><p>Breaking it down:</p><ul><li><p><code>-X POST</code> - HTTP method (create)</p></li><li><p><code>localhost:9200</code> - Your cluster endpoint</p></li><li><p><code>/products</code> - Target index name</p></li><li><p><code>/_doc</code> - Document API endpoint</p></li><li><p><code>-H "Content-Type: application/json"</code> - We&#8217;re sending JSON</p></li><li><p><code>-u admin:YourPassword</code> - Basic auth credentials</p></li><li><p><code>-k</code> - Skip SSL verification (dev only!)</p></li><li><p><code>-d '{...}'</code> - The JSON document body</p></li></ul><div><hr></div><h2>Step 1: Check Cluster Health</h2><p>Before doing anything, verify your cluster is healthy:</p><pre><code><code>curl -X GET "https://localhost:9200/_cluster/health?pretty" \
  -u admin:YourPassword -k</code></code></pre><p>Response:</p><pre><code><code>{
  "cluster_name": "opensearch-cluster",
  "status": "green",
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 5,
  "active_shards": 5
}</code></code></pre><h3>Understanding Status Colors</h3><p><strong>green</strong> - All shards assigned. You&#8217;re good!</p><p><strong>yellow</strong> - Primary shards OK, some replicas unassigned. Normal for a single node.</p><p><strong>red</strong> - Some primary shards are missing. Data unavailable, investigate!</p><p><strong>Why yellow is OK for development:</strong> With one node, OpenSearch can&#8217;t place replica shards on a different node. It is smart enough to not put the backup on the same machine as the original.</p><div><hr></div><h2>Understanding &#8220;Tables&#8221; in OpenSearch</h2><p>If you are coming from a relational database background, you might be wondering: &#8220;Where are the tables?&#8221;</p><p>OpenSearch doesn&#8217;t have tables. Instead, it has <strong>indexes</strong>.</p><p><strong>Database vs OpenSearch terminology:</strong></p><ul><li><p>Database &#8594; Cluster</p></li><li><p>Table &#8594; Index</p></li><li><p>Row &#8594; Document</p></li><li><p>Column &#8594; Field</p></li><li><p>Schema &#8594; Mapping</p></li></ul><p>So when someone says &#8220;create a table,&#8221; in OpenSearch we say &#8220;create an index.&#8221;</p><p>The key difference? </p><p>In a database, you query rows. </p><p>In OpenSearch, you search documents. </p><p>The entire system is optimized for finding things fast, not for transactions or joins.</p><div><hr></div><h2>Step 2: Create an Index</h2><p>An index is where your documents live. Think of it as a database table, but optimized for search.</p><h3>Basic Index Creation</h3><p>The simplest way to create an index:</p><pre><code><code>curl -X PUT "https://localhost:9200/products" \
  -u admin:YourPassword -k</code></code></pre><p>That&#8217;s it. OpenSearch will create an index with default settings. But you will almost always want to define settings and mappings upfront.</p><h3>Index with Settings and Mappings</h3><pre><code><code>curl -X PUT "https://localhost:9200/products" \
  -H "Content-Type: application/json" \
  -u admin:YourPassword -k -d '{
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "mappings": {
      "properties": {
        "name": { "type": "text" },
        "price": { "type": "float" },
        "category": { "type": "keyword" },
        "description": { "type": "text" },
        "in_stock": { "type": "boolean" },
        "created_at": { "type": "date" }
      }
    }
  }'</code></code></pre><h3>Verify Your Index Was Created</h3><pre><code><code>curl -X GET "https://localhost:9200/products?pretty" \
  -u admin:YourPassword -k</code></code></pre><h3>List All Indexes</h3><pre><code><code>curl -X GET "https://localhost:9200/_cat/indices?v" \
  -u admin:YourPassword -k</code></code></pre><p>Response:</p><pre><code><code>health status index    uuid                   pri rep docs.count store.size
yellow open   products abc123xyz...           1   1   0          230b
  -u admin:YourPassword -k</code></code></pre><p><strong>Warning:</strong> This deletes all data in the index. There&#8217;s no &#8220;are you sure?&#8221; prompt.</p><h3>Understanding the Settings</h3><p><strong>number_of_shards: 1</strong></p><ul><li><p>How many pieces to split the index into</p></li><li><p>More shards = better parallelism for large datasets</p></li><li><p>Can&#8217;t change after creation!</p></li><li><p>Rule of thumb: 1 shard can handle ~30GB</p></li></ul><p><strong>number_of_replicas: 1</strong></p><ul><li><p>Copies of each shard for redundancy</p></li><li><p>More replicas = better read throughput + fault tolerance</p></li><li><p>Can change anytime</p></li></ul><h3>Understanding the Mapping</h3><p>The mapping defines how each field is stored and indexed:</p><p><strong>text</strong> - For full-text content. Analyzed word-by-word for search.</p><p><strong>keyword</strong> - For exact values like categories or IDs. Exact match only.</p><p><strong>float</strong> - For decimal numbers. Supports range queries.</p><p><strong>boolean</strong> - For true/false values. Used in filter queries.</p><p><strong>date</strong> - For timestamps. Supports date math queries.</p><p><strong>Critical distinction:</strong></p><ul><li><p><code>text</code> fields are <em>analyzed</em> - &#8220;Wireless Headphones&#8221; becomes tokens [&#8221;wireless&#8221;, &#8220;headphones&#8221;]</p></li><li><p><code>keyword</code> fields are <em>exact</em> - &#8220;electronics&#8221; stays &#8220;electronics&#8221;</p></li></ul><div><hr></div><h2>Step 3: Index Documents</h2><h3>Single Document</h3><p>bash</p><pre><code><code>curl -X POST "https://localhost:9200/products/_doc" \
  -H "Content-Type: application/json" \
  -u admin:YourPassword -k -d '{
    "name": "Wireless Headphones",
    "price": 149.99,
    "category": "electronics",
    "description": "Premium noise-canceling wireless headphones with 30-hour battery",
    "in_stock": true,
    "created_at": "2025-01-14"
  }'</code></code></pre><p>Response:</p><pre><code><code>{
  "_index": "products",
  "_id": "abc123xyz",    // Auto-generated
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1
  }
}</code></code></pre><h3>Bulk Ingestion (10-100x Faster)</h3><p>For multiple documents, use the <code>_bulk</code> API:</p><pre><code><code>curl -X POST "https://localhost:9200/_bulk" \
  -H "Content-Type: application/x-ndjson" \
  -u admin:YourPassword -k -d '
{"index": {"_index": "products"}}
{"name": "Laptop Stand", "price": 49.99, "category": "accessories"}
{"index": {"_index": "products"}}
{"name": "USB-C Hub", "price": 79.99, "category": "accessories"}
{"index": {"_index": "products"}}
{"name": "Mechanical Keyboard", "price": 159.99, "category": "electronics"}
'</code></code></pre><p><strong>Why bulk is faster:</strong></p><ul><li><p>Single network round-trip instead of N</p></li><li><p>Batches writes to Lucene</p></li><li><p>Reduces per-request overhead</p></li></ul><p><strong>Best practices:</strong></p><ul><li><p>Batch 1,000-5,000 documents per request</p></li><li><p>Don&#8217;t exceed ~100MB per request</p></li><li><p>Use newline-delimited JSON (note the trailing newline!)</p></li></ul><div><hr></div><h2>Step 4: Search Your Data</h2><p>Now the fun part - actually searching.</p><h3>Basic Match Query</h3><pre><code><code>curl -X GET "https://localhost:9200/products/_search" \
  -H "Content-Type: application/json" \
  -u admin:YourPassword -k -d '{
    "query": {
      "match": {
        "description": "wireless headphones"
      }
    }
  }'</code></code></pre><p>This searches the <code>description</code> field for documents containing &#8220;wireless&#8221; OR &#8220;headphones&#8221; (or both).</p><h3>Understanding the Response</h3><pre><code><code>{
  "took": 5,                          // Query time in milliseconds
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"                // "eq" = exact, "gte" = at least
    },
    "max_score": 2.8567,              // Highest relevance score
    "hits": [
      {
        "_index": "products",
        "_id": "abc123",
        "_score": 2.8567,             // This document's relevance
        "_source": {                   // The actual document
          "name": "Wireless Headphones",
          "price": 149.99,
          "description": "Premium noise-canceling wireless headphones"
        }
      }
    ]
  }
}</code></code></pre><h3>Query Types Cheat Sheet</h3><p><strong>Match Query</strong> - Full-text search with analysis</p><pre><code><code>{"query": {"match": {"description": "wireless bluetooth"}}}</code></code></pre><p>Use for: Product descriptions, article content, user-generated text</p><p><strong>Term Query</strong> - Exact value matching (no analysis)</p><pre><code><code>{"query": {"term": {"category": "electronics"}}}</code></code></pre><p>Use for: Categories, status fields, IDs</p><p><strong>Range Query</strong> - Numeric/date ranges</p><pre><code><code>{"query": {"range": {"price": {"gte": 50, "lte": 200}}}}</code></code></pre><p>Use for: Price filters, date ranges</p><p><strong>Bool Query</strong> - Combine multiple conditions</p><pre><code><code>{
  "query": {
    "bool": {
      "must": [
        {"match": {"description": "wireless"}}
      ],
      "filter": [
        {"term": {"category": "electronics"}},
        {"range": {"price": {"lte": 200}}}
      ]
    }
  }
}</code></code></pre><p><strong>Important:</strong> Use <code>filter</code> for conditions that don&#8217;t affect relevance score - they&#8217;re cached and much faster!</p><div><hr></div><h2>Key Concepts Explained</h2><h3>What is a Cluster?</h3><p>A <strong>cluster</strong> is a collection of nodes working together. It:</p><ul><li><p>Distributes data across nodes automatically</p></li><li><p>Provides fault tolerance (if one node fails, others continue)</p></li><li><p>Scales horizontally by adding more nodes</p></li></ul><h3>What is a Node?</h3><p>A <strong>node</strong> is a single OpenSearch server instance. Types:</p><ul><li><p><strong>Cluster Manager</strong>: Coordinates the cluster</p></li><li><p><strong>Data Node</strong>: Stores data, executes searches</p></li><li><p><strong>Ingest Node</strong>: Pre-processes documents</p></li><li><p><strong>Coordinating Node</strong>: Routes requests (all nodes can do this)</p></li></ul><h3>What is an Index?</h3><p>An <strong>index</strong> is a collection of documents with similar characteristics. It:</p><ul><li><p>Has a mapping defining field types</p></li><li><p>Is split into shards for distribution</p></li><li><p>Can have aliases for easier management</p></li></ul><h3>What is a Document?</h3><p>A <strong>document</strong> is a JSON object - the basic unit of information:</p><pre><code><code>{
  "name": "Wireless Headphones",
  "price": 149.99,
  "_id": "abc123"      // Metadata
}</code></code></pre><h3>What is a Shard?</h3><p>A <strong>shard</strong> is a subset of an index:</p><ul><li><p><strong>Primary shard</strong>: Original data</p></li><li><p><strong>Replica shard</strong>: Copy for redundancy</p></li></ul><p>Shards enable:</p><ul><li><p>Parallel processing across nodes</p></li><li><p>High availability (replicas on different nodes)</p></li><li><p>Horizontal scaling</p></li></ul><h3>What is a Mapping?</h3><p>A <strong>mapping</strong> defines how fields are stored and indexed:</p><pre><code><code>{
  "properties": {
    "name": {"type": "text"},        // Analyzed for full-text search
    "category": {"type": "keyword"}  // Not analyzed, exact match
  }
}</code></code></pre><div><hr></div><h2>Public Datasets to Practice With</h2><p>Want to practice with real data? Here are excellent options:</p><h3>1. OpenSearch Sample Data (Easiest)</h3><p>Built into OpenSearch Dashboards. Go to Home &#8594; Add sample data.</p><ul><li><p>eCommerce orders</p></li><li><p>Flight data</p></li><li><p>Web logs</p></li></ul><h3>2. Kaggle E-commerce Dataset</h3><p>500K+ retail transactions. Perfect for:</p><ul><li><p>Product search</p></li><li><p>Customer analytics</p></li><li><p>Aggregations practice</p></li></ul><h3>3. Wikipedia Dumps</h3><p>Full-text articles in various sizes. Perfect for:</p><ul><li><p>Large-scale indexing</p></li><li><p>Full-text search testing</p></li><li><p>NLP experiments</p></li></ul><h3>4. NYC Taxi Data</h3><p>Trip records with geo coordinates. Perfect for:</p><ul><li><p>Geo-spatial queries</p></li><li><p>Time-series analysis</p></li><li><p>Dashboard building</p></li></ul><div><hr></div><h2>Cloud Considerations</h2><h3>AWS OpenSearch Service</h3><p>When you create a domain in AWS:</p><ul><li><p>Endpoint: </p></li></ul><p>https://your-domain.region.es.amazonaws.com</p><ul><li><p>Auth: IAM roles or fine-grained access control</p></li><li><p>VPC: Can be in VPC or public (with IP restrictions)</p></li></ul><pre><code><code># With IAM auth (need AWS CLI configured)
# NOTE: the aws CLI has no subcommand that queries a domain directly;
# requests must be SigV4-signed (e.g. via the third-party awscurl tool).
# Hypothetical signing wrapper shown for illustration:
./signed-query.sh \
  --endpoint https://your-domain.region.es.amazonaws.com \
  --query '{"query": {"match_all": {}}}'</code></code></pre><h3>Alibaba Cloud OpenSearch</h3><p>Similar managed service with:</p><ul><li><p>Auto-scaling capabilities</p></li><li><p>Built-in machine learning features</p></li><li><p>Different pricing model (pay per query unit)</p></li></ul><div><hr></div><h2>Today&#8217;s Mini-Project: Product Catalog Search</h2><ol><li><p>Create a <code>products</code> index with proper mapping</p></li><li><p>Bulk index at least 10 products</p></li><li><p>Implement these searches:</p><ul><li><p>Full-text search on product names/descriptions</p></li><li><p>Filter by category (keyword)</p></li><li><p>Price range filter</p></li><li><p>Combined search with bool query</p></li></ul></li></ol><p></p><div><hr></div><p><strong>Interactive Visual Guide:</strong> <a href="https://opensearch.9cld.com/day/03-first-conversation">opensearch.9cld.com/day/03-first-conversation</a></p><div><hr></div><h2>What&#8217;s Next (Day 4)</h2><p>Tomorrow we&#8217;ll dive into <strong>mappings and analyzers</strong> - the secret sauce behind great search. 
You&#8217;ll learn:</p><ul><li><p>Why does the same search return different results with different mappings</p></li><li><p>How analyzers transform text into searchable tokens</p></li><li><p>Building custom analyzers for your use case</p><p></p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://9cld.com/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Day 2: Running OpenSearch Locally]]></title><description><![CDATA[From &#8220;docker run&#8221; to production-ready configuration in one guide]]></description><link>https://9cld.com/p/day-2-running-opensearch-locally</link><guid isPermaLink="false">https://9cld.com/p/day-2-running-opensearch-locally</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Sun, 11 Jan 2026 21:40:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gnSc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gnSc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gnSc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png 424w, 
https://substackcdn.com/image/fetch/$s_!gnSc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!gnSc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!gnSc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gnSc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:318461,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/184248512?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gnSc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png 424w, 
https://substackcdn.com/image/fetch/$s_!gnSc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!gnSc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!gnSc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4443d69a-3744-4d04-a8f3-735edf9943ad_1200x630.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>The Problem Nobody Warns You About</h2><p>You found a Docker command online. You ran it. OpenSearch started. Victory?</p><p>Not quite.</p><p>Three days later, you&#8217;re debugging why your cluster won&#8217;t accept connections, why security is disabled (or worse, why it&#8217;s enabled and you can&#8217;t authenticate), and why your laptop sounds like a jet engine.</p><p>The gap between &#8220;OpenSearch is running&#8221; and &#8220;OpenSearch is running correctly&#8221; is where most beginners get stuck.</p><p>Today, we bridge that gap.</p><div><hr></div><h2>What We are Building</h2><p>By the end of this guide, you will have:</p><ul><li><p>A single-node OpenSearch 3.0.0 cluster running in Docker</p></li><li><p>A multi-container setup with Docker Compose (OpenSearch + Dashboards)</p></li><li><p>Deep understanding of security parameters (and when to disable them safely)</p></li><li><p>Performance tuning that won&#8217;t melt your laptop</p></li><li><p>A reusable configuration you can version control</p></li></ul><p>Let&#8217;s start from first principles.</p><div><hr></div><h2>Part 1: Docker Basics for OpenSearch</h2><h3>Why Docker?</h3><p>Before we run commands, let&#8217;s understand why Docker matters for OpenSearch:</p><p><strong>Without Docker:</strong></p><ul><li><p>Install Java, set JAVA_HOME</p></li><li><p>Download, extract, configure manually</p></li><li><p>Manage multiple versions manually</p></li><li><p>&#8220;Works on my machine&#8221; problems</p></li><li><p>Cleanup is messy</p></li></ul><p><strong>With Docker:</strong></p><ul><li><p>One command to run</p></li><li><p>Switch versions with a tag</p></li><li><p>Identical everywhere</p></li><li><p><code>docker rm</code> and done</p></li></ul><p>Docker gives you <strong>isolation</strong> and <strong>reproducibility</strong>. 
Your OpenSearch environment is the same whether you&#8217;re on macOS, Windows, or Linux.</p><h3>The Simplest Possible Command</h3><p>Here&#8217;s the absolute minimum to get the latest OpenSearch running:</p><pre><code><code>docker run -d \
  --name opensearch-node \
  -p 9200:9200 \
  -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=MyStr0ng#Pass!" \
  opensearchproject/opensearch:latest
</code></code></pre><p>Let&#8217;s decode every piece:</p><p>-d Run in background (detached mode)</p><p>--name opensearch-node Give the container a memorable name</p><p>-p 9200:9200 Expose REST API port (host:container)</p><p>-p 9600:9600 Expose Performance Analyzer port</p><p>discovery.type=single-node Tell OpenSearch this is a solo node, not a cluster</p><p>OPENSEARCH_INITIAL_ADMIN_PASSWORD Set the admin password (required since OpenSearch 2.12)</p><h3>Verify It&#8217;s Working</h3><pre><code><code># Check container is running
docker ps

# Check OpenSearch is responding
curl -k -u 'admin:MyStr0ng#Pass!' https://localhost:9200
</code></code></pre><p>The <code>-k</code> flag tells curl to accept the self-signed SSL certificate.</p><p>Expected response:</p><pre><code><code>{
  "name" : "opensearch-node",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "...",
  "version" : {
    "distribution" : "opensearch",
    "number" : "3.4.0",
    ...
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}
</code></code></pre><p><strong>Congratulations &#8212; OpenSearch 3.4.0 is running.</strong></p><p>But we are just getting started.</p><div><hr></div><h2>Part 2: Understanding Security Parameters</h2><p>OpenSearch&#8217;s security plugin is powerful but can be confusing for beginners. Let&#8217;s demystify it.</p><h3>The Security Plugin: What It Does</h3><p>The security plugin handles:</p><ul><li><p><strong>Authentication</strong>: Who are you? (username/password, certificates, SAML, OIDC)</p></li><li><p><strong>Authorization</strong>: What can you do? (roles, permissions, tenants)</p></li><li><p><strong>Encryption</strong>: TLS/SSL for data in transit</p></li><li><p><strong>Audit logging</strong>: Who did what, when?</p></li></ul><p>By default, security is <strong>enabled</strong>. This is good for production. For local development, it can be friction.</p><h3>Option A: Keep Security Enabled (Recommended)</h3><p>For local development with security:</p><pre><code><code>docker run -d \
  --name opensearch-secure \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=Dev#Pass123!" \
  opensearchproject/opensearch:latest
</code></code></pre><p><strong>Password Requirements (OpenSearch 2.12+):</strong></p><ul><li><p>Minimum 8 characters</p></li><li><p>At least one uppercase letter</p></li><li><p>At least one lowercase letter</p></li><li><p>At least one digit</p></li><li><p>At least one special character</p></li></ul><p><strong>Connecting with security enabled:</strong></p><pre><code><code># With curl
curl -k -u 'admin:Dev#Pass123!' https://localhost:9200
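
# Illustrative helper (not an official OpenSearch tool): check that a
# candidate password meets the complexity rules listed above
meets_complexity() {
  pw="$1"
  [ "${#pw}" -ge 8 ] \
    && printf '%s' "$pw" | grep -q '[A-Z]' \
    && printf '%s' "$pw" | grep -q '[a-z]' \
    && printf '%s' "$pw" | grep -q '[0-9]' \
    && printf '%s' "$pw" | grep -q '[^A-Za-z0-9]'
}
meets_complexity 'Dev#Pass123!' && echo "password OK"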

# From Python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{'host': 'localhost', 'port': 9200}],
    http_auth=('admin', 'Dev#Pass123!'),
    use_ssl=True,
    verify_certs=False  # For self-signed certs
)
</code></code></pre><h3>Option B: Disable Security (Development Only)</h3><p>For rapid prototyping, where you need zero friction:</p><pre><code><code>docker run -d \
  --name opensearch-insecure \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "DISABLE_SECURITY_PLUGIN=true" \
  opensearchproject/opensearch:latest
</code></code></pre><p><strong>Connecting without security:</strong></p><pre><code><code># Simple curl - no auth, no SSL
curl http://localhost:9200

# From Python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{'host': 'localhost', 'port': 9200}],
    use_ssl=False
)
</code></code></pre><h3>When to Disable Security</h3><p><strong>OK to Disable:</strong></p><ul><li><p>Quick local experiments</p></li><li><p>Learning and tutorials</p></li><li><p>Throwaway demos</p></li><li><p>CI/CD pipeline tests</p></li></ul><p><strong>Keep Security ON:</strong></p><ul><li><p>Any shared environment</p></li><li><p>Development servers others access</p></li><li><p>Pre-production testing</p></li><li><p>Anything with real data</p></li></ul><p><strong>The Rule</strong>: If anyone other than you can access it, security stays ON.</p><h3>Security Environment Variables Reference</h3><p><code>OPENSEARCH_INITIAL_ADMIN_PASSWORD</code> Sets admin password on first run. Required if security is enabled.</p><p><code>DISABLE_SECURITY_PLUGIN</code> Turns off security entirely. Default is <code>false</code>.</p><p><code>DISABLE_INSTALL_DEMO_CONFIG</code> Skip demo certificates. Default is <code>false</code>.</p><p><code>plugins.security.ssl.http.enabled</code> Enable HTTPS. Default is <code>true</code>.</p><p><code>plugins.security.disabled</code> Another way to disable security. Default is <code>false</code>.</p><div><hr></div><h2>Part 3: Performance Parameters That Actually Matter</h2><p>Running OpenSearch on a laptop is different from running it in production. Here&#8217;s how to tune it for local development.</p><h3>The JVM Heap: Most Important Setting</h3><p>OpenSearch runs on the JVM. The heap size determines how much memory it can use.</p><p><strong>The Rule</strong>: Set heap to <strong>50% of available RAM</strong>, but <strong>never more than 32GB</strong> (due to compressed pointers).</p><p>For local development on a 16GB laptop:</p><pre><code><code>docker run -d \
  --name opensearch-tuned \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=Dev#Pass123!" \
  -e "OPENSEARCH_JAVA_OPTS=-Xms2g -Xmx2g" \
  opensearchproject/opensearch:latest
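
# Illustrative rule-of-thumb helper for a host dedicated to OpenSearch:
# heap = half of RAM, capped at 32g (shared laptops should go lower,
# per the table below)
heap_opts() {
  heap=$(( $1 / 2 ))
  if [ "$heap" -gt 32 ]; then heap=32; fi
  echo "-Xms${heap}g -Xmx${heap}g"
}
heap_opts 8    # -Xms4g -Xmx4g
heap_opts 128  # capped: -Xms32g -Xmx32g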
</code></code></pre><p><strong>Recommended heap sizes by laptop RAM:</strong></p><p><strong>8GB RAM</strong> Recommended heap: 1-2GB Setting: <code>-Xms1g -Xmx1g</code></p><p><strong>16GB RAM</strong> Recommended heap: 2-4GB Setting: <code>-Xms2g -Xmx2g</code></p><p><strong>32GB RAM</strong> Recommended heap: 4-8GB Setting: <code>-Xms4g -Xmx4g</code></p><p><strong>Critical</strong>: Always set <code>-Xms</code> and <code>-Xmx</code> to the <strong>same value</strong>. This prevents expensive heap resizing at runtime.</p><h3>Memory Lock: Prevent Swapping</h3><p>Swapping is death for search performance. Lock the heap in RAM:</p><pre><code><code>docker run -d \
  --name opensearch-locked \
  -p 9200:9200 \
  --ulimit memlock=-1:-1 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=Dev#Pass123!" \
  -e "bootstrap.memory_lock=true" \
  opensearchproject/opensearch:latest
</code></code></pre><h3>Virtual Memory: The mmap Issue</h3><p>OpenSearch uses memory-mapped files. Linux has a default limit that&#8217;s too low.</p><p>On your <strong>host machine</strong> (not in Docker), run:</p><pre><code><code># Temporary (until reboot)
sudo sysctl -w vm.max_map_count=262144

# Permanent (add to /etc/sysctl.conf)
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
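
# Verify the change took effect; after the commands above this
# should print 262144 (or higher)
sysctl -n vm.max_map_count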
</code></code></pre><p><strong>On macOS</strong>: Docker Desktop handles this automatically.</p><p><strong>On Windows with WSL2</strong>: Run in WSL terminal:</p><pre><code><code>wsl -d docker-desktop
sysctl -w vm.max_map_count=262144
</code></code></pre><h3>Performance Environment Variables Reference</h3><p><code>OPENSEARCH_JAVA_OPTS</code> JVM heap settings. Recommended local value: <code>-Xms2g -Xmx2g</code></p><p><code>bootstrap.memory_lock</code> Prevent swapping. Recommended: <code>true</code></p><p><code>cluster.routing.allocation.disk.threshold_enabled</code> Disk watermarks. Default: <code>true</code></p><p><code>indices.memory.index_buffer_size</code> Indexing buffer. Default: <code>10%</code></p><div><hr></div><h2>Part 4: Docker Compose &#8212; The Right Way</h2><p>Running multiple containers manually is tedious. Docker Compose defines your entire stack in one file.</p><h3>Basic docker-compose.yml</h3><pre><code><code>version: '3.8'

services:
  opensearch:
    image: opensearchproject/opensearch:latest
    container_name: opensearch
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - OPENSEARCH_JAVA_OPTS=-Xms2g -Xmx2g
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=Dev#Pass123!
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data:/usr/share/opensearch/data
    ports:
      - "9200:9200"
      - "9600:9600"
    networks:
      - opensearch-net

  dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    container_name: opensearch-dashboards
    environment:
      - OPENSEARCH_HOSTS=["https://opensearch:9200"]
    ports:
      - "5601:5601"
    networks:
      - opensearch-net
    depends_on:
      - opensearch

volumes:
  opensearch-data:

networks:
  opensearch-net:
</code></code></pre><h3>What Each Section Does</h3><p><strong>services</strong> Defines each container &#8212; opensearch is the search engine, dashboards is the web UI (like Kibana)</p><p><strong>environment</strong> Configuration via environment variables</p><p><strong>ulimits</strong> System limits for memory lock and file descriptors</p><p><strong>volumes</strong> Persist data across container restarts &#8212; without this, you lose all data when the container stops</p><p><strong>ports</strong> Expose to host machine</p><p><strong>networks</strong> Containers on the same network can talk to each other by name</p><p><strong>depends_on</strong> Start order &#8212; dashboards waits for opensearch</p><h3>Running Docker Compose</h3><pre><code><code># Start everything
docker compose up -d

# View logs
docker compose logs -f

# Stop everything
docker compose down

# Stop and delete data
docker compose down -v
</code></code></pre><h3>Development vs Production Compose Files</h3><p>For development without security:</p><pre><code><code># docker-compose.dev.yml
version: '3.8'

services:
  opensearch:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-dev
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g
      - DISABLE_SECURITY_PLUGIN=true
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - opensearch-data:/usr/share/opensearch/data
    ports:
      - "9200:9200"
    networks:
      - opensearch-net

  dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    container_name: dashboards-dev
    environment:
      - OPENSEARCH_HOSTS=["http://opensearch:9200"]
      - DISABLE_SECURITY_DASHBOARDS_PLUGIN=true
    ports:
      - "5601:5601"
    networks:
      - opensearch-net
    depends_on:
      - opensearch

volumes:
  opensearch-data:

networks:
  opensearch-net:
</code></code></pre><p>Run with:</p><pre><code><code>docker compose -f docker-compose.dev.yml up -d
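
# Confirm both containers are running before opening http://localhost:5601
docker compose -f docker-compose.dev.yml ps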
</code></code></pre><div><hr></div><h2>Part 5: Multi-Node Cluster (Bonus)</h2><p>For testing cluster behavior locally:</p><pre><code><code># docker-compose.cluster.yml
version: '3.8'

services:
  opensearch-node1:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=Cluster#Pass123!
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - "9200:9200"
      - "9600:9600"
    networks:
      - opensearch-net

  opensearch-node2:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=Cluster#Pass123!
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net

  dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    container_name: opensearch-dashboards
    environment:
      - OPENSEARCH_HOSTS=["https://opensearch-node1:9200","https://opensearch-node2:9200"]
    ports:
      - "5601:5601"
    networks:
      - opensearch-net
    depends_on:
      - opensearch-node1
      - opensearch-node2

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:
</code></code></pre><p><strong>Key differences from single-node:</strong></p><p><strong>Single Node:</strong> Uses <code>discovery.type=single-node</code>, no cluster manager config, one volume, 2GB heap minimum</p><p><strong>Multi-Node Cluster:</strong> Uses <code>discovery.seed_hosts=...</code>, requires <code>cluster.initial_cluster_manager_nodes=...</code>, volume per node, 1GB per node minimum</p><div><hr></div><h2>Part 6: Troubleshooting Common Issues</h2><h3>&#8220;OpenSearch didn&#8217;t start&#8221;</h3><pre><code><code># Check container logs
docker logs opensearch-node

# Common fixes:
# 1. vm.max_map_count too low
sudo sysctl -w vm.max_map_count=262144

# 2. Not enough memory
# Reduce heap: -Xms512m -Xmx512m

# 3. Port already in use
docker ps -a  # Find conflicting container
lsof -i :9200  # Find process using port
</code></code></pre><h3>&#8220;Connection refused&#8221;</h3><pre><code><code># Is the container running?
docker ps

# Is OpenSearch ready? (check for "green" or "yellow")
curl -k -u 'admin:Dev#Pass123!' https://localhost:9200/_cluster/health

# Check if port is exposed
docker port opensearch-node
</code></code></pre><h3>&#8220;Authentication failed&#8221;</h3><pre><code><code># Did you set the password correctly?
# Password must meet complexity requirements

# For fresh start (deletes all data):
docker rm -f opensearch-node
docker volume rm opensearch-data
# Then re-run docker run command
</code></code></pre><h3>&#8220;Dashboards can&#8217;t connect to OpenSearch&#8221;</h3><pre><code><code># Are they on the same network?
docker network inspect opensearch-net

# Is the URL correct?
# Use container name, not localhost:
# &#9989; OPENSEARCH_HOSTS=["https://opensearch:9200"]
# &#10060; OPENSEARCH_HOSTS=["https://localhost:9200"]
</code></code></pre><div><hr></div><h2>Quick Reference Card</h2><h3>Essential Commands</h3><pre><code><code># Start single node (latest OpenSearch)
docker run -d --name os -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=Pass#123" \
  opensearchproject/opensearch:latest

# Check health
curl -k -u 'admin:Pass#123' 'https://localhost:9200/_cluster/health?pretty'
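
# Extract just the status field (assumes jq is installed)
curl -sk -u 'admin:Pass#123' 'https://localhost:9200/_cluster/health' | jq -r .status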

# Stop and remove
docker rm -f os

# Docker Compose
docker compose up -d
docker compose logs -f opensearch
docker compose down -v
</code></code></pre><h3>Environment Variables Cheat Sheet</h3><p><code>discovery.type=single-node</code> Solo node mode</p><p><code>OPENSEARCH_INITIAL_ADMIN_PASSWORD</code> Admin password (required)</p><p><code>DISABLE_SECURITY_PLUGIN=true</code> No security (dev only)</p><p><code>OPENSEARCH_JAVA_OPTS=-Xms2g -Xmx2g</code> Heap size</p><p><code>bootstrap.memory_lock=true</code> Prevent swapping</p><div><hr></div><h2>What&#8217;s Next: Day 3</h2><p>Tomorrow we dive into <strong>indexing your first document</strong> &#8212; the actual reason OpenSearch exists.</p><p>We&#8217;ll cover:</p><ul><li><p>Creating your first index</p></li><li><p>Understanding mappings</p></li><li><p>The document lifecycle (create, read, update, delete)</p></li><li><p>Bulk operations for real-world data</p></li></ul><p>You now have a running, properly configured OpenSearch 3.0.0 environment. Time to put data in it.</p><div><hr></div><p><em>Day 2 of 60 | OpenSearch Zero to Hero</em></p><p><strong>Resources:</strong></p><ul><li><p><a href="https://opensearch.org/docs/latest/install-and-configure/install-opensearch/docker/">OpenSearch Docker Documentation</a></p></li><li><p><a href="https://github.com/9cld/opensearch-guide">GitHub: Day 2 Code</a></p></li><li><p><a href="https://opensearch.9cld.com/">Interactive Guide: opensearch.9cld.com</a></p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://9cld.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">9Cloud is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Stop Optimizing Your Code. Fix the Route.]]></title><description><![CDATA[Why a 15-minute network change beat weeks of performance tuning]]></description><link>https://9cld.com/p/how-alibaba-cloud-global-accelerator</link><guid isPermaLink="false">https://9cld.com/p/how-alibaba-cloud-global-accelerator</guid><dc:creator><![CDATA[Bhavesh Panchal]]></dc:creator><pubDate>Thu, 08 Jan 2026 12:19:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YGi2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YGi2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YGi2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png 424w, 
https://substackcdn.com/image/fetch/$s_!YGi2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!YGi2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!YGi2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YGi2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:312070,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/183891684?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YGi2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png 424w, 
https://substackcdn.com/image/fetch/$s_!YGi2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!YGi2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!YGi2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6723ee8-7e2a-4d2a-9f0b-c55298741caf_1200x630.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A SaaS team asked me to solve a latency problem.</p><p>Backend in Tokyo. Users in Jakarta and Melbourne are churning. They spent weeks on CDN tuning, database optimization, and frontend rewrites.</p><p>Nothing worked.</p><p>I looked at their traceroutes. A request from Sydney was not going to Tokyo. It was bouncing through Hong Kong, hitting a congested peering point in LA, then looping back across the Pacific.</p><p>Their code was fine. The internet was betraying them.</p><div><hr></div><h2>The Fix Nobody Tries First</h2><p>We enabled Alibaba Cloud Global Accelerator.</p><p>Instead of riding the public internet, user traffic now enters Alibaba&#8217;s private backbone at the nearest edge&#8212;then travels direct to origin.</p><p><strong>24 hours later:</strong></p><p>Seoul: 280ms &#8594; 110ms <br>Sydney: 350ms &#8594; 130ms <br>Jakarta: 210ms &#8594; 95ms</p><p>Timeouts dropped 90%. Zero code changes.</p><div><hr></div><h2>Why This Works</h2><p>Public internet routing optimizes for cost, not speed. Your packets take the cheapest path, not the fastest.</p><p>Global Accelerator bypasses that entirely. Users connect to a nearby edge location. From there, traffic moves over private fiber&#8212;predictable, uncongested, direct.</p><p>Same destination. Different highway.</p><div><hr></div><h2>Setup Takes 15 Minutes</h2><ol><li><p>Create accelerator in Alibaba Cloud Console</p></li><li><p>Get your assigned global IP</p></li><li><p>Point your domain to that IP</p></li><li><p>Attach your backend endpoint</p></li></ol><p>Cost: ~$0.08/GB transferred. No hourly fees.</p><div><hr></div><h2>The Trap I Fell Into</h2><p>First deployment, latency dropped beautifully. Then a traffic spike hit, and errors spiked with it.</p><p>The backend had become CPU-bound.</p><p>Global Accelerator speeds up the network. 
Not your server.</p><p>Faster delivery means faster queuing if your origin can&#8217;t keep up. Pair this with auto-scaling, or you will just expose bottlenecks sooner.</p><div><hr></div><h2>When It Makes Sense</h2><p><strong>Use it for:</strong> APIs, dashboards, real-time apps, anything interactive across distributed users</p><p><strong>Skip it for:</strong> Static assets (use CDN), single-region users, batch workloads</p><div><hr></div><h2>One Thing to Try</h2><p>Measure your current latency from multiple regions using <a href="https://ping.pe">Ping.pe</a>.</p><p>If you see 200ms+ gaps between locations, the problem probably isn&#8217;t your code.</p><p>It&#8217;s the route.</p>]]></content:encoded></item><item><title><![CDATA[OpenSearch for Beginners: The Visual Guide That Official Docs Don’t Give You]]></title><description><![CDATA[Why this search engine exists, what problems it actually solves, and when you should (and shouldn&#8217;t) use it.]]></description><link>https://9cld.com/p/opensearch-for-beginners-the-visual</link><guid isPermaLink="false">https://9cld.com/p/opensearch-for-beginners-the-visual</guid><dc:creator><![CDATA[Ankit Mehta]]></dc:creator><pubDate>Wed, 07 Jan 2026 07:34:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xkHf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xkHf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!xkHf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!xkHf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!xkHf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!xkHf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xkHf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/737511be-9add-4dea-b505-273bbc0448bc_1200x630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108956,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://9cld.com/i/183765179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!xkHf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!xkHf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!xkHf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!xkHf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F737511be-9add-4dea-b505-273bbc0448bc_1200x630.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I remember the first time I tried to learn OpenSearch.</p><p>I went to the official documentation, and within 30 seconds, I was drowning in terms I didn&#8217;t understand: <em>shards</em>, <em>replicas</em>, <em>inverted indexes</em>, <em>analyzers</em>, <em>nodes</em>, and <em>clusters</em>.</p><p>The docs told me <strong>how</strong> to install it. They told me <strong>how</strong> to write queries. But they never told me <strong>why</strong> I should care in the first place.</p><p>If you are in that same boat, curious about OpenSearch but confused by the documentation, this guide is for you.</p><p>By the end, you will understand:</p><ul><li><p>The specific problem OpenSearch was built to solve</p></li><li><p>How it&#8217;s fundamentally different from a traditional database</p></li><li><p>When to use it (and when to absolutely avoid it)</p></li><li><p>The one concept that makes everything else click</p></li></ul><p>Let&#8217;s start with the problem.</p><div><hr></div><h2>The Problem: Why Traditional Databases Fail at Search</h2><p>Imagine you&#8217;re building an e-commerce site. You have a million products. A user searches for &#8220;iphone.&#8221;</p><p>With a traditional database like PostgreSQL, you&#8217;d write:</p><pre><code><code>SELECT * FROM products 
WHERE name LIKE '%iphone%'
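
-- Note (editorial aside): even with an ordinary B-tree index on name,
-- the leading '%' wildcard prevents the index from being used, so the
-- database still examines every row (EXPLAIN shows a sequential scan).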
</code></code></pre><p>This works. But here is the problem: the database has to scan <strong>every single row</strong> to find matches.</p><ul><li><p>1,000 products? Fast.</p></li><li><p>100,000 products? Noticeable lag.</p></li><li><p>10,000,000 products? Your users are waiting 5+ seconds.</p></li></ul><p>This is called a <strong>full table scan</strong>. It&#8217;s O(n) complexity&#8212;meaning the more data you have, the slower it gets. Linearly.</p><p>But it gets worse.</p><div><hr></div><h3>The Other Problems with Database Search</h3><p>Beyond raw speed, traditional databases have other search limitations:</p><p><strong>No relevance ranking.</strong> If a user searches &#8220;blue running shoes,&#8221; the database returns every product that matches&#8212;but in no particular order. The perfect match shows up on page 47.</p><p><strong>No typo tolerance.</strong> Search for &#8220;iphon&#8221; (missing the &#8216;e&#8217;)? Zero results. Your user thinks you don&#8217;t sell iPhones.</p><p><strong>No fuzzy matching.</strong> &#8220;iPhone,&#8221; &#8220;iphone,&#8221; &#8220;I-Phone,&#8221; and &#8220;i phone&#8221; are all treated as completely different searches.</p><p><strong>Expensive analytics.</strong> Want to count how many products are in each category? Or show price distributions? These aggregations are computationally expensive on traditional databases.</p><p>Search engines like OpenSearch were explicitly built to solve these problems.</p><div><hr></div><h2>The Solution: How OpenSearch Works Differently</h2><p>OpenSearch uses a data structure called an <strong>inverted index</strong>.</p><p>Instead of storing documents and then scanning through them to find words, OpenSearch flips the relationship: it stores <strong>words</strong> and points them to <strong>documents</strong>.</p><p>Here is a simple example:</p><p><strong>Traditional approach (Document &#8594; Words):</strong></p><pre><code><code>Document 1: "The quick brown fox"
Document 2: "The lazy dog"
Document 3: "Quick brown fox jumps"
</code></code></pre><p>To find &#8220;fox,&#8221; you scan all three documents.</p><p><strong>Inverted index approach (Words &#8594; Documents):</strong></p><pre><code><code>"quick" &#8594; [Document 1, Document 3]
"brown" &#8594; [Document 1, Document 3]
"fox"   &#8594; [Document 1, Document 3]
"lazy"  &#8594; [Document 2]
"dog"   &#8594; [Document 2]
</code></code></pre><p>Now, to find &#8220;fox,&#8221; you do <strong>one dictionary lookup</strong>: O(1).</p><p>It doesn&#8217;t matter if you have 10 documents or 10 billion. The lookup time is essentially the same.</p><div><hr></div><h3>The Trade-Off</h3><p>Here is the key insight that took me too long to understand:</p><blockquote><p><strong>OpenSearch trades write speed for read speed.</strong></p></blockquote><p>When you add a document to OpenSearch, it does extra work&#8212;analyzing the text, breaking it into tokens, and updating the inverted index. This is slower than a simple database insert.</p><p>But when you search? Instant. Because all that work was front-loaded.</p><p>This trade-off defines when OpenSearch is the right tool and when it&#8217;s the wrong one.</p><div><hr></div><h2>What OpenSearch Actually Is</h2><p>Let me give you the complete picture.</p><p><strong>OpenSearch is a distributed search and analytics engine built on Apache Lucene.</strong></p><p>Let&#8217;s break that down:</p><ul><li><p><strong>Distributed</strong>: It runs on multiple servers (called <em>nodes</em>) that work together as a <em>cluster</em>. This lets it scale horizontally.</p></li><li><p><strong>Search engine</strong>: Its primary job is finding relevant documents fast. It has built-in support for relevance scoring, fuzzy matching, and typo tolerance.</p></li><li><p><strong>Analytics engine</strong>: Beyond search, it excels at real-time aggregations&#8212;counting, summing, grouping data across millions of records.</p></li><li><p><strong>Apache Lucene</strong>: The low-level indexing library that powers the inverted index. OpenSearch (and Elasticsearch before it) are essentially Lucene with a distributed layer and REST API on top.</p></li></ul><div><hr></div><h3>The Origin Story</h3><p>OpenSearch is a fork of Elasticsearch.</p><p>In 2021, Elastic (the company behind Elasticsearch) changed its licensing from Apache 2.0 to a more restrictive model. 
AWS, along with the community, responded by forking the last open-source version (7.10.2) and continuing development under the Apache 2.0 license.</p><p>That fork became OpenSearch.</p><p><strong>Practical implication</strong>: OpenSearch is API-compatible with Elasticsearch 7.10. If you have existing Elasticsearch code, it probably works with OpenSearch with minimal changes.</p><div><hr></div><h3>The Two Components</h3><p>OpenSearch actually has two main parts:</p><p><strong>1. OpenSearch (the engine)</strong></p><ul><li><p>Stores and indexes your data</p></li><li><p>Processes search queries</p></li><li><p>Runs aggregations and analytics</p></li><li><p>Exposes a REST API (default port: 9200)</p></li></ul><p><strong>2. OpenSearch Dashboards</strong></p><ul><li><p>Web interface for visualization</p></li><li><p>Build dashboards and charts</p></li><li><p>Dev Tools console for running queries</p></li><li><p>Alerting and monitoring</p></li></ul><p>Think of it like PostgreSQL (the database) and pgAdmin (the GUI). You can use the engine without the dashboards, but the dashboards make it much easier to explore your data.</p><div><hr></div><h2>When to Use OpenSearch</h2><p>OpenSearch shines in four main scenarios:</p><h3>1. Full-Text Search</h3><p><strong>The problem</strong>: Users need to find content fast&#8212;products, articles, documentation, people.</p><p><strong>Why OpenSearch</strong>: Instant results, relevance ranking, typo tolerance, autocomplete, faceted search.</p><p><strong>Examples</strong>: E-commerce product search, internal documentation search, Wikipedia-style search boxes.</p><div><hr></div><h3>2. Log Analytics</h3><p><strong>The problem</strong>: You have millions of log lines from servers, applications, and services. 
Finding a specific error is like finding a needle in a haystack.</p><p><strong>Why OpenSearch</strong>: Ingest logs in real time, search across all sources instantly, visualize patterns and anomalies.</p><p><strong>Examples</strong>: Debugging errors across 100 servers, tracking API latencies, security log analysis.</p><div><hr></div><h3>3. Observability &amp; Metrics</h3><p><strong>The problem</strong>: You need visibility into system health&#8212;CPU usage, request counts, error rates&#8212;across your entire infrastructure.</p><p><strong>Why OpenSearch</strong>: Time-series aggregations, real-time dashboards, alerting on thresholds.</p><p><strong>Examples</strong>: Infrastructure monitoring, APM dashboards, SLA tracking.</p><div><hr></div><h3>4. AI &amp; Vector Search</h3><p><strong>The problem</strong>: Traditional keyword search doesn&#8217;t understand meaning. &#8220;automobile&#8221; and &#8220;car&#8221; are treated as different concepts.</p><p><strong>Why OpenSearch</strong>: Native support for vector embeddings, k-NN (k-nearest neighbors) search, hybrid search combining keywords and vectors.</p><p><strong>Examples</strong>: Semantic search, RAG (Retrieval-Augmented Generation) pipelines, image similarity, recommendation systems.</p><div><hr></div><h2>When NOT to Use OpenSearch</h2><p>This is the part most tutorials skip. But understanding when <strong>not</strong> to use a tool is just as important as knowing when to use it.</p><h3>&#10060; Don&#8217;t Use It As Your Primary Database</h3><p><strong>Why not</strong>: OpenSearch is not ACID-compliant. It prioritizes speed over data consistency guarantees.</p><p><strong>The problem</strong>: If a node fails mid-write, you might lose data or end up with inconsistent state. There&#8217;s no rollback mechanism like a traditional database transaction.</p><p><strong>What to do instead</strong>: Use PostgreSQL, MySQL, or another relational database as your source of truth. 
Sync data to OpenSearch for search capabilities. This is called the &#8220;dual-write&#8221; or &#8220;CQRS&#8221; pattern.</p><div><hr></div><h3>&#10060; Don&#8217;t Use It for Frequently Updated Records</h3><p><strong>Why not</strong>: Documents in OpenSearch are immutable. When you &#8220;update&#8221; a document, OpenSearch actually deletes the old version and reindexes a completely new one.</p><p><strong>The problem</strong>: If you&#8217;re updating a user&#8217;s last-login timestamp on every request, or incrementing a view counter frequently, you&#8217;re triggering delete+reindex operations constantly. This is very expensive.</p><p><strong>What to do instead</strong>: Keep frequently-changing data in a database optimized for updates. Only sync relatively stable data to OpenSearch.</p><div><hr></div><h3>&#10060; Don&#8217;t Use It for Multi-Document Transactions</h3><p><strong>Why not</strong>: OpenSearch has no concept of transactions spanning multiple documents. Each document write is independent.</p><p><strong>The problem</strong>: &#8220;Transfer $100 from Account A to Account B&#8221; requires two operations: debit A, credit B. In a database, this is one atomic transaction. In OpenSearch, A might be debited but B might fail to credit. Now your data is inconsistent.</p><p><strong>What to do instead</strong>: Handle any operation requiring transactional guarantees in your database first, then update OpenSearch.</p><div><hr></div><h3>The Right Mental Model</h3><p>Think of it this way:</p><blockquote><p><strong>Database</strong> = Source of truth, handles transactions, your data&#8217;s home</p><p><strong>OpenSearch</strong> = Specialized search layer that syncs from your database</p></blockquote><p>They work together. 
OpenSearch doesn&#8217;t replace your database; it complements it.</p><div><hr></div><h2>The One Concept That Makes Everything Click</h2><p>If you remember nothing else from this article, remember this:</p><p><strong>OpenSearch builds an index at write time so it can answer searches in O(1) at read time.</strong></p><p>That&#8217;s the fundamental insight.</p><p>Every other concept&#8212;shards, replicas, analyzers, mappings&#8212;exists to make that indexing process more efficient, more distributed, or more customizable.</p><p>Once you understand that core trade-off, everything else starts to make sense:</p><ul><li><p><em>Why does OpenSearch need so much RAM?</em> To keep indexes in memory for fast lookup.</p></li><li><p><em>What&#8217;s a shard?</em> A way to split a large index across multiple machines.</p></li><li><p><em>What&#8217;s an analyzer?</em> A way to customize how text is broken into tokens for the index.</p></li><li><p><em>Why is updating slow?</em> Because it&#8217;s rebuilding the index, not just changing a value.</p></li></ul><div><hr></div><h2>Try It Yourself</h2><p>Want to see this in action? Here&#8217;s the fastest way to spin up OpenSearch locally:</p><pre><code><code>docker run -p 9200:9200 -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=MyPassword123!" \
  opensearchproject/opensearch:latest
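
# Once the container is up, confirm it responds (run this in a second
# terminal): -k accepts the self-signed cert, and the credentials are the
# default admin user with the password set above.
curl -ku admin:MyPassword123! https://localhost:9200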
</code></code></pre><p>Wait about 30 seconds, then visit https://localhost:9200 (accept the self-signed certificate warning and sign in as <code>admin</code> with the password you set in the command). You should see JSON confirming OpenSearch is running.</p><p>Now you have a search engine to experiment with.</p><div><hr></div><h2>What&#8217;s Next</h2><p>This was the &#8220;why&#8221; and &#8220;what&#8221; of OpenSearch. In the next article, I&#8217;ll go deeper into the &#8220;how&#8221;:</p><ul><li><p>How text analysis actually works (tokenization, stemming, filters)</p></li><li><p>How relevance scoring determines which results come first</p></li><li><p>How to write your first search queries</p></li></ul><p>If you found this helpful, subscribe so you don&#8217;t miss it.</p><p>And if you have questions, drop them in the comments&#8212;I read everything.</p><div><hr></div><p><strong>Resources:</strong></p><ul><li><p><strong>Interactive Visual Guide for Day 1:</strong> <a href="https://opensearch.9cld.com/day/01-fundamentals">opensearch.9cld.com/day/01-fundamentals</a></p></li><li><p><strong>Complete Guide (all days):</strong> <a href="https://opensearch.9cld.com/">opensearch.9cld.com</a></p></li></ul><div><hr></div><p><em>This is part of my series breaking down OpenSearch from first principles. No assumed knowledge. No skipped steps. Just clear explanations of how search actually works.</em></p>]]></content:encoded></item></channel></rss>