Day 3: Your First Conversation with OpenSearch

Build your first searchable database with real data and real queries

Jan 14, 2026

Yesterday, we got OpenSearch running in Docker. Today, we learn to actually talk to it.

Why This Matters

Here’s what most tutorials get wrong: they throw API commands at you without explaining the conversation happening under the hood.

OpenSearch isn’t magic. It is a service that speaks REST. Every operation (indexing a document, running a search, checking cluster health) is just an HTTP request.

Today you will understand:

How to communicate with OpenSearch (the mechanics)
Why each method exists (the reasoning)
What happens internally (the architecture)

By the end, you won’t just know what commands to run. You will understand why they work.

The REST API: Your Language to OpenSearch

OpenSearch speaks HTTP. That’s it. No proprietary protocol, no special client required.

Every operation follows this pattern:

HTTP_METHOD host:port/index/operation

The Five HTTP Methods

GET - Retrieve information

GET /_cluster/health          # Cluster status
GET /products/_search         # Search documents
GET /products/_doc/123        # Get specific document

PUT - Create or replace (idempotent)

PUT /products                 # Create index
PUT /products/_doc/123        # Create/replace document with ID

POST - Create or update (not idempotent)

POST /products/_doc           # Create document (auto-generate ID)
POST /products/_search        # Search (yes, POST for complex queries)

DELETE - Remove resources

DELETE /products/_doc/123     # Delete document
DELETE /products              # Delete entire index

HEAD - Check existence (no body returned)

HEAD /products                # Does index exist?
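
To make the idempotent vs. not-idempotent distinction concrete, here is a toy in-memory model in Python. It is not OpenSearch code: `ToyIndex`, `put_doc`, and `post_doc` are hypothetical names that only mimic how PUT with an ID and POST without one behave.

```python
import uuid


class ToyIndex:
    """In-memory stand-in for an index; illustrates PUT vs POST semantics only."""

    def __init__(self):
        self.docs = {}      # _id -> document body
        self.versions = {}  # _id -> version counter

    def put_doc(self, doc_id, body):
        """PUT /index/_doc/<id>: create or replace the document with this ID."""
        self.docs[doc_id] = body
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return doc_id

    def post_doc(self, body):
        """POST /index/_doc: always create a new document with a generated ID."""
        return self.put_doc(uuid.uuid4().hex, body)


index = ToyIndex()
headphones = {"name": "Wireless Headphones", "price": 149.99}

# Repeating the same PUT leaves exactly one document (only its version grows).
index.put_doc("123", headphones)
index.put_doc("123", headphones)
count_after_puts = len(index.docs)

# Repeating the same POST creates a new document every time.
index.post_doc(headphones)
index.post_doc(headphones)
count_after_posts = len(index.docs)

print(count_after_puts, count_after_posts)
```

The practical takeaway: retrying a PUT with an ID after a timeout is safe, while retrying a POST without an ID can leave you with duplicates.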

The Anatomy of a Request

Let’s dissect a real request:

curl -X POST "https://localhost:9200/products/_doc" \
  -H "Content-Type: application/json" \
  -u admin:YourPassword \
  -k -d '{
    "name": "Wireless Headphones",
    "price": 149.99
  }'

Breaking it down:

-X POST - HTTP method (create)
localhost:9200 - Your cluster endpoint
/products - Target index name
/_doc - Document API endpoint
-H "Content-Type: application/json" - We’re sending JSON
-u admin:YourPassword - Basic auth credentials
-k - Skip TLS certificate verification (dev only!)
-d '{...}' - The JSON document body
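
The same request can be assembled from any HTTP client. Below is a minimal sketch using only Python's standard library; `build_index_request` is a hypothetical helper name, and the request is built but deliberately not sent, so it runs without a cluster.

```python
import base64
import json
import urllib.request


def build_index_request(host, index, doc, user, password):
    """Build (but do not send) the POST request from the curl example."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{host}/{index}/_doc",        # endpoint + index + document API
        data=json.dumps(doc).encode("utf-8"),      # -d '{...}'
        method="POST",                             # -X POST
        headers={
            "Content-Type": "application/json",    # -H "Content-Type: ..."
            "Authorization": f"Basic {token}",     # -u admin:YourPassword
        },
    )


request = build_index_request(
    "localhost:9200", "products",
    {"name": "Wireless Headphones", "price": 149.99},
    "admin", "YourPassword",
)
print(request.get_method(), request.full_url)
```

To actually send it you would pass `request` to `urllib.request.urlopen`, along with an `ssl` context that has certificate verification disabled if you want the equivalent of `-k` (again, dev only).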

Step 1: Check Cluster Health

Before doing anything, verify your cluster is healthy:

curl -X GET "https://localhost:9200/_cluster/health?pretty" \
  -u admin:YourPassword -k

Response:

{
  "cluster_name": "opensearch-cluster",
  "status": "green",
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 5,
  "active_shards": 5
}

Understanding Status Colors

green - All shards assigned. You’re good!

yellow - Primary shards OK, some replicas unassigned. Normal for a single node.

red - Some primary shards are missing. Data unavailable, investigate!

Why yellow is OK for development: OpenSearch never allocates a replica on the same node as its primary, because a copy on the same machine would not protect against that machine failing. With only one node, replicas have nowhere to go and stay unassigned.
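
The color rules can be written down as a tiny function. This is a simplified sketch of the logic described above, not OpenSearch's implementation; `health_status` and the shard dictionaries are made up for illustration.

```python
def health_status(shards):
    """Derive a status color from a list of shard copies.

    Each shard copy is a dict like {"primary": True, "assigned": True}.
    """
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"     # at least one primary is missing
    if any(not s["assigned"] for s in shards):
        return "yellow"  # primaries fine, some replica unassigned
    return "green"       # every shard copy is assigned


# Single node, one primary plus one replica: the replica has nowhere to go.
single_node = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},
]
print(health_status(single_node))  # yellow
```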

Understanding “Tables” in OpenSearch

If you are coming from a relational database background, you might be wondering: “Where are the tables?”

OpenSearch doesn’t have tables. Instead, it has indexes.

Database vs OpenSearch terminology:

Database → Cluster
Table → Index
Row → Document
Column → Field
Schema → Mapping

So when someone says “create a table,” in OpenSearch we say “create an index.”

The key difference?

In a database, you query rows.

In OpenSearch, you search documents.

The entire system is optimized for finding things fast, not for transactions or joins.

Step 2: Create an Index

An index is where your documents live. Think of it as a database table, but optimized for search.

Basic Index Creation

The simplest way to create an index:

curl -X PUT "https://localhost:9200/products" \
  -u admin:YourPassword -k

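That’s it. OpenSearch creates the index with default settings. In practice, you will almost always want to define settings and mappings up front. If you ran the command above, delete the index first (see “Delete an Index” below), because creating an index that already exists returns an error.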

Index with Settings and Mappings

curl -X PUT "https://localhost:9200/products" \
  -H "Content-Type: application/json" \
  -u admin:YourPassword -k -d '{
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    },
    "mappings": {
      "properties": {
        "name": { "type": "text" },
        "price": { "type": "float" },
        "category": { "type": "keyword" },
        "description": { "type": "text" },
        "in_stock": { "type": "boolean" },
        "created_at": { "type": "date" }
      }
    }
  }'

Verify Your Index Was Created

curl -X GET "https://localhost:9200/products?pretty" \
  -u admin:YourPassword -k

List All Indexes

curl -X GET "https://localhost:9200/_cat/indices?v" \
  -u admin:YourPassword -k

Response:

health status index    uuid                   pri rep docs.count store.size
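yellow open   products abc123xyz...           1   1   0          230b

Health is yellow here because the index asks for one replica, and a single-node cluster has nowhere to place it.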

Delete an Index

curl -X DELETE "https://localhost:9200/products" \
  -u admin:YourPassword -k

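Warning: This deletes all data in the index. There’s no “are you sure?” prompt. If you delete the index while following along, re-create it with the settings and mappings above before Step 3. Otherwise OpenSearch will auto-create it on the first write and guess the field types for you.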

Understanding the Settings

number_of_shards: 1

How many pieces to split the index into
More shards = better parallelism for large datasets
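Can’t be changed in place after creation (you would need to reindex, or use the shrink/split APIs)
Rule of thumb: aim for roughly 10-50GB per shard, toward the lower end for latency-sensitive search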

number_of_replicas: 1

Copies of each shard for redundancy
More replicas = better read throughput + fault tolerance
Can change anytime
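
One reason the primary shard count is fixed: the shard a document lives on is derived from a hash of its routing value (the document ID by default) and the number of primary shards. The sketch below illustrates the idea only. It uses `zlib.crc32` as a stand-in hash, which is an assumption for the demo and not the hash OpenSearch uses, and `route` is a hypothetical helper.

```python
import zlib


def route(doc_id, number_of_shards):
    """Pick a shard for a document ID (crc32 is a stand-in hash, an assumption)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


doc_ids = [f"product-{n}" for n in range(100)]

# With a fixed shard count, an ID always routes to the same shard.
before = {doc_id: route(doc_id, 3) for doc_id in doc_ids}

# Changing the shard count changes where many IDs would be looked up.
after = {doc_id: route(doc_id, 4) for doc_id in doc_ids}
moved = sum(1 for doc_id in doc_ids if before[doc_id] != after[doc_id])

print(f"{moved} of {len(doc_ids)} documents would route to a different shard")
```

If the count could change freely, lookups by ID would go to the wrong shard for every document that "moved". Replicas don't take part in this calculation, which is why their number can change at any time.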

Understanding the Mapping

The mapping defines how each field is stored and indexed:

text - For full-text content. Analyzed word-by-word for search.

keyword - For exact values like categories or IDs. Indexed as a single term, so it matches only the exact value. Also the type to use for sorting and aggregations.

float - For decimal numbers. Supports range queries.

boolean - For true/false values. Used in filter queries.

date - For timestamps. Supports date math queries.

Critical distinction:

text fields are analyzed - “Wireless Headphones” becomes tokens ["wireless", "headphones"]
keyword fields are exact - “electronics” stays “electronics”
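
Here is a rough Python imitation of that difference. `analyze_text` only approximates what the default analyzer does (split into words, lowercase); the real one handles Unicode and many edge cases this regex ignores. Both function names are made up for illustration.

```python
import re


def analyze_text(value):
    """Rough stand-in for text analysis: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]


def analyze_keyword(value):
    """Keyword fields are indexed as a single, unmodified term."""
    return [value]


print(analyze_text("Wireless Headphones"))     # ['wireless', 'headphones']
print(analyze_keyword("Wireless Headphones"))  # ['Wireless Headphones']
```

This is why a search for "headphones" can find the first form but not the second: the keyword term is the whole string, capital letters included.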

Step 3: Index Documents

Single Document

curl -X POST "https://localhost:9200/products/_doc" \
  -H "Content-Type: application/json" \
  -u admin:YourPassword -k -d '{
    "name": "Wireless Headphones",
    "price": 149.99,
    "category": "electronics",
    "description": "Premium noise-canceling wireless headphones with 30-hour battery",
    "in_stock": true,
    "created_at": "2025-01-14"
  }'

Response:

{
  "_index": "products",
  "_id": "abc123xyz",    // Auto-generated
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1
  }
}

Bulk Ingestion (10-100x Faster)

For multiple documents, use the _bulk API:

curl -X POST "https://localhost:9200/_bulk" \
  -H "Content-Type: application/x-ndjson" \
  -u admin:YourPassword -k -d '
{"index": {"_index": "products"}}
{"name": "Laptop Stand", "price": 49.99, "category": "accessories"}
{"index": {"_index": "products"}}
{"name": "USB-C Hub", "price": 79.99, "category": "accessories"}
{"index": {"_index": "products"}}
{"name": "Mechanical Keyboard", "price": 159.99, "category": "electronics"}
'

Why bulk is faster:

Single network round-trip instead of N
Batches writes to Lucene
Reduces per-request overhead

Best practices:

Batch 1,000-5,000 documents per request
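Stay well below the HTTP request size limit (http.max_content_length, 100MB by default); much smaller requests of a few MB are a common starting point
Use newline-delimited JSON, and end the body with a newline. When sending a file with curl, use --data-binary @file, because -d @file strips newlines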
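
If you generate bulk requests from code, the fiddly parts are the action/document line pairs and the trailing newline. A minimal sketch, with `bulk_body` and `batches` as hypothetical helper names and a batch size of 2 chosen only to keep the output short:

```python
import json


def bulk_body(index, docs):
    """Serialize docs as NDJSON: one action line, then one document line, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def batches(docs, size):
    """Yield successive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


products = [
    {"name": "Laptop Stand", "price": 49.99, "category": "accessories"},
    {"name": "USB-C Hub", "price": 79.99, "category": "accessories"},
    {"name": "Mechanical Keyboard", "price": 159.99, "category": "electronics"},
]

bodies = [bulk_body("products", batch) for batch in batches(products, 2)]
for body in bodies:
    print(body, end="")
```

Each string in `bodies` is what you would send as the body of one `POST /_bulk` request.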

Step 4: Search Your Data

Now the fun part - actually searching.

Basic Match Query

curl -X GET "https://localhost:9200/products/_search" \
  -H "Content-Type: application/json" \
  -u admin:YourPassword -k -d '{
    "query": {
      "match": {
        "description": "wireless headphones"
      }
    }
  }'

This searches the description field for documents containing “wireless” OR “headphones” (or both).
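
The OR behavior is easy to see in a toy matcher. The sketch below is not how OpenSearch executes a query (there is no inverted index and no scoring here); `toy_match` and `tokens` are hypothetical helpers that only show which documents qualify under "or" versus "and".

```python
import re


def tokens(text):
    """Lowercase word tokens; a rough stand-in for text analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_match(docs, field, query, operator="or"):
    """Return docs whose field contains any (or) or all (and) of the query tokens."""
    wanted = tokens(query)
    combine = all if operator == "and" else any
    hits = []
    for doc in docs:
        present = tokens(doc.get(field, ""))
        if combine(t in present for t in wanted):
            hits.append(doc)
    return hits


docs = [
    {"name": "A", "description": "Premium noise-canceling wireless headphones"},
    {"name": "B", "description": "Wireless charging pad"},
    {"name": "C", "description": "Wired studio headphones"},
    {"name": "D", "description": "Aluminum laptop stand"},
]

either = [d["name"] for d in toy_match(docs, "description", "wireless headphones")]
both = [d["name"] for d in toy_match(docs, "description", "wireless headphones", "and")]
print(either)  # ['A', 'B', 'C']
print(both)    # ['A']
```

In a real match query you get the stricter behavior by adding "operator": "and" to the field's options. With the default OR, documents matching more terms still tend to score higher, so they rank first.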

Understanding the Response

{
  "took": 5,                          // Query time in milliseconds
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"                // "eq" = exact, "gte" = at least
    },
    "max_score": 2.8567,              // Highest relevance score
    "hits": [
      {
        "_index": "products",
        "_id": "abc123",
        "_score": 2.8567,             // This document's relevance
        "_source": {                   // The actual document
          "name": "Wireless Headphones",
          "price": 149.99,
          "description": "Premium noise-canceling wireless headphones"
        }
      }
    ]
  }
}
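
In application code you usually care about three things in that response: the total, whether the total is exact, and the hits themselves. A small sketch that pulls them out of a trimmed response with the same shape (`summarize` is a hypothetical helper):

```python
import json

# A trimmed search response (same shape as above, without the comments).
raw = """
{
  "took": 5,
  "timed_out": false,
  "hits": {
    "total": {"value": 1, "relation": "eq"},
    "max_score": 2.8567,
    "hits": [
      {
        "_index": "products",
        "_id": "abc123",
        "_score": 2.8567,
        "_source": {"name": "Wireless Headphones", "price": 149.99}
      }
    ]
  }
}
"""


def summarize(response):
    """Return (total, is_exact, [(id, score, source), ...]) from a search response."""
    hits = response["hits"]
    rows = [(h["_id"], h["_score"], h["_source"]) for h in hits["hits"]]
    return hits["total"]["value"], hits["total"]["relation"] == "eq", rows


total, exact, rows = summarize(json.loads(raw))
for doc_id, score, source in rows:
    print(f"{doc_id}  score={score}  {source['name']}")
```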

Query Types Cheat Sheet

Match Query - Full-text search with analysis

{"query": {"match": {"description": "wireless bluetooth"}}}

Use for: Product descriptions, article content, user-generated text

Term Query - Exact value matching (no analysis)

{"query": {"term": {"category": "electronics"}}}

Use for: Categories, status fields, IDs

Range Query - Numeric/date ranges

{"query": {"range": {"price": {"gte": 50, "lte": 200}}}}

Use for: Price filters, date ranges

Bool Query - Combine multiple conditions

{
  "query": {
    "bool": {
      "must": [
        {"match": {"description": "wireless"}}
      ],
      "filter": [
        {"term": {"category": "electronics"}},
        {"range": {"price": {"lte": 200}}}
      ]
    }
  }
}

Important: Put conditions that don’t affect the relevance score in filter. Filter clauses skip scoring, and frequently used ones can be cached, so they are usually faster than the same condition in must.
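
When the conditions come from user input (a search box, a category dropdown, a price slider), it helps to build the bool query programmatically. A minimal sketch; `product_query` and its parameters are hypothetical, and the field names match the products mapping used in this post.

```python
import json


def product_query(text=None, category=None, max_price=None):
    """Build a bool query: scored full-text in must, unscored conditions in filter."""
    must, filters = [], []
    if text is not None:
        must.append({"match": {"description": text}})
    if category is not None:
        filters.append({"term": {"category": category}})
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})

    bool_clause = {}
    if must:
        bool_clause["must"] = must
    if filters:
        bool_clause["filter"] = filters
    return {"query": {"bool": bool_clause}}


query = product_query("wireless", category="electronics", max_price=200)
print(json.dumps(query, indent=2))
```

The printed JSON is the same bool query shown above, ready to use as the body of a `_search` request.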

Key Concepts Explained

What is a Cluster?

A cluster is a collection of nodes working together. It:

Distributes data across nodes automatically
Provides fault tolerance (if one node fails, others continue)
Scales horizontally by adding more nodes

What is a Node?

A node is a single OpenSearch server instance. Types:

Cluster Manager: Coordinates the cluster
Data Node: Stores data, executes searches
Ingest Node: Pre-processes documents
Coordinating Node: Routes requests (all nodes can do this)

What is an Index?

An index is a collection of documents with similar characteristics. It:

Has a mapping defining field types
Is split into shards for distribution
Can have aliases for easier management

What is a Document?

A document is a JSON object - the basic unit of information:

{
  "_id": "abc123",                  // Metadata, kept alongside the document
  "_source": {                      // The JSON you indexed
    "name": "Wireless Headphones",
    "price": 149.99
  }
}

What is a Shard?

A shard is a subset of an index:

Primary shard: Original data
Replica shard: Copy for redundancy

Shards enable:

Parallel processing across nodes
High availability (replicas on different nodes)
Horizontal scaling

What is a Mapping?

A mapping defines how fields are stored and indexed:

{
  "properties": {
    "name": {"type": "text"},        // Analyzed for full-text search
    "category": {"type": "keyword"}  // Not analyzed, exact match
  }
}

Public Datasets to Practice With

Want to practice with real data? Here are excellent options:

1. OpenSearch Sample Data (Easiest)

Built into OpenSearch Dashboards. Go to Home → Add sample data.

eCommerce orders
Flight data
Web logs

2. Kaggle E-commerce Dataset

500K+ retail transactions. Perfect for:

Product search
Customer analytics
Aggregations practice

3. Wikipedia Dumps

Full-text articles in various sizes. Perfect for:

Large-scale indexing
Full-text search testing
NLP experiments

4. NYC Taxi Data

Trip records with geo coordinates. Perfect for:

Geo-spatial queries
Time-series analysis
Dashboard building

Cloud Considerations

Amazon OpenSearch Service

When you create a domain in AWS:

Endpoint:

https://your-domain.region.es.amazonaws.com

Auth: IAM roles or fine-grained access control
VPC: Can be in VPC or public (with IP restrictions)

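# With IAM auth, requests must be SigV4-signed (curl 7.75+ can sign them)
curl -X POST "https://your-domain.region.es.amazonaws.com/_search" \
  --aws-sigv4 "aws:amz:region:es" \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": {"match_all": {}}}'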

Alibaba Cloud OpenSearch

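A managed search service with a similar name. It is a separate product rather than a hosted version of the open-source OpenSearch project, so check its own API documentation before reusing the requests from this post. It offers: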

Auto-scaling capabilities
Built-in machine learning features
Different pricing model (pay per query unit)

Today’s Mini-Project: Product Catalog Search

Create a products index with proper mapping
Bulk index at least 10 products
Implement these searches:
- Full-text search on product names/descriptions
- Filter by category (keyword)
- Price range filter
- Combined search with bool query

Interactive Visual Guide: opensearch.9cld.com/day/03-first-conversation

What’s Next (Day 4)

Tomorrow we’ll dive into mappings and analyzers - the secret sauce behind great search. You’ll learn:

Why the same search returns different results with different mappings
How analyzers transform text into searchable tokens
Building custom analyzers for your use case

9Cloud
