Day 3: Your First Conversation with OpenSearch
Build your first searchable database with real data and real queries
Yesterday, we got OpenSearch running in Docker. Today, we learn to actually talk to it.
Why This Matters
Here’s what most tutorials get wrong: they throw API commands at you without explaining the conversation happening under the hood.
OpenSearch isn’t magic. It is a service that speaks REST. Every operation (indexing a document, running a search, checking cluster health) is just an HTTP request.
Today you will understand:
How to communicate with OpenSearch (the mechanics)
Why each method exists (the reasoning)
What happens internally (the architecture)
By the end, you won’t just know what commands to run. You will understand why they work.
The REST API: Your Language to OpenSearch
OpenSearch speaks HTTP. That’s it. No proprietary protocol, no special client required.
Every operation follows this pattern:
HTTP_METHOD host:port/index/operation
The Five HTTP Methods
GET - Retrieve information
GET /_cluster/health # Cluster status
GET /products/_search # Search documents
GET /products/_doc/123 # Get specific document
PUT - Create or replace (idempotent)
PUT /products # Create index
PUT /products/_doc/123 # Create/replace document with ID
POST - Create or update (not idempotent)
POST /products/_doc # Create document (auto-generate ID)
POST /products/_search # Search (yes, POST for complex queries)
DELETE - Remove resources
DELETE /products/_doc/123 # Delete document
DELETE /products # Delete entire index
HEAD - Check existence (no body returned)
HEAD /products # Does index exist?
The Anatomy of a Request
Let’s dissect a real request:
curl -X POST "https://localhost:9200/products/_doc" \
-H "Content-Type: application/json" \
-u admin:YourPassword \
-k -d '{
"name": "Wireless Headphones",
"price": 149.99
}'
Breaking it down:
-X POST - HTTP method (create)
localhost:9200 - Your cluster endpoint
/products - Target index name
/_doc - Document API endpoint
-H "Content-Type: application/json" - We’re sending JSON
-u admin:YourPassword - Basic auth credentials
-k - Skip SSL verification (dev only!)
-d '{...}' - The JSON document body
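If you would rather script this than type curl, the same request can be assembled with Python’s standard library. This is a minimal sketch, not an official client: the helper name build_index_request is made up for illustration, and the request is only built, never sent, so it runs without a cluster.

```python
import base64
import json
import urllib.request


def build_index_request(host, index, doc, user, password):
    """Build (but do not send) the POST /<index>/_doc request from the curl example."""
    # -u admin:YourPassword becomes a Basic auth header: base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{host}/{index}/_doc",      # endpoint + index + document API
        data=json.dumps(doc).encode("utf-8"),    # -d '{...}'
        method="POST",                           # -X POST
        headers={
            "Content-Type": "application/json",  # -H "Content-Type: application/json"
            "Authorization": f"Basic {token}",
        },
    )


req = build_index_request(
    "localhost:9200",
    "products",
    {"name": "Wireless Headphones", "price": 149.99},
    "admin",
    "YourPassword",
)
print(req.get_method(), req.full_url)
```

To actually send it you would pass req to urllib.request.urlopen. Mimicking -k means supplying an ssl context with certificate verification turned off, which, like -k itself, belongs in development only.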
Step 1: Check Cluster Health
Before doing anything, verify your cluster is healthy:
curl -X GET "https://localhost:9200/_cluster/health?pretty" \
-u admin:YourPassword -k
Response:
{
"cluster_name": "opensearch-cluster",
"status": "green",
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 5,
"active_shards": 5
}
Understanding Status Colors
green - All shards assigned. You’re good!
yellow - Primary shards OK, some replicas unassigned. Normal for a single node.
red - Some primary shards are missing. Data unavailable, investigate!
Why yellow is OK for development: OpenSearch never places a replica on the same node as its primary, because a backup on the same machine would be lost along with the original. With only one node, the replicas have nowhere to go and stay unassigned.
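The color rules above are simple enough to encode in a script that checks the cluster before doing any work. Here is a small sketch under the assumption that the health response has already been parsed into a dict; describe_health is a hypothetical helper, not part of any OpenSearch library.

```python
def describe_health(health):
    """Turn a parsed _cluster/health response into a short verdict."""
    status = health["status"]
    if status == "green":
        return "ok: all primary and replica shards are assigned"
    if status == "yellow":
        # On one node the replicas have nowhere to go, so yellow is expected.
        if health.get("number_of_nodes") == 1:
            return "ok for dev: replicas are unassigned because there is only one node"
        return "warning: some replica shards are unassigned"
    if status == "red":
        return "critical: at least one primary shard is unassigned"
    raise ValueError(f"unexpected status: {status!r}")


sample = {"cluster_name": "opensearch-cluster", "status": "green", "number_of_nodes": 1}
print(describe_health(sample))
print(describe_health({"status": "yellow", "number_of_nodes": 1}))
print(describe_health({"status": "yellow", "number_of_nodes": 3}))
```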
Understanding “Tables” in OpenSearch
If you are coming from a relational database background, you might be wondering: “Where are the tables?”
OpenSearch doesn’t have tables. Instead, it has indexes.
Database vs OpenSearch terminology:
Database → Cluster
Table → Index
Row → Document
Column → Field
Schema → Mapping
So when someone says “create a table,” in OpenSearch we say “create an index.”
The key difference?
In a database, you query rows.
In OpenSearch, you search documents.
The entire system is optimized for finding things fast, not for transactions or joins.
Step 2: Create an Index
An index is where your documents live. Think of it as a database table, but optimized for search.
Basic Index Creation
The simplest way to create an index:
curl -X PUT "https://localhost:9200/products" \
-u admin:YourPassword -k
That’s it. OpenSearch will create an index with default settings. But you will almost always want to define settings and mappings upfront.
Index with Settings and Mappings
curl -X PUT "https://localhost:9200/products" \
-H "Content-Type: application/json" \
-u admin:YourPassword -k -d '{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"name": { "type": "text" },
"price": { "type": "float" },
"category": { "type": "keyword" },
"description": { "type": "text" },
"in_stock": { "type": "boolean" },
"created_at": { "type": "date" }
}
}
}'
Verify Your Index Was Created
curl -X GET "https://localhost:9200/products?pretty" \
-u admin:YourPassword -k
List All Indexes
curl -X GET "https://localhost:9200/_cat/indices?v" \
-u admin:YourPassword -k
Response:
health status index    uuid         pri rep docs.count store.size
green  open   products abc123xyz...   1   1          0       230b
Delete an Index
curl -X DELETE "https://localhost:9200/products" \
-u admin:YourPassword -k
Warning: This deletes all data in the index. There’s no “are you sure?” prompt.
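Since the server won’t ask for confirmation, a script can add its own. The sketch below is one possible guard, assuming you build requests with Python’s standard library; build_delete_request and its confirm parameter are invented for this example. It only constructs the request, so nothing is deleted when you run it.

```python
import urllib.request


def build_delete_request(host, index, confirm):
    """Build a DELETE /<index> request only if the caller retypes the exact index name."""
    # Reject empty names, wildcards, comma lists, and underscore-prefixed targets.
    if not index or any(ch in index for ch in "*,") or index.startswith("_"):
        raise ValueError("refusing to delete: give one explicit index name")
    if confirm != index:
        raise ValueError(f"confirmation {confirm!r} does not match index {index!r}")
    return urllib.request.Request(f"https://{host}/{index}", method="DELETE")


req = build_delete_request("localhost:9200", "products", confirm="products")
print(req.get_method(), req.full_url)

try:
    build_delete_request("localhost:9200", "prod*", confirm="prod*")
except ValueError as err:
    print(err)
```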
Understanding the Settings
number_of_shards: 1
How many pieces to split the index into
More shards = better parallelism for large datasets
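Can’t be changed in place after creation (you would need to reindex, or use the split/shrink APIs)
Rule of thumb: aim for shards in the tens of gigabytes; ~30GB per shard is a reasonable target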
number_of_replicas: 1
Copies of each shard for redundancy
More replicas = better read throughput + fault tolerance
Can change anytime
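To make the two settings concrete, here is the arithmetic as a short sketch. It assumes the ~30GB-per-shard rule of thumb from above as its default; the function names are hypothetical and the numbers are a starting point for planning, not a sizing guarantee.

```python
import math


def suggest_primary_shards(expected_gb, target_gb_per_shard=30):
    """Round up so no shard is planned to exceed the target size."""
    return max(1, math.ceil(expected_gb / target_gb_per_shard))


def total_shard_copies(primaries, replicas):
    """Each primary has `replicas` copies, so total = primaries * (1 + replicas)."""
    return primaries * (1 + replicas)


print(suggest_primary_shards(5))     # a small product catalog
print(suggest_primary_shards(200))   # a larger dataset
print(total_shard_copies(1, 1))      # the settings used in this step
```

With 1 primary and 1 replica there are 2 shard copies in total, which is the number you will see as _shards.total when you index a document in Step 3.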
Understanding the Mapping
The mapping defines how each field is stored and indexed:
text - For full-text content. Analyzed word-by-word for search.
keyword - For exact values like categories or IDs. Exact match only.
float - For decimal numbers. Supports range queries.
boolean - For true/false values. Used in filter queries.
date - For timestamps. Supports date math queries.
Critical distinction:
text fields are analyzed - “Wireless Headphones” becomes tokens [“wireless”, “headphones”]
keyword fields are exact - “electronics” stays “electronics”
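The text versus keyword distinction is easier to see with a toy model. The sketch below is only an approximation: real analyzers handle Unicode, punctuation, and much more, and the two function names are made up here. It lowercases and splits text, and leaves keywords untouched.

```python
import re


def analyze_text(value):
    """Rough stand-in for a text analyzer: lowercase, then split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", value.lower()) if token]


def analyze_keyword(value):
    """Keyword fields are indexed as a single, unmodified term."""
    return [value]


name_terms = analyze_text("Wireless Headphones")
category_terms = analyze_keyword("electronics")

print(name_terms)
print(category_terms)
print("wireless" in name_terms)          # a single word finds the text field
print("Electronics" in category_terms)   # keyword matching is whole-value and case-sensitive
```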
Step 3: Index Documents
Single Document
curl -X POST "https://localhost:9200/products/_doc" \
-H "Content-Type: application/json" \
-u admin:YourPassword -k -d '{
"name": "Wireless Headphones",
"price": 149.99,
"category": "electronics",
"description": "Premium noise-canceling wireless headphones with 30-hour battery",
"in_stock": true,
"created_at": "2025-01-14"
}'
Response:
{
"_index": "products",
"_id": "abc123xyz", // Auto-generated
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1
}
}
Bulk Ingestion (10-100x Faster)
For multiple documents, use the _bulk API:
curl -X POST "https://localhost:9200/_bulk" \
-H "Content-Type: application/x-ndjson" \
-u admin:YourPassword -k -d '
{"index": {"_index": "products"}}
{"name": "Laptop Stand", "price": 49.99, "category": "accessories"}
{"index": {"_index": "products"}}
{"name": "USB-C Hub", "price": 79.99, "category": "accessories"}
{"index": {"_index": "products"}}
{"name": "Mechanical Keyboard", "price": 159.99, "category": "electronics"}
'
Why bulk is faster:
Single network round-trip instead of N
Batches writes to Lucene
Reduces per-request overhead
Best practices:
Batch 1,000-5,000 documents per request
Don’t exceed ~100MB per request
Use newline-delimited JSON (note the trailing newline!)
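Hand-writing NDJSON gets error-prone quickly, so you will usually generate it. Here is a minimal sketch that builds a bulk body and splits a large list into batches. The helper names and the default batch size of 1,000 are choices made for this example, and nothing is sent over the network.

```python
import json


def to_bulk_body(index, docs):
    """Serialize docs as NDJSON: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def batches(docs, size=1000):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


products = [
    {"name": "Laptop Stand", "price": 49.99, "category": "accessories"},
    {"name": "USB-C Hub", "price": 79.99, "category": "accessories"},
    {"name": "Mechanical Keyboard", "price": 159.99, "category": "electronics"},
]
body = to_bulk_body("products", products)
print(body, end="")
print([len(batch) for batch in batches(list(range(2500)))])
```

Each batch would become one _bulk request, which keeps individual requests well under the size limits mentioned above.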
Step 4: Search Your Data
Now the fun part - actually searching.
Basic Match Query
curl -X GET "https://localhost:9200/products/_search" \
-H "Content-Type: application/json" \
-u admin:YourPassword -k -d '{
"query": {
"match": {
"description": "wireless headphones"
}
}
}'
This searches the description field for documents containing “wireless” OR “headphones” (or both).
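That OR behavior is worth internalizing, because it surprises people. The toy matcher below imitates it with plain whitespace tokenization (an assumption; real analysis and relevance scoring are far richer), and toy_match is a name invented for this sketch. The last lines show the request body with the match query’s operator parameter set to "and", which asks OpenSearch to require every term.

```python
import json


def toy_match(field_value, query_text, operator="or"):
    """Toy match: tokenize both sides, then require any (or) or all (and) query terms."""
    doc_terms = set(field_value.lower().split())
    query_terms = query_text.lower().split()
    found = [term in doc_terms for term in query_terms]
    return all(found) if operator == "and" else any(found)


description = "Premium noise-canceling wireless headphones with 30-hour battery"

print(toy_match(description, "wireless headphones"))        # both terms present
print(toy_match(description, "wireless speaker"))           # one term is enough for OR
print(toy_match(description, "wireless speaker", "and"))    # AND needs every term

# Request body that asks OpenSearch itself to require every term.
strict_query = {
    "query": {"match": {"description": {"query": "wireless headphones", "operator": "and"}}}
}
print(json.dumps(strict_query))
```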
Understanding the Response
{
"took": 5, // Query time in milliseconds
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1
},
"hits": {
"total": {
"value": 1,
"relation": "eq" // "eq" = exact, "gte" = at least
},
"max_score": 2.8567, // Highest relevance score
"hits": [
{
"_index": "products",
"_id": "abc123",
"_score": 2.8567, // This document's relevance
"_source": { // The actual document
"name": "Wireless Headphones",
"price": 149.99,
"description": "Premium noise-canceling wireless headphones"
}
}
]
}
}
Query Types Cheat Sheet
Match Query - Full-text search with analysis
{"query": {"match": {"description": "wireless bluetooth"}}}
Use for: Product descriptions, article content, user-generated text
Term Query - Exact value matching (no analysis)
{"query": {"term": {"category": "electronics"}}}
Use for: Categories, status fields, IDs
Range Query - Numeric/date ranges
{"query": {"range": {"price": {"gte": 50, "lte": 200}}}}
Use for: Price filters, date ranges
Bool Query - Combine multiple conditions
{
"query": {
"bool": {
"must": [
{"match": {"description": "wireless"}}
],
"filter": [
{"term": {"category": "electronics"}},
{"range": {"price": {"lte": 200}}}
]
}
}
}
Important: Use filter for conditions that don’t affect the relevance score. Filter clauses skip scoring and their results can be cached, so they are usually faster.
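In an application you rarely hard-code a bool query; you assemble it from whatever the user filled in. The sketch below shows one way to do that, keeping the scored text match in must and the exact conditions in filter. The function and its parameter names are hypothetical.

```python
import json


def build_product_query(text=None, category=None, max_price=None):
    """Assemble a bool query: scored text match in must, exact conditions in filter."""
    must = []
    filters = []
    if text:
        must.append({"match": {"description": text}})
    if category:
        filters.append({"term": {"category": category}})
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})

    bool_clause = {}
    if must:
        bool_clause["must"] = must
    if filters:
        bool_clause["filter"] = filters
    return {"query": {"bool": bool_clause}}


query = build_product_query(text="wireless", category="electronics", max_price=200)
print(json.dumps(query, indent=2))
```

Called with all three arguments, it reproduces the bool query shown above. Leaving an argument out simply drops that clause.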
Key Concepts Explained
What is a Cluster?
A cluster is a collection of nodes working together. It:
Distributes data across nodes automatically
Provides fault tolerance (if one node fails, others continue)
Scales horizontally by adding more nodes
What is a Node?
A node is a single OpenSearch server instance. Types:
Cluster Manager: Coordinates the cluster
Data Node: Stores data, executes searches
Ingest Node: Pre-processes documents
Coordinating Node: Routes requests (all nodes can do this)
What is an Index?
An index is a collection of documents with similar characteristics. It:
Has a mapping defining field types
Is split into shards for distribution
Can have aliases for easier management
What is a Document?
A document is a JSON object - the basic unit of information:
{
"name": "Wireless Headphones",
"price": 149.99,
"_id": "abc123" // Metadata
}
What is a Shard?
A shard is a subset of an index:
Primary shard: Original data
Replica shard: Copy for redundancy
Shards enable:
Parallel processing across nodes
High availability (replicas on different nodes)
Horizontal scaling
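How does a document end up on a particular shard? In simplified terms, OpenSearch hashes a routing value (the document ID by default) and takes it modulo the number of primary shards. The sketch below imitates that idea with zlib.crc32, which is an assumption made to keep the example in the standard library; it is not the hash OpenSearch uses, and pick_shard is a made-up name.

```python
import zlib


def pick_shard(doc_id, number_of_shards):
    """Toy routing: hash the ID and take it modulo the primary shard count."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


doc_ids = [f"product-{n}" for n in range(8)]
with_three = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
with_four = {doc_id: pick_shard(doc_id, 4) for doc_id in doc_ids}

moved = [doc_id for doc_id in doc_ids if with_three[doc_id] != with_four[doc_id]]
print(with_three)
print(f"{len(moved)} of {len(doc_ids)} IDs would land on a different shard")
```

Changing the divisor changes where existing IDs map, which is the intuition behind the shard count being fixed when the index is created.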
What is a Mapping?
A mapping defines how fields are stored and indexed:
{
"properties": {
"name": {"type": "text"}, // Analyzed for full-text search
"category": {"type": "keyword"} // Not analyzed, exact match
}
}
Public Datasets to Practice With
Want to practice with real data? Here are excellent options:
1. OpenSearch Sample Data (Easiest)
Built into OpenSearch Dashboards. Go to Home → Add sample data.
eCommerce orders
Flight data
Web logs
2. Kaggle E-commerce Dataset
500K+ retail transactions. Perfect for:
Product search
Customer analytics
Aggregations practice
3. Wikipedia Dumps
Full-text articles in various sizes. Perfect for:
Large-scale indexing
Full-text search testing
NLP experiments
4. NYC Taxi Data
Trip records with geo coordinates. Perfect for:
Geo-spatial queries
Time-series analysis
Dashboard building
Cloud Considerations
AWS OpenSearch Service
When you create a domain in AWS:
Endpoint: https://your-domain.region.es.amazonaws.com
Auth: IAM roles or fine-grained access control
VPC: Can be in VPC or public (with IP restrictions)
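# With IAM auth, every request must be SigV4-signed. The AWS CLI does not run searches;
# curl 7.75+ can sign for you (replace "region" in the URL and in --aws-sigv4).
curl -X GET "https://your-domain.region.es.amazonaws.com/_search" \
--aws-sigv4 "aws:amz:region:es" \
--user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{"query": {"match_all": {}}}'
Alibaba Cloud OpenSearch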
Similar managed service with:
Auto-scaling capabilities
Built-in machine learning features
Different pricing model (pay per query unit)
Today’s Mini-Project: Product Catalog Search
Create a products index with proper mapping
Bulk index at least 10 products
Implement these searches:
Full-text search on product names/descriptions
Filter by category (keyword)
Price range filter
Combined search with bool query
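While you work through these searches, raw JSON responses get tiring to read. Here is a small sketch for boiling a response down to the essentials. It assumes the response has the shape shown in “Understanding the Response” and has been parsed into a dict; summarize_hits is a hypothetical helper.

```python
def summarize_hits(response):
    """Pull the total and a (score, name, price) row per hit out of a _search response."""
    total = response["hits"]["total"]
    rows = [
        (hit["_score"], hit["_source"].get("name"), hit["_source"].get("price"))
        for hit in response["hits"]["hits"]
    ]
    # "eq" means the count is exact; "gte" means it is a lower bound.
    prefix = "" if total["relation"] == "eq" else "at least "
    return f"{prefix}{total['value']} match(es)", rows


sample_response = {
    "took": 5,
    "timed_out": False,
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 2.8567,
        "hits": [
            {
                "_index": "products",
                "_id": "abc123",
                "_score": 2.8567,
                "_source": {"name": "Wireless Headphones", "price": 149.99},
            }
        ],
    },
}

headline, rows = summarize_hits(sample_response)
print(headline)
for score, name, price in rows:
    print(f"{score:.2f}  {name}  {price}")
```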
Interactive Visual Guide: opensearch.9cld.com/day/03-first-conversation
What’s Next (Day 4)
Tomorrow we’ll dive into mappings and analyzers - the secret sauce behind great search. You’ll learn:
Why the same search returns different results with different mappings
How analyzers transform text into searchable tokens
Building custom analyzers for your use case


