mapping.json


Overview

The mapping.json file defines the configuration and schema mappings for an Elasticsearch index. It specifies how documents are indexed, stored, and searched within the Elasticsearch cluster. This configuration includes index settings like shard count and refresh interval, a custom similarity scoring script for text fields, and detailed dynamic mapping templates that automatically map fields based on their names and patterns to appropriate Elasticsearch data types.

This file shapes how the index behaves, with the aim of improving search relevance, storage efficiency, and query performance. It is typically supplied when the index is created through the Elasticsearch REST API. Static settings such as number_of_shards cannot be changed on an existing index, so changing them means creating a new index.


Detailed Explanation

Root Structure

The file has two top-level keys: settings, which holds index-level configuration, and mappings, which holds the field definitions.


Settings Section

1. index

Sets number_of_shards to 2, number_of_replicas to 0, and refresh_interval to 1000ms.

2. similarity

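Defines scripted_sim, a custom scripted similarity built around an IDF script. It replaces the default scoring for the text fields that reference it (the *_tks fields) during full-text search.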
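
As a rough illustration of how the settings section could be laid out, the sketch below builds it as a Python dictionary and prints it as JSON. The shard, replica, and refresh values are the ones shown in the diagram in this document. The Painless script is a hypothetical placeholder, since the real script source is not reproduced here; only the surrounding "type": "scripted" and "script.source" structure follows the Elasticsearch scripted similarity format.

```python
import json

# Hypothetical Painless source: a stand-in IDF-style formula, NOT the script
# from the actual mapping.json, which is not shown in this document.
PLACEHOLDER_SCRIPT = (
    "double idf = Math.log(1 + (field.docCount - term.docFreq + 0.5)"
    " / (term.docFreq + 0.5));"
    " return query.boost * idf;"
)

settings = {
    "index": {
        "number_of_shards": 2,
        "number_of_replicas": 0,
        "refresh_interval": "1000ms",
    },
    "similarity": {
        # Text fields opt in by setting "similarity": "scripted_sim".
        "scripted_sim": {
            "type": "scripted",
            "script": {"source": PLACEHOLDER_SCRIPT},
        }
    },
}

print(json.dumps({"settings": settings}, indent=2))
```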


Mappings Section

1. properties

Defines explicit field mappings. The one documented here is lat_lon, which is mapped as geo_point.

2. date_detection

3. dynamic_templates

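Dynamic templates map new fields automatically when their names match a pattern, either a simple wildcard (such as *_int) or a regular expression. Each template has a name, a match condition, and the mapping to apply. Templates are evaluated in order and the first match wins.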

Below are key templates:

| Template Name | Match Pattern | Mapping Type | Notable Parameters | Description |
| --- | --- | --- | --- | --- |
| int | `*_int` | integer | store: true | Maps fields ending with _int to integer type |
| ulong | `*_ulong` | unsigned_long | store: true | Unsigned long integers |
| long | `*_long` | long | store: true | Signed long integers |
| short | `*_short` | short | store: true | Short integers |
| numeric | `*_flt` | float | store: true | Floating-point numbers |
| tks | `*_tks` | text | analyzer: whitespace, similarity: scripted_sim, store: true | Tokenized text fields with custom similarity |
| ltks | `*_ltks` | text | analyzer: whitespace, store: true | Tokenized text fields without custom similarity |
| kwd | regex `^(.*_(kwd\|id\|ids\|uid` … (remainder of pattern missing) | keyword | similarity: boolean | Keyword fields with boolean similarity |
| dt | regex `^.*(_dt\|_time\|_at)$` | date | | Date fields accepting multiple formats |
| nested | `*_nst` | nested | | Nested objects for complex hierarchies |
| object | `*_obj` | object | dynamic: true | JSON-like objects with dynamic fields |
| string | regex `^.*_(with_weight\|list)$` | text | index: false, store: true | Text that is stored but not indexed |
| rank_feature | `*_fea` | rank_feature | | Fields to be used for ranking features |
| rank_features | `*_feas` | rank_features | | Sets of ranking features |
| dense_vector | `*_512_vec` | dense_vector | dims: 512, similarity: cosine, index: true | 512-dimensional dense vectors for similarity search |
| dense_vector | `*_768_vec` | dense_vector | dims: 768, similarity: cosine, index: true | 768-dimensional dense vectors |
| dense_vector | `*_1024_vec` | dense_vector | dims: 1024, similarity: cosine, index: true | 1024-dimensional dense vectors |
| dense_vector | `*_1536_vec` | dense_vector | dims: 1536, similarity: cosine, index: true | 1536-dimensional dense vectors |
| binary | `*_bin` | binary | | Binary data fields |
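
To make the naming convention concrete, here is a minimal Python sketch that predicts which template a field name would hit. It is an approximation, not Elasticsearch's own matcher: wildcard patterns are checked with fnmatch, regex patterns with re, and the first match wins. It assumes the order of the templates in this document matches their order in the file. The kwd template is left out because its regex is incomplete here, and the resolve function is a hypothetical helper.

```python
import re
from fnmatch import fnmatchcase

# (template name, pattern, is_regex, mapping type), in the order documented.
# The kwd template is omitted: its regex is truncated in this document.
TEMPLATES = [
    ("int", "*_int", False, "integer"),
    ("ulong", "*_ulong", False, "unsigned_long"),
    ("long", "*_long", False, "long"),
    ("short", "*_short", False, "short"),
    ("numeric", "*_flt", False, "float"),
    ("tks", "*_tks", False, "text"),
    ("ltks", "*_ltks", False, "text"),
    ("dt", r"^.*(_dt|_time|_at)$", True, "date"),
    ("nested", "*_nst", False, "nested"),
    ("object", "*_obj", False, "object"),
    ("string", r"^.*_(with_weight|list)$", True, "text"),
    ("rank_feature", "*_fea", False, "rank_feature"),
    ("rank_features", "*_feas", False, "rank_features"),
    ("dense_vector", "*_512_vec", False, "dense_vector"),
    ("dense_vector", "*_768_vec", False, "dense_vector"),
    ("dense_vector", "*_1024_vec", False, "dense_vector"),
    ("dense_vector", "*_1536_vec", False, "dense_vector"),
    ("binary", "*_bin", False, "binary"),
]


def resolve(field_name):
    """Return (template name, mapping type) for the first matching template."""
    for name, pattern, is_regex, es_type in TEMPLATES:
        if is_regex:
            matched = re.fullmatch(pattern, field_name) is not None
        else:
            matched = fnmatchcase(field_name, pattern)
        if matched:
            return name, es_type
    return None  # no template applies; default dynamic mapping would be used


for field in ("count_int", "created_at", "embedding_512_vec", "title"):
    print(field, "->", resolve(field))
```

A helper like this can be run over a document's keys before indexing to spot fields whose names match no template.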


Important Implementation Details


Interaction With Other System Components


Usage Examples

Example 1: Creating an index with this mapping

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d @mapping.json
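
The same request can be prepared from Python with only the standard library. The sketch below builds the PUT request without sending it; build_create_index_request is a hypothetical helper, and the host and index name are the ones from the curl example. Actually sending the request requires a reachable Elasticsearch node.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, mapping):
    """Build (but do not send) the PUT request that creates the index."""
    body = json.dumps(mapping).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Usage against a running cluster (not executed here):
#   with open("mapping.json", encoding="utf-8") as f:
#       mapping = json.load(f)
#   req = build_create_index_request("http://localhost:9200", "my_index", mapping)
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.read().decode("utf-8"))
```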

Example 2: Indexing a document

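{
  "user_id": "12345_uid",
  "created_at": "2024-06-01 15:30:00",
  "lat_lon": "40.7128,-74.0060",
  "description_tks": "quick brown fox jumps",
  "embedding_512_vec": [0.12, 0.34, ..., 0.56]
}

The embedding array is abbreviated; a real document must supply all 512 floats. Each field is mapped by its name: user_id matches the kwd pattern and becomes a keyword, created_at matches the dt pattern and becomes a date, description_tks becomes whitespace-analyzed text scored with scripted_sim, and embedding_512_vec becomes a 512-dimensional dense vector. lat_lon is not handled by a template; it uses the explicit geo_point mapping under properties, which applies only to a field with exactly that name.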
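
Because a dense_vector field rejects vectors whose length differs from its configured dims, it can help to check vector lengths before indexing. The following is a small sketch of such a check. vector_dim_errors is a hypothetical helper, and it assumes the dimension is always encoded in the field name as in the *_512_vec, *_768_vec, *_1024_vec, and *_1536_vec templates.

```python
import re

# Dimensions that have a dense_vector template in this mapping.
SUPPORTED_DIMS = {512, 768, 1024, 1536}
_VEC_SUFFIX = re.compile(r"_(\d+)_vec$")


def vector_dim_errors(doc):
    """Return a list of problems with the vector fields in a document."""
    errors = []
    for field, value in doc.items():
        m = _VEC_SUFFIX.search(field)
        if m is None:
            continue  # not a vector field by naming convention
        dims = int(m.group(1))
        if dims not in SUPPORTED_DIMS:
            errors.append(f"{field}: no template for {dims} dimensions")
        elif not isinstance(value, list) or len(value) != dims:
            errors.append(f"{field}: expected a list of {dims} floats")
    return errors


doc = {
    "description_tks": "quick brown fox jumps",
    "embedding_512_vec": [0.0] * 512,  # placeholder values
}
print(vector_dim_errors(doc))
```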


Visual Diagram

flowchart TD
    A[Settings] --> B[Index Settings]
    A --> C[Similarity]
    B --> B1[number_of_shards: 2]
    B --> B2[number_of_replicas: 0]
    B --> B3[refresh_interval: 1000ms]
    C --> C1[scripted_sim]
    C1 --> C1a[Custom IDF script]

    D[Mappings] --> E[Properties]
    E --> E1[lat_lon: geo_point]

    D --> F[Dynamic Templates]

    F --> F1[int: *_int -> integer]
    F --> F2[ulong: *_ulong -> unsigned_long]
    F --> F3[long: *_long -> long]
    F --> F4[short: *_short -> short]
    F --> F5[numeric: *_flt -> float]
    F --> F6[tks: *_tks -> text + scripted_sim]
    F --> F7[ltks: *_ltks -> text]
    F --> F8[kwd: regex -> keyword + boolean similarity]
    F --> F9[dt: regex -> date with multiple formats]
    F --> F10[nested: *_nst -> nested]
    F --> F11[object: *_obj -> object]
    F --> F12["string: regex -> text (not indexed)"]
    F --> F13[rank_feature: *_fea -> rank_feature]
    F --> F14[rank_features: *_feas -> rank_features]
    F --> F15[dense_vector: *_512_vec -> 512 dims, cosine]
    F --> F16[dense_vector: *_768_vec -> 768 dims, cosine]
    F --> F17[dense_vector: *_1024_vec -> 1024 dims, cosine]
    F --> F18[dense_vector: *_1536_vec -> 1536 dims, cosine]
    F --> F19[binary: *_bin -> binary]

Summary

mapping.json is a comprehensive Elasticsearch index configuration file that governs index behavior, scoring, and field mapping. It leverages dynamic templates for flexible field typing, includes a custom similarity script to tailor search relevance, and supports advanced data types like geo points and dense vectors for modern search scenarios.

The mappings only take effect when field names follow the template suffixes, so ingestion pipelines and query code need to use the same naming convention as this file.