infinity_mapping.json


Overview

The infinity_mapping.json file defines the schema mapping for a data storage or indexing system, most likely a search engine or document database with text analysis features (e.g., Elasticsearch or Apache Solr, although the file name and the SQL-style varchar type may instead point to the Infinity database). The mapping specifies each field's data type and default value and, for textual fields, the analyzer applied. It appears to be tailored for managing document metadata, keywords, content tokens, and entity relationships within a knowledge or document management system.

The key purpose of this file is to provide a consistent blueprint that governs how document records or knowledge base entries are indexed, queried, and stored, ensuring efficient retrieval and relevance scoring with support for tokenization, ranking features, and entity graph relationships.


Detailed Explanation of Fields

This file does not contain classes or functions; instead, it is a JSON schema mapping. Each key represents a field name, and the associated object provides metadata about the field's data type, default value, and optional analyzers or features.
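
As a rough illustration of that layout, the sketch below parses a three-field excerpt and groups the field names by analyzer. The key names "type", "default", and "analyzer" are assumptions inferred from the columns described here; the real file may spell them differently.

```python
import json
from collections import defaultdict

# Hypothetical excerpt. The key names "type", "default", and "analyzer"
# are assumed for illustration and are not confirmed by the mapping file.
MAPPING_EXCERPT = """
{
  "id":        {"type": "varchar", "default": ""},
  "title_tks": {"type": "varchar", "default": "", "analyzer": "whitespace"},
  "tag_kwd":   {"type": "varchar", "default": "", "analyzer": "whitespace-#"}
}
"""


def fields_by_analyzer(mapping: dict) -> dict:
    """Group field names by analyzer; the key None collects fields without one."""
    groups = defaultdict(list)
    for name, spec in mapping.items():
        groups[spec.get("analyzer")].append(name)
    return dict(groups)


mapping = json.loads(MAPPING_EXCERPT)
groups = fields_by_analyzer(mapping)
print(groups)
```

Grouping by analyzer is only one way to inspect the mapping; the same loop could just as easily collect types or defaults.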

Field Breakdown

| Field Name | Type | Default | Analyzer / Feature | Description / Usage |
|---|---|---|---|---|
| id | varchar | "" | | Unique identifier for the record/document. |
| doc_id | varchar | "" | | Document identifier, possibly linking to a source document. |
| kb_id | varchar | "" | | Knowledge base ID, indicating the knowledge base source or category. |
| create_time | varchar | "" | | Creation timestamp as a string. |
| create_timestamp_flt | float | 0.0 | | Creation timestamp as a floating point number (e.g., Unix timestamp). |
| img_id | varchar | "" | | Identifier for an associated image, if any. |
| docnm_kwd | varchar | "" | | Document name keywords, likely for keyword-based search. |
| title_tks | varchar | "" | whitespace | Tokenized title field analyzed by whitespace tokenizer (splits on spaces). |
| title_sm_tks | varchar | "" | whitespace | Smaller or simplified tokenized title field, also whitespace analyzed. |
| name_kwd | varchar | "" | whitespace-# | Name keywords analyzed with a custom analyzer (likely splitting on whitespace and # symbol). |
| important_kwd | varchar | "" | whitespace-# | Important keywords extracted from the document, analyzed similarly as above. |
| tag_kwd | varchar | "" | whitespace-# | Tags associated with the document for categorization, analyzed with custom analyzer. |
| important_tks | varchar | "" | whitespace | Tokenized important words, whitespace tokenized. |
| question_kwd | varchar | "" | whitespace-# | Keywords extracted from questions or queries related to the document. |
| question_tks | varchar | "" | whitespace | Tokenized question terms, whitespace analyzer. |
| content_with_weight | varchar | "" | | Content of the document coupled with weight annotations (likely weighted terms). |
| content_ltks | varchar | "" | whitespace | Tokenized content with light tokenization (whitespace). |
| content_sm_ltks | varchar | "" | whitespace | Smaller or simplified tokenized content. |
| authors_tks | varchar | "" | whitespace | Tokenized authors of the document. |
| authors_sm_tks | varchar | "" | whitespace | Smaller or simplified tokenized authors. |
| page_num_int | varchar | "" | | Page number of the document or content segment. |
| top_int | varchar | "" | | Possibly a ranking or position indicator within a page or list. |
| position_int | varchar | "" | | Position index within a larger document or dataset. |
| weight_int | integer | 0 | | Integer weight for relevance, importance, or ranking. |
| weight_flt | float | 0.0 | | Floating point weight for finer-grained relevance scoring. |
| rank_int | integer | 0 | | Integer rank for sorting or priority purposes. |
| rank_flt | float | 0 | | Floating rank score. |
| available_int | integer | 1 | | Availability flag (e.g., 1 = available, 0 = unavailable). |
| knowledge_graph_kwd | varchar | "" | | Keywords representing the knowledge graph entities or concepts linked to the document. |
| entities_kwd | varchar | "" | whitespace-# | Entity keywords extracted from the document, analyzed with the custom analyzer. |
| pagerank_fea | integer | 0 | | Feature capturing PageRank or similar graph-based ranking metric. |
| tag_feas | varchar | "" | rankfeatures | Rank features derived from tags, analyzed with a rank features analyzer (likely used to improve search relevance). |
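
To make the difference between the two analyzers concrete, here is a minimal sketch of how each might split a field value. It follows this document's tentative reading that whitespace-# splits on both whitespace and the # symbol; the actual analyzer behavior is not confirmed by the mapping file, and the function name tokenize is hypothetical.

```python
import re


def tokenize(text: str, analyzer: str = "") -> list[str]:
    """Split text the way each analyzer is assumed to behave.

    Assumptions (not confirmed by the mapping file):
      - "whitespace"   splits on runs of whitespace
      - "whitespace-#" splits on runs of whitespace and/or '#'
      - no analyzer    keeps the whole value as a single term
    """
    if analyzer == "whitespace":
        return text.split()
    if analyzer == "whitespace-#":
        # Drop empty strings produced by leading or trailing separators.
        return [tok for tok in re.split(r"[\s#]+", text) if tok]
    return [text] if text else []


title_tokens = tokenize("infinity mapping schema", "whitespace")
entity_tokens = tokenize("entity1#entity2#entity3", "whitespace-#")
print(title_tokens)
print(entity_tokens)
```

Under these assumptions, a _kwd field can carry several values in one string separated by #, while a _tks field holds text that was already tokenized and joined with spaces.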


Entity and Graph Relationship Fields

| Field Name | Type | Default | Analyzer / Feature | Description |
|---|---|---|---|---|
| from_entity_kwd | varchar | "" | whitespace-# | Starting entity keyword in a graph edge or relationship. |
| to_entity_kwd | varchar | "" | whitespace-# | Ending entity keyword in a graph edge or relationship. |
| entity_kwd | varchar | "" | whitespace-# | Entity keywords in general. |
| entity_type_kwd | varchar | "" | whitespace-# | Entity type keywords, e.g., person, organization, location, etc. |
| source_id | varchar | "" | whitespace-# | Source identifier for the entity or relationship. |
| n_hop_with_weight | varchar | "" | | N-hop neighbors with weight information, used for graph traversal or influence propagation. |
| removed_kwd | varchar | "" | whitespace-# | Keywords marked as removed or deprecated. |
| doc_type_kwd | varchar | "" | whitespace-# | Document type keywords, e.g., article, report, FAQ, etc. |
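
The edge fields lend themselves to a simple adjacency structure. The sketch below builds one from a handful of made-up records and parses an n_hop_with_weight string. The "name:weight,name:weight" layout is assumed from the usage example in this document, not from the mapping file, and both helper names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(records: list[dict]) -> dict[str, set[str]]:
    """Map each from_entity_kwd to the set of to_entity_kwd values it points at."""
    adjacency = defaultdict(set)
    for rec in records:
        src = rec.get("from_entity_kwd", "")
        dst = rec.get("to_entity_kwd", "")
        # Assumption: the empty-string default means the record is not an edge.
        if src and dst:
            adjacency[src].add(dst)
    return dict(adjacency)


def parse_n_hop(value: str) -> dict[str, float]:
    """Parse "name:weight,name:weight" into a dict (format is an assumption)."""
    result = {}
    for item in value.split(","):
        if not item:
            continue
        name, _, weight = item.rpartition(":")
        result[name] = float(weight)
    return result


# Hypothetical records for illustration only.
edges = [
    {"from_entity_kwd": "entity1", "to_entity_kwd": "entity2"},
    {"from_entity_kwd": "entity1", "to_entity_kwd": "entity3"},
    {"from_entity_kwd": "entity2", "to_entity_kwd": "entity3"},
    {"id": "chunk-without-edge"},
]
graph = build_adjacency(edges)
hops = parse_n_hop("entity3:0.5,entity4:0.3")
print(graph)
print(hops)
```

A real system may serialize n-hop data differently (for example as JSON), so the parser should be adapted to whatever the stored strings actually contain.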


Important Implementation Details


Usage Examples

Because this file is a schema mapping for indexing or storage, usage examples take the form of documents to index or queries against the system.

Example: Indexing a Document

{
  "id": "doc123",
  "doc_id": "D-4567",
  "kb_id": "kb789",
  "create_time": "2024-06-05T12:00:00Z",
  "create_timestamp_flt": 1717588800.0,
  "title_tks": "infinity mapping schema",
  "weight_int": 10,
  "rank_flt": 0.95,
  "entities_kwd": "entity1#entity2#entity3",
  "from_entity_kwd": "entity1",
  "to_entity_kwd": "entity2",
  "n_hop_with_weight": "entity3:0.5,entity4:0.3",
  "available_int": 1
}

This document would be indexed according to the mapping, supporting complex queries on tokenized titles, entity graph traversals, and relevance ranking.
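
The example document omits many mapped fields. Whether the storage engine fills them in automatically is not stated here; the sketch below assumes it does and mimics that behavior on the client side using a few defaults copied from the field tables. The helper name with_defaults is hypothetical.

```python
# Defaults copied from the field tables in this document; only a few are shown.
DEFAULTS = {
    "id": "",
    "title_tks": "",
    "weight_int": 0,
    "weight_flt": 0.0,
    "rank_flt": 0,
    "available_int": 1,
}


def with_defaults(doc: dict, defaults: dict = DEFAULTS) -> dict:
    """Return a copy of doc in which every missing field takes its default."""
    # Later keys win, so values supplied in doc override the defaults.
    return {**defaults, **doc}


partial = {"id": "doc123", "title_tks": "infinity mapping schema", "weight_int": 10}
full = with_defaults(partial)
print(full)
```

Applying defaults before indexing keeps every record the same shape, which makes it obvious that a record left untouched stays available (available_int = 1) unless it is explicitly disabled.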


Interaction with Other System Components


Visual Diagram: Flowchart of Main Field Categories and Relationships

flowchart TD
    A[infinity_mapping.json Schema] --> B[Document Metadata Fields]
    A --> C[Textual Content Fields]
    A --> D[Keyword & Token Fields]
    A --> E[Ranking & Weight Fields]
    A --> F[Entity & Graph Relationship Fields]

    B --> B1[id, doc_id, kb_id, create_time, create_timestamp_flt, img_id, page_num_int]
    C --> C1[title_tks, title_sm_tks, content_with_weight, content_ltks, content_sm_ltks]
    D --> D1[docnm_kwd, name_kwd, important_kwd, tag_kwd, important_tks, question_kwd, question_tks, authors_tks, authors_sm_tks]
    E --> E1[weight_int, weight_flt, rank_int, rank_flt, available_int, pagerank_fea, tag_feas, top_int, position_int]
    F --> F1[knowledge_graph_kwd, entities_kwd, from_entity_kwd, to_entity_kwd, entity_kwd, entity_type_kwd, source_id, n_hop_with_weight, removed_kwd, doc_type_kwd]

    style B fill:#f9f,stroke:#333,stroke-width:1px
    style C fill:#bbf,stroke:#333,stroke-width:1px
    style D fill:#bfb,stroke:#333,stroke-width:1px
    style E fill:#fbb,stroke:#333,stroke-width:1px
    style F fill:#ffb,stroke:#333,stroke-width:1px
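
The categories in the flowchart largely track the field-name suffixes. The following sketch classifies a field by suffix; the suffix meanings are inferred from the names used in this document rather than taken from the mapping file, and role_of is a hypothetical helper.

```python
# Suffix meanings are a naming-convention guess based on the field names
# in this document; they are not defined by the mapping file itself.
SUFFIX_ROLES = [
    ("_kwd", "keyword"),
    ("_ltks", "tokens"),
    ("_tks", "tokens"),
    ("_int", "integer"),
    ("_flt", "float"),
    ("_feas", "rank feature"),
    ("_fea", "rank feature"),
]


def role_of(field: str) -> str:
    """Return the role implied by a field's suffix, or "other" if none matches."""
    for suffix, role in SUFFIX_ROLES:
        if field.endswith(suffix):
            return role
    return "other"


for name in ["tag_kwd", "content_sm_ltks", "rank_flt", "tag_feas", "source_id"]:
    print(name, "->", role_of(name))
```

The suffix is only a hint: page_num_int, top_int, and position_int carry the _int suffix but are listed as varchar in the field table, so the declared type should always take precedence.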

Summary

This JSON mapping file defines the field names, types, defaults, and analyzers that every component indexing or querying knowledge documents must agree on, which keeps stored data consistent and supports efficient search, ranking, and entity-graph lookups.