Searchkit API – Searchkit

Installation

Searchkit is available on npm as searchkit.

npm install searchkit

Then import it into your project:

import Searchkit from "searchkit";
// OR if you are using CDN
const Searchkit = window.Searchkit;

Usage

You can use Searchkit with either the @searchkit/api or the @searchkit/instantsearch-client package.

 
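import Client from '@searchkit/instantsearch-client'
import Searchkit, { SearchkitConfig } from "searchkit"
// UI components used below; assumes the react-instantsearch package is installed
import { InstantSearch, SearchBox, Hits } from "react-instantsearch"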
 
const searchkitClient = new Searchkit({
  connection: {
    host: "https://commerce-demo.es.us-east4.gcp.elastic-cloud.com:9243",
    // cloud_id: "my-cloud-id" if using Elastic Cloud
 
    apiKey: "a2Rha1VJTUJMcGU4ajA3Tm9fZ0Y6MjAzX2pLbURTXy1hNm9SUGZGRlhJdw==", // optional apiKey
    headers: { // optional headers sent to Elasticsearch or elasticsearch proxy
      "my-custom-header": "my-custom-value"
    },
    auth: {
      username: 'elastic',
      password: 'changeme'
    }
  },
  search_settings: {
    highlight_attributes: ["title", "actors"],
    search_attributes: ["title", "actors"],
    result_attributes: ["title", "actors", "poster", "year"],
    facet_attributes: [
      { attribute: "type", field: "type", type: "string" },
      { attribute: "actors", field: "actors.keyword", type: "string" },
      { attribute: "rated", field: "rated", type: "string" },
      { attribute: "imdbrating", field: "imdbrating", type: "numeric" },
      { attribute: "metascore", field: "metascore", type: "numeric" },
    ]
  },
})
 
const searchClient = Client(searchkitClient);
 
export const App = () => {
  return (
    <InstantSearch searchClient={searchClient} indexName="movies">
      <SearchBox />
      <Hits />
    </InstantSearch>
  )
}

SearchkitConfig

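`connection` configuration

host - The Elasticsearch host URL.
cloud_id - The Elastic Cloud ID. Optional, but recommended when connecting to Elasticsearch on Elastic Cloud. You can find your cloud ID in the Elastic Cloud console.
apiKey - The Elasticsearch API key. Optional, but strongly recommended in production. You can create an API key in Kibana.
headers - Additional headers to send to Elasticsearch. Optional.
withCredentials - Whether to send credentials with the request. Optional; useful for CORS requests. Defaults to false.
auth - Basic authentication credentials. Optional.

For more information on setting up Elasticsearch, see Connecting to Elasticsearch.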
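
The connection options above are often assembled from environment variables. The following is a minimal sketch of that idea, not part of Searchkit: buildConnection, the ConnectionConfig type, and the ES_* variable names are all hypothetical. It assumes you want an API key to take precedence over basic authentication.

```typescript
// Hypothetical helper and types; none of these names come from Searchkit.
interface ConnectionConfig {
  host: string;
  apiKey?: string;
  auth?: { username: string; password: string };
}

interface Env {
  [key: string]: string | undefined;
}

// Prefer an API key; fall back to basic auth only when both credentials exist.
function buildConnection(env: Env): ConnectionConfig {
  const host = env.ES_HOST;
  if (!host) {
    throw new Error("ES_HOST is required");
  }
  const config: ConnectionConfig = { host };
  if (env.ES_API_KEY) {
    config.apiKey = env.ES_API_KEY;
  } else if (env.ES_USERNAME && env.ES_PASSWORD) {
    config.auth = { username: env.ES_USERNAME, password: env.ES_PASSWORD };
  }
  return config;
}

const demoConnection = buildConnection({
  ES_HOST: "https://localhost:9200",
  ES_API_KEY: "my-api-key",
});
console.log(demoConnection);
```

The resulting object can be passed as the connection property of the Searchkit configuration.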

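Custom transporter

The connection also accepts your own Elasticsearch network transporter: pass a transporter instance as connection.

This is useful for more complex authenticated connections to Elasticsearch or OpenSearch.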

 
import { ESTransporter } from 'searchkit'
import type { SearchRequest } from "searchkit"
 
class MyTransporter extends ESTransporter {
  async performNetworkRequest(requests: SearchRequest[]) {
    // you can use any http client here
    return fetch(`https://localhost:9200/_msearch`, {
      headers: {
        // Add custom headers here
      },
      body: this.createElasticsearchQueryFromRequest(requests),
      method: 'POST'
    })
  }
}
 
// then pass the custom transporter to the client
const client = Client({
  connection: new MyTransporter()
});

`search_settings` configuration

search_settings: {
  search_attributes: ["title^3", "actors"],
  result_attributes: ["title", "actors", "poster", "year", "rating"],
  facet_attributes: [
    { attribute: "type", field: "type", type: "string" },
    { attribute: "actors", field: "actors.keyword", type: "string" },
    { attribute: "rated", field: "rated", type: "string" },
    { attribute: "imdbrating", field: "imdbrating", type: "numeric" },
    { attribute: "metascore", field: "metascore", type: "numeric" },
  ],
  filter_attributes: [
    { attribute: "year", field: 'year', type: "numeric" }
  ],
  highlight_attributes: ["title", "actors"],
  snippet_attributes: [
    "description",
    "plot:200"
  ],
  sorting: {
    default: {
      field: '_score',
      order: 'desc'
    },
    _year_desc: [{
      field: 'year',
      order: 'desc'
    }]
  },
  geo_attribute: "location",
  runtime_mappings: {
    rating: {
      type: 'keyword',
      script: {
        source: "emit(doc['rated'].size()>0 ? doc['rated'].value : '')"
      }
    }
  },
},

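Attributes used to configure the search experience.

search_attributes - The attributes to search on.
result_attributes - The attributes returned in the search results response.
facet_attributes - The attributes used to create facets. A facet can be of type string, numeric, or date.
filter_attributes - The attributes used to create filters. A filter can be of type string, numeric, or date.
highlight_attributes - The attributes to highlight in search results.
snippet_attributes - The attributes to highlight in search results for long fields.
sorting - The sorting options. Each option can sort on a single field or on multiple fields.
query_rules - Rules that influence search relevance. For more information, see Query Rules.
geo_attribute - The attribute used for location-based search.
runtime_mappings - Runtime mappings used to transform fields in the search results. For more information, see Runtime Mappings. Once defined, the transformed fields can be used in the search_attributes, result_attributes, facet_attributes, and filter_attributes configuration.

search_attributes

Search attributes define which Elasticsearch fields are searched when a user performs a search.

Search attributes can be configured as follows: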

search_attributes: [
  "description", 
  "actors", 
  { field: "title", weight: 3 }, 
  "released.year"
];

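The configuration above searches the description, actors, and released.year (an object field) fields with the default weight of 1. The title field is weighted three times as heavily as the actors field.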
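
The examples on this page write weights in two ways: as a string such as "title^3" and as an object such as { field: "title", weight: 3 }. The sketch below shows how the object form relates to Elasticsearch's field^boost notation. It is an illustration only, not Searchkit's internal implementation, and SearchAttribute and toBoostedFields are hypothetical names.

```typescript
// Simplified local type for illustration; not imported from Searchkit.
type SearchAttribute = string | { field: string; weight: number };

// Convert each entry to Elasticsearch's "field^boost" notation.
// Plain strings pass through unchanged, so "title^3" stays as written.
function toBoostedFields(attributes: SearchAttribute[]): string[] {
  return attributes.map((attr) =>
    typeof attr === "string" ? attr : attr.field + "^" + attr.weight
  );
}

const boostedFields = toBoostedFields([
  "description",
  "actors",
  { field: "title", weight: 3 },
  "released.year",
]);
console.log(boostedFields);
// ["description", "actors", "title^3", "released.year"]
```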

`facet_attributes`

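For text-based facets, you need to specify a keyword-type field for the facet, because Elasticsearch does not support aggregations on text fields by default. More information about field mappings is available in the Elasticsearch documentation.

Typically you create a subfield of type keyword for the text field. For example, if you have an actors field of type text, you would create an actors.keyword field of type keyword.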

{
  "mappings": {
    "properties": {
      "actors": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

For the example above, you would specify the actors facet as follows:

facet_attributes: [
  { attribute: "actors", field: "actors.keyword", type: 'string' },
],

You then use the actors attribute in your UI components.

Below is an example of the RefinementList InstantSearch React component using the actors facet.

<RefinementList attribute="actors" searchable={true} limit={10} />

For more information, see the facets guide.
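
The rule described in this section (facet on a keyword field, or on a keyword subfield of a text field) can be expressed as a small helper. This is a sketch under assumptions: stringFacet and the FieldMapping type are hypothetical and only model the part of an Elasticsearch mapping shown above.

```typescript
// Minimal shape of an Elasticsearch field mapping, for illustration only.
interface FieldMapping {
  type: string;
  fields?: { [subfield: string]: { type: string } };
}

interface FacetAttribute {
  attribute: string;
  field: string;
  type: "string" | "numeric" | "date";
}

// Hypothetical helper: choose an aggregatable field for a string facet.
function stringFacet(attribute: string, mapping: FieldMapping): FacetAttribute {
  if (mapping.type === "keyword") {
    return { attribute, field: attribute, type: "string" };
  }
  const subfields = mapping.fields || {};
  // Pick the first subfield whose type is keyword, if there is one.
  const keywordSub = Object.keys(subfields).filter(
    (sub) => subfields[sub].type === "keyword"
  )[0];
  if (!keywordSub) {
    throw new Error("No keyword field available for facet: " + attribute);
  }
  return { attribute, field: attribute + "." + keywordSub, type: "string" };
}

const actorsFacet = stringFacet("actors", {
  type: "text",
  fields: { keyword: { type: "keyword" } },
});
console.log(actorsFacet);
// { attribute: "actors", field: "actors.keyword", type: "string" }
```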

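Custom aggregation and filter queries

Searchkit uses a terms aggregation to generate facets and a term clause to apply filters. You can use the facetQuery, facetResponse, and filterQuery options to supply a custom aggregation query, a handler for its response, and a custom filter query.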

facet_attributes: [
  {
    field: 'actors.keyword',
    attribute: 'actors',
    type: 'string',
    // Custom aggregation query
    facetQuery: () => ({
      filters: {
        filters: {
          movie: {
            term: {
              type: 'movie'
            }
          },
          episode: {
            term: {
              type: 'episode'
            }
          }
        }
      }
    }),
    // handle the aggregation response and return an object with the facet names and values.
    // here we are hardcoding the values to 100 for each filter bucket
    facetResponse: (aggregation: AggregationsFiltersAggregate) => {
      // a filters aggregation with named filters returns buckets keyed by filter name
      const buckets = aggregation.buckets as Record<string, AggregationsFiltersBucket>
      return Object.keys(buckets).reduce(
        (sum, bucket) => ({
          ...sum,
          [bucket]: 100
        }),
        {}
      )
    },
    // When a user selects a facet, this function is called to generate the filter query
    filterQuery: (field: string, value: string) => {
      return { match: { ['type.keyword']: value } }
    }
  }
],
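
The facetResponse handler above hardcodes every count to 100. As a hedged alternative, the sketch below returns the real document count of each named bucket. The local types are simplified stand-ins for the Elasticsearch client types, and countsFromFilters is a hypothetical name.

```typescript
// Minimal local types for a keyed `filters` aggregation response.
interface FiltersBucket {
  doc_count: number;
}

interface KeyedFiltersAggregate {
  buckets: { [name: string]: FiltersBucket };
}

// Map each named bucket to its document count.
function countsFromFilters(
  aggregation: KeyedFiltersAggregate
): { [name: string]: number } {
  const counts: { [name: string]: number } = {};
  Object.keys(aggregation.buckets).forEach((bucketName) => {
    counts[bucketName] = aggregation.buckets[bucketName].doc_count;
  });
  return counts;
}

const sampleCounts = countsFromFilters({
  buckets: { movie: { doc_count: 42 }, episode: { doc_count: 7 } },
});
console.log(sampleCounts); // { movie: 42, episode: 7 }
```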

Numeric and date types

For range-based facets, you need to set type to numeric or date so that the client can correctly generate the facet statistics the UI needs for the field.

facet_attributes: [
  { attribute: "imdbrating", field: "imdbrating", type: "numeric" },
  { attribute: "metascore", field: "metascore", type: "numeric" },
],

Below is an example of the NumericMenu InstantSearch React component using the imdbrating facet.

<NumericMenu
  attribute="imdbrating"
  items={[
    { label: "5 - 7", start: 5, end: 7 },
    { label: "7 - 9", start: 7, end: 9 },
    { label: ">= 9", start: 9 },
  ]}
/>
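
To make the effect of a numeric refinement concrete, the sketch below turns a NumericMenu item into an Elasticsearch range clause. It is an illustration only: toRangeFilter is a hypothetical helper, and treating both bounds as inclusive is an assumption, since this page does not state how Searchkit handles the endpoints.

```typescript
// Same shape as the NumericMenu items above.
interface NumericItem {
  label: string;
  start?: number;
  end?: number;
}

interface RangeBounds {
  gte?: number;
  lte?: number;
}

interface RangeClause {
  range: { [field: string]: RangeBounds };
}

// Build a range clause, omitting any bound the item leaves open.
// Assumption: both bounds are inclusive.
function toRangeFilter(field: string, item: NumericItem): RangeClause {
  const bounds: RangeBounds = {};
  if (item.start !== undefined) {
    bounds.gte = item.start;
  }
  if (item.end !== undefined) {
    bounds.lte = item.end;
  }
  const range: { [field: string]: RangeBounds } = {};
  range[field] = bounds;
  return { range };
}

console.log(JSON.stringify(toRangeFilter("imdbrating", { label: ">= 9", start: 9 })));
// {"range":{"imdbrating":{"gte":9}}}
```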

`filter_attributes`

Like facet_attributes, filter_attributes are used to apply filters. The difference is that filter_attributes are not used to generate facets.

filter_attributes: [
  { attribute: "writers", field: 'writers', type: "string" }
],

Custom filter query

Searchkit uses a term clause to apply filters. You can use the filterQuery option to specify a custom filter query.

filter_attributes: [
  {
    field: 'actors.keyword',
    attribute: 'actors',
    type: 'string',
    filterQuery: (
      field, // 'actors.keyword' in this case
      value // value specified by the frontend
    ) => ({
      match: { // example of using match clause instead of terms filter
        [field]: value
      }
    }
    )
  }
],

`sorting`

Sorting can be configured as follows:

 
sorting: {
  default: {
    field: '_score',
    order: 'desc'
  },
  _year_desc: [{
    field: 'year',
    order: 'desc'
  }]
}

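The default sorting option is used when no sorting option is selected. When the user selects the _year_desc sorting option, the _year_desc configuration is used.

A sorting option can sort on a single field or on multiple fields.

For more information, see the sorting guide.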
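
The fallback to default and the single-field versus multi-field forms can be sketched as follows. This is not Searchkit's implementation: resolveSort and toElasticsearchSort are hypothetical helpers that show how such a configuration maps onto an Elasticsearch sort array.

```typescript
interface SortField {
  field: string;
  order: "asc" | "desc";
}

type SortOption = SortField | SortField[];
type EsSort = { [field: string]: "asc" | "desc" }[];

// Normalise a single field or a list of fields into an Elasticsearch sort array.
function toElasticsearchSort(option: SortOption): EsSort {
  const fields = Array.isArray(option) ? option : [option];
  return fields.map((f) => {
    const clause: { [field: string]: "asc" | "desc" } = {};
    clause[f.field] = f.order;
    return clause;
  });
}

const sortingOptions: { [key: string]: SortOption } = {
  default: { field: "_score", order: "desc" },
  _year_desc: [{ field: "year", order: "desc" }],
};

// Fall back to `default` when no sort key (or an unknown one) is selected.
function resolveSort(selected?: string): EsSort {
  const option = (selected && sortingOptions[selected]) || sortingOptions.default;
  return toElasticsearchSort(option);
}

console.log(resolveSort("_year_desc")); // [{ year: "desc" }]
console.log(resolveSort()); // [{ _score: "desc" }]
```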

`highlight_attributes`

highlight_attributes configures which fields are highlighted in search results. Each entry in highlight_attributes must point to a text field in Elasticsearch.

highlight_attributes: ["title"]

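Use it together with the Highlight component to display the highlighted fields.

`snippet_attributes`

snippet_attributes configures which attributes are highlighted as snippets in search results. Each entry in snippet_attributes must point to a text field in Elasticsearch.

snippet_attributes: [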
  'long_bio',
  'description:200'
]

`geo_attribute`

geo_attribute configures the location-based search experience. It must point to a geo_point or geo_shape field in Elasticsearch.

geo_attribute: "location"

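For more information, see Geo Search.

Request options

RequestOptions is an object with the following properties:

getQuery - A function that overrides the default Elasticsearch query.
getKnnQuery - A function that provides an Elasticsearch KNN query.
getBaseFilters - A function that provides base Elasticsearch filters. These filters are applied to every search request.
hooks - An object with the following properties:
- beforeSearch - A function called before the search request is executed.
- afterSearch - A function called after the search response is received.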

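`getQuery` optional function

The getQuery function overrides the default Elasticsearch query. It must return an Elasticsearch query. See the Elasticsearch Query DSL documentation for more information.

Below is an example of a getQuery function that overrides the default query to use the combined_fields query type (see the Elasticsearch documentation for more about combined_fields).

To see the full query sent to Elasticsearch, you can run the client in debug mode (see below).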

  const results = await client.handleRequest(req.body, {
    getQuery: (query, search_attributes) => {
      return [
        {
          combined_fields: {
            query,
            fields: search_attributes,
          },
        },
      ];
    }
  });

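Example: excluding the BM25 query

If you use KNN only, you may want to exclude the BM25 query from the Elasticsearch query. You can do this by overriding the getQuery function and returning false.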

  const results = await client.handleRequest(req.body, {
    getQuery: () => {
      return false
    }
  });

`getKnnQuery` optional function

To specify a KNN query, use the getKnnQuery function. It must return an Elasticsearch KNN query. See the Elasticsearch documentation for more about the KNN query DSL.

  const results = await client.handleRequest(req.body, {
    getKnnQuery(query, search_attributes, config) {
      return {
        field: 'dense-vector-field',
        k: 10,
        num_candidates: 100,
        query_vector_builder: {
          text_embedding: {
            model_id: 'cookie_model',
            model_text: query
          }
        }
      }
    },
    // Optional: You may want to exclude the BM25 query
    getQuery: () => {
      return false
    }
  });

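Function parameters

query: The query string from the search request.
search_attributes: The search attributes from the search configuration.

`getBaseFilters` optional function

The getBaseFilters function adds filters to the Elasticsearch query. It must return an Elasticsearch query. See the Elasticsearch Query DSL documentation for more information.

This function is useful when requests need to be filtered based on the user's session, for example to filter search results by the user's role or status.

Below is an example of a getBaseFilters function that adds a filter to the Elasticsearch query so that only results whose status field is published are returned.

To see the full query sent to Elasticsearch, you can run the client in debug mode (see below).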

  const results = await client.handleRequest(req.body, {
    getBaseFilters: () => {
      return [
        {
          bool: {
            must: {
              term: {
                status: {
                  value: "published",
                },
              },
            },
          },
        },
      ];
    }
  });
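
The session-based use case mentioned above could look like the following sketch. The Session shape and baseFiltersFor are hypothetical, and the sketch assumes that returning an empty array means no base filter is applied, which this page does not confirm.

```typescript
// Hypothetical session shape; adapt it to your own auth layer.
interface Session {
  role: "admin" | "editor" | "viewer";
}

type FilterClause = { [queryType: string]: unknown };

// Non-admin users only see published documents.
function baseFiltersFor(session: Session): FilterClause[] {
  if (session.role === "admin") {
    // Assumption: an empty array leaves the results unfiltered.
    return [];
  }
  return [{ term: { status: { value: "published" } } }];
}

// Usage inside a request handler (client, req and session are assumed to exist):
// const results = await client.handleRequest(req.body, {
//   getBaseFilters: () => baseFiltersFor(session),
// });
console.log(JSON.stringify(baseFiltersFor({ role: "viewer" })));
// [{"term":{"status":{"value":"published"}}}]
```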

RequestOptions Hooks

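Hooks are functions called at different stages of a search request. They are useful when you want to perform some action before or after the search request is executed.

`beforeSearch` hook function

The beforeSearch hook is called before the search request is executed, which makes it useful for inspecting or changing the request before it is sent.

For example:

Learning to rank
Semantic search
A/B testing

Below is an example of a beforeSearch hook that adds track_total_hits to the Elasticsearch query.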

  // sk is a Searchkit instance
  const client = SearchkitInstantsearchClient(sk, {
    hooks: {
      beforeSearch: async (searchRequests) => {
        // return a copy of each request with track_total_hits enabled
        return searchRequests.map((sr) => {
          return {
            ...sr,
            body: {
              ...sr.body,
              track_total_hits: true
            }
          }
        })
      }
    }
  })

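To see the full query sent to Elasticsearch, you can run the client in debug mode (see below).

Function parameters

searchRequests - An array of SearchRequest objects. Each SearchRequest object has the following properties:
- indexName - The Elasticsearch index name.
- body - The Elasticsearch request body.
- request - The state request from the UI. Contains properties such as the query, filters, sorting, and size.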
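
Because a beforeSearch hook only maps one array of requests to another, the transformation can be factored into a plain function that is easy to test. The sketch below assumes the SearchRequest shape listed above; applyBodyOverrides is a hypothetical helper, and timeout is used only as an example of another Elasticsearch search body parameter.

```typescript
// Simplified local version of the SearchRequest shape described above.
interface SearchRequest {
  indexName: string;
  body: { [key: string]: unknown };
  request: { [key: string]: unknown };
}

// Return copies of the requests with extra body parameters merged in.
// The input requests are left untouched.
function applyBodyOverrides(
  requests: SearchRequest[],
  overrides: { [key: string]: unknown }
): SearchRequest[] {
  return requests.map((sr) => ({
    ...sr,
    body: { ...sr.body, ...overrides },
  }));
}

// Usage as a hook (sketch):
// hooks: {
//   beforeSearch: async (searchRequests) =>
//     applyBodyOverrides(searchRequests, { track_total_hits: true, timeout: "2s" }),
// }
const patchedRequests = applyBodyOverrides(
  [{ indexName: "movies", body: { size: 10 }, request: {} }],
  { track_total_hits: true, timeout: "2s" }
);
console.log(JSON.stringify(patchedRequests[0].body));
// {"size":10,"track_total_hits":true,"timeout":"2s"}
```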

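`afterSearch` hook function

The afterSearch hook is called after the search response is received. It is useful when you want to perform some action once the response is available.

For example:

Logging
Analytics

Below is an example of an afterSearch hook that logs the search responses to the console.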

  const results = await client.handleRequest(req.body, {
    hooks: {
      afterSearch: (searchRequests, searchResponses) => {
        console.log(searchResponses);
        return searchResponses;
      },
    },
  });
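
For analytics, logging a compact summary is often more useful than logging whole responses. The sketch below is a hypothetical helper, summarizeSearch, built on two assumptions: responses follow the standard Elasticsearch search response shape (took, hits.total), and searchResponses line up with searchRequests by position.

```typescript
interface SearchRequestLike {
  indexName: string;
}

// Subset of the Elasticsearch search response used here.
interface SearchResponseLike {
  took: number;
  hits: { total?: number | { value: number } };
}

interface SearchSummary {
  indexName: string;
  tookMs: number;
  totalHits: number;
}

// Summarise each response; hits.total may be a number or an object with a value.
function summarizeSearch(
  requests: SearchRequestLike[],
  responses: SearchResponseLike[]
): SearchSummary[] {
  return responses.map((response, i) => {
    const total = response.hits.total;
    const request = requests[i];
    return {
      indexName: request ? request.indexName : "unknown",
      tookMs: response.took,
      totalHits: typeof total === "number" ? total : total ? total.value : 0,
    };
  });
}

console.log(
  summarizeSearch([{ indexName: "movies" }], [{ took: 12, hits: { total: { value: 345 } } }])
);
// [{ indexName: "movies", tookMs: 12, totalHits: 345 }]
```

Inside the hook, you could log summarizeSearch(searchRequests, searchResponses) and then return searchResponses unchanged.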

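Debug mode

The client can be run in debug mode to help debug Elasticsearch queries. To run the client in debug mode, set the debug flag to true in the Client function.

This is helpful when you override the query via getQuery or provide base filters via getBaseFilters and want to see the query that is sent to Elasticsearch.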

const client = Client({
  // search_settings configuration
  connection: {
    // ...
  },
  search_settings: {
    search_attributes: ["title", "plot"],
    // ...
  }
}, { debug: true });

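When the client runs in debug mode, the Elasticsearch queries are logged to the console.

Meta fields

Score - Accessible on each search result hit as _score. This is the document's relevance score.
Index - Accessible on each search result hit as _index. This is the Elasticsearch index name.

FAQ

How do I search across multiple indices?

You can search across multiple indices by specifying the index names as a comma-separated value in the indexName parameter.

You can also create an Elasticsearch alias that points to multiple indices and use the alias in the indexName parameter.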

export const App = () => {
  return (
    <InstantSearch searchClient={searchClient} indexName="movies,episodes,series">
      <SearchBox />
      <Hits />
    </InstantSearch>
  )
}
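
When hits come from several indices, the _index meta field tells you where each one originated. The sketch below groups hits by index so that each group can be rendered separately. groupHitsByIndex and the Hit type are hypothetical and rely only on the _index and _score fields described under Meta fields.

```typescript
// A hit carries the meta fields plus arbitrary result attributes.
interface Hit {
  _index: string;
  _score: number;
  [attribute: string]: unknown;
}

// Group hits by the index they came from, preserving their order.
function groupHitsByIndex(hits: Hit[]): { [index: string]: Hit[] } {
  const groups: { [index: string]: Hit[] } = {};
  hits.forEach((hit) => {
    if (!groups[hit._index]) {
      groups[hit._index] = [];
    }
    groups[hit._index].push(hit);
  });
  return groups;
}

const groupedHits = groupHitsByIndex([
  { _index: "movies", _score: 2.1, title: "Heat" },
  { _index: "series", _score: 1.7, title: "Fargo" },
  { _index: "movies", _score: 1.2, title: "Ronin" },
]);
console.log(Object.keys(groupedHits)); // ["movies", "series"]
```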
