[Backport 8.x] Add streaming support to Arrow helper (#2429)

Co-authored-by: Josh Mock <joshua.mock@elastic.co>
github-actions[bot]
2024-11-04 15:48:30 -06:00
committed by GitHub
parent 0e98719d60
commit dd9b38b051
7 changed files with 328 additions and 102 deletions


@ -1,10 +1,10 @@
[[client-helpers]]
== Client helpers
The client comes with a handy collection of helpers to give you a more
comfortable experience with some APIs.
CAUTION: The client helpers are experimental, and the API may change in the next
minor releases. The helpers will not work in any Node.js version lower than 10.
@ -14,7 +14,7 @@ minor releases. The helpers will not work in any Node.js version lower than 10.
~Added~ ~in~ ~`v7.7.0`~
Running bulk requests can be complex due to the shape of the API; this helper
aims to provide a nicer developer experience around the Bulk API.
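
The "shape of the API" here is the Bulk API's newline-delimited body. Every operation is an action/metadata line followed by the document source, and the last line must also end with a newline. The TypeScript sketch below is not the helper's implementation. `toBulkBody` and `IndexAction` are hypothetical names, and only `index` actions are modelled. It serializes documents the way an `onDocument` callback returning `{ index: { _index: 'my-index' } }` implies.

[source,ts]
----
// Hypothetical type: only the `index` action is modelled in this sketch.
interface IndexAction {
  index: { _index: string, _id?: string }
}

// Hypothetical helper: builds a newline-delimited Bulk API body, asking
// `onDocument` for the action/metadata line of each document.
function toBulkBody<D> (documents: Iterable<D>, onDocument: (doc: D) => IndexAction): string {
  let body = ''
  for (const doc of documents) {
    // Two lines per operation: action/metadata first, then the document source.
    // Each line ends with "\n", including the final one.
    body += `${JSON.stringify(onDocument(doc))}\n${JSON.stringify(doc)}\n`
  }
  return body
}

const body = toBulkBody([{ title: 'foo' }, { title: 'bar' }], () => ({ index: { _index: 'my-index' } }))
console.log(body)
----

The helper hides this serialization, along with batching and retries, behind its options.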
@ -52,7 +52,7 @@ console.log(result)
// }
----
To create a new instance of the Bulk helper, access it as shown in the example
above. The configuration options are:
[cols=2*]
|===
@ -83,7 +83,7 @@ const b = client.helpers.bulk({
return {
index: { _index: 'my-index' }
}
}
})
----
@ -94,7 +94,7 @@ a|A function that is called for everytime a document can't be indexed and it has
const b = client.helpers.bulk({
onDrop (doc) {
console.log(doc)
}
})
----
@ -105,7 +105,7 @@ a|A function that is called for each successful operation in the bulk request, w
const b = client.helpers.bulk({
onSuccess ({ result, document }) {
console.log(`SUCCESS: Document ${result.index._id} indexed to ${result.index._index}`)
}
})
----
@ -249,11 +249,11 @@ client.helpers.bulk({
[discrete]
==== Abort a bulk operation
If needed, you can abort a bulk operation at any time. The bulk helper returns a
https://promisesaplus.com/[thenable], which has an `abort` method.
NOTE: The abort method stops the execution of the bulk operation, but if you
are using a concurrency higher than one, the operations that are already running
will not be stopped.
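
The combination of a thenable and an `abort` method can be illustrated with a small class. `AbortableRun` is hypothetical; it sketches the pattern and is not the bulk helper's code. It can be awaited like a promise because it implements `then`. Calling `abort()` only prevents new items from starting, so the item already being handled still finishes, much like the running operations described in the note.

[source,ts]
----
// Hypothetical illustration of an awaitable object with an `abort` method.
class AbortableRun<T> implements PromiseLike<T[]> {
  private aborted = false
  private readonly result: Promise<T[]>

  constructor (items: Iterable<T>, handle: (item: T, run: AbortableRun<T>) => Promise<void>) {
    this.result = this.process(items, handle)
  }

  private async process (items: Iterable<T>, handle: (item: T, run: AbortableRun<T>) => Promise<void>): Promise<T[]> {
    const processed: T[] = []
    for (const item of items) {
      // After abort() no new item is started; the one already running finishes.
      if (this.aborted) break
      await handle(item, this)
      processed.push(item)
    }
    return processed
  }

  abort (): void {
    this.aborted = true
  }

  // Implementing `then` is what makes `await run` work.
  then<R1 = T[], R2 = never> (
    onfulfilled?: ((value: T[]) => R1 | PromiseLike<R1>) | null,
    onrejected?: ((reason: any) => R2 | PromiseLike<R2>) | null
  ): Promise<R1 | R2> {
    return this.result.then(onfulfilled, onrejected)
  }
}

async function demo (): Promise<void> {
  // Aborts while handling the second item, so only the first two are processed.
  const run = new AbortableRun([1, 2, 3, 4, 5], async (n, self) => {
    if (n === 2) self.abort()
  })
  console.log(await run)
}
void demo()
----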
[source,js]
@ -275,7 +275,7 @@ const b = client.helpers.bulk({
},
onDrop (doc) {
b.abort()
}
})
console.log(await b)
@ -285,8 +285,8 @@ console.log(await b)
[discrete]
==== Passing custom options to the Bulk API
You can pass any option supported by the
link:{ref}/docs-bulk.html#docs-bulk-api-query-params[Bulk API] to the helper, and the
helper uses those options in conjunction with the Bulk API call.
[source,js]
@ -371,10 +371,10 @@ console.log(result)
~Added~ ~in~ ~`v7.8.0`~
If you send search requests at a high rate, this helper might be useful
for you. It uses the multi search API under the hood to batch the requests
and improve the overall performance of your application. The `result` also exposes a
`documents` property, which gives you direct access to the sources of the hits.
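
Batching can be pictured with the sketch below. `SearchBatcher` is hypothetical and does not show the helper's internals. It queues individual searches, sends them together through an injected `sendBatch` function (a stand-in for a multi search request), and resolves each caller's promise from the entry at the same position in the batched response, because the multi search API returns its `responses` in request order. Unlike the helper, this sketch sends a partial batch only when `flush()` is called explicitly.

[source,ts]
----
interface Pending<Q, R> {
  query: Q
  resolve: (response: R) => void
  reject: (error: unknown) => void
}

// Hypothetical sketch: coalesces individual searches into batched calls.
class SearchBatcher<Q, R> {
  private pending: Array<Pending<Q, R>> = []
  private readonly sendBatch: (queries: Q[]) => Promise<R[]>
  private readonly batchSize: number

  constructor (sendBatch: (queries: Q[]) => Promise<R[]>, batchSize: number) {
    this.sendBatch = sendBatch
    this.batchSize = batchSize
  }

  // Each caller gets its own promise, settled when its batch comes back.
  search (query: Q): Promise<R> {
    return new Promise<R>((resolve, reject) => {
      this.pending.push({ query, resolve, reject })
      if (this.pending.length >= this.batchSize) void this.flush()
    })
  }

  // Sends everything queued so far as one batch.
  async flush (): Promise<void> {
    const batch = this.pending.splice(0, this.pending.length)
    if (batch.length === 0) return
    try {
      const responses = await this.sendBatch(batch.map(p => p.query))
      if (responses.length !== batch.length) throw new Error('response count does not match request count')
      // Responses come back in request order, so position i belongs to caller i.
      batch.forEach((p, i) => p.resolve(responses[i] as R))
    } catch (err) {
      batch.forEach(p => p.reject(err))
    }
  }
}
----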
@ -399,7 +399,7 @@ m.search(
.catch(err => console.error(err))
----
To create a new instance of the multi search (msearch) helper, access it as
shown in the example above. The configuration options are:
[cols=2*]
|===
@ -459,18 +459,18 @@ const m = client.helpers.msearch({
[discrete]
==== Stopping the msearch helper
If needed, you can stop an msearch processor at any time. The msearch helper
returns a https://promisesaplus.com/[thenable], which has a `stop` method.
If you are creating multiple msearch helper instances and using them for a
limited period of time, remember to always use the `stop` method once you have
finished using them; otherwise, your application will start leaking memory.
The `stop` method accepts an optional error, which will be dispatched to every
subsequent search request.
NOTE: The stop method stops the execution of the msearch processor, but if
you are using a concurrency higher than one, the operations that are already
running will not be stopped.
[source,js]
@ -507,9 +507,9 @@ setImmediate(() => m.stop())
~Added~ ~in~ ~`v7.7.0`~
A simple wrapper around the search API. Instead of returning the entire `result`
object, it returns only the `_source` of each matching document. To improve
performance, this helper automatically adds `filter_path=hits.hits._source` to
the query string.
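
The following sketch shows what unwrapping a filtered response looks like. With `filter_path=hits.hits._source`, the response keeps only the document sources, and unwrapping them yields a plain array. `FilteredSearchResponse` and `extractSources` are hypothetical names. The optional fields are a defensive assumption, because a filtered response may leave out sections that have no values.

[source,ts]
----
// Hypothetical type: a search response after `filter_path=hits.hits._source`
// has removed everything except the document sources.
interface FilteredSearchResponse<T> {
  hits?: {
    hits?: Array<{ _source?: T }>
  }
}

// Unwraps the sources into a plain array, skipping hits without a `_source`.
function extractSources<T> (response: FilteredSearchResponse<T>): T[] {
  const sources: T[] = []
  for (const hit of response.hits?.hits ?? []) {
    if (hit._source !== undefined) sources.push(hit._source)
  }
  return sources
}

const response: FilteredSearchResponse<{ title: string }> = {
  hits: { hits: [{ _source: { title: 'foo' } }, { _source: { title: 'bar' } }] }
}
console.log(extractSources(response))
----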
[source,js]
@ -535,10 +535,10 @@ for (const doc of documents) {
~Added~ ~in~ ~`v7.7.0`~
This helper offers a simple and intuitive way to use the scroll search API.
Once called, it returns an
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function[async iterator]
which can be used in conjunction with a for-await...of loop. It automatically handles
`429` errors and uses the `maxRetries` option of the client.
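
The retry behavior can be sketched as an async generator. The following names are hypothetical stand-ins:

* `fetchPage` represents the scroll request. It receives the previous scroll ID and returns the next page.
* `HttpError` is an error type that carries the HTTP status code.

The exponential backoff is an assumption of this sketch, not documented helper behavior. Only `429` responses are retried, at most `maxRetries` times. Any other error is rethrown to the caller.

[source,ts]
----
// Hypothetical error type that carries the HTTP status code of a failed call.
class HttpError extends Error {
  statusCode: number

  constructor (statusCode: number) {
    super(`Request failed with status ${statusCode}`)
    this.statusCode = statusCode
  }
}

// Hypothetical page shape: the scroll ID for the next call plus the documents.
interface ScrollPage<T> {
  scrollId: string
  documents: T[]
}

// Runs `call`, retrying only on 429 responses, at most `maxRetries` times.
async function withRetry<R> (call: () => Promise<R>, maxRetries: number, baseDelayMs: number): Promise<R> {
  let attempt = 0
  while (true) {
    try {
      return await call()
    } catch (err) {
      const tooManyRequests = err instanceof HttpError && err.statusCode === 429
      if (!tooManyRequests || attempt >= maxRetries) throw err
      attempt += 1
      // Exponential backoff (an assumption of this sketch) before retrying.
      await new Promise<void>(resolve => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)))
    }
  }
}

// Yields pages until the scroll returns an empty page.
async function * scrollPages<T> (
  fetchPage: (scrollId: string | undefined) => Promise<ScrollPage<T>>,
  maxRetries = 3,
  baseDelayMs = 100
): AsyncGenerator<ScrollPage<T>, void, undefined> {
  let scrollId: string | undefined = undefined
  while (true) {
    const page = await withRetry(() => fetchPage(scrollId), maxRetries, baseDelayMs)
    if (page.documents.length === 0) return
    yield page
    scrollId = page.scrollId
  }
}
----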
[source,js]
@ -576,7 +576,7 @@ for await (const result of scrollSearch) {
[discrete]
==== Quickly getting the documents
If you only need the documents from the result of a scroll search, you can
access them via `result.documents`:
[source,js]
@ -593,9 +593,9 @@ for await (const result of scrollSearch) {
~Added~ ~in~ ~`v7.7.0`~
It works in the same way as the scroll search helper, but it returns only the
documents instead. Note that every loop cycle returns a single document, and you
can't use the `clear` method. To improve performance, this helper
automatically adds `filter_path=hits.hits._source` to the query string.
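
Returning one document per loop cycle amounts to flattening an async sequence of pages into an async sequence of documents. The hypothetical `documentsOf` generator below sketches that idea. It consumes pages that expose a `documents` array and yields each document separately. `DocumentPage` and `samplePages` are also illustrative names, not part of the client.

[source,ts]
----
// Hypothetical page shape: anything exposing the documents of one scroll page.
interface DocumentPage<T> {
  documents: T[]
}

// Flattens an async sequence of pages into an async sequence of documents.
async function * documentsOf<T> (pages: AsyncIterable<DocumentPage<T>>): AsyncGenerator<T, void, undefined> {
  for await (const page of pages) {
    // Delegate to the array so that each document is yielded separately.
    yield * page.documents
  }
}

// Example input: two pages produced by a local async generator.
async function * samplePages (): AsyncGenerator<DocumentPage<string>, void, undefined> {
  yield { documents: ['a', 'b'] }
  yield { documents: ['c'] }
}

async function demo (): Promise<void> {
  for await (const doc of documentsOf(samplePages())) {
    console.log(doc)
  }
}
void demo()
----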
[source,js]
@ -707,3 +707,42 @@ const result = await client.helpers
.esql({ query: 'FROM sample_data | LIMIT 2' })
.toRecords<EventLog>()
----
[discrete]
===== `toArrowReader`
~Added~ ~in~ ~`v8.16.0`~
ES|QL can return results in multiple binary formats, including https://arrow.apache.org/[Apache Arrow]'s streaming format. Because it is a very efficient format to read, it can be valuable for performing high-performance in-memory analytics. And, because the response is streamed as batches of records, it can be used to produce aggregations and other calculations on larger-than-memory data sets.
`toArrowReader` returns a https://arrow.apache.org/docs/js/classes/Arrow_dom.RecordBatchReader.html[`RecordBatchStreamReader`].
[source,ts]
----
const reader = await client.helpers
  .esql({ query: 'FROM sample_data' })
  .toArrowReader()

// print each record as JSON; the reader is asynchronous, so use for await
for await (const recordBatch of reader) {
  for (const record of recordBatch) {
    console.log(record.toJSON())
  }
}
----
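
The reader yields one record batch at a time, so an aggregate can be updated batch by batch without holding the whole result set in memory. The sketch below is generic. `aggregateBatches`, `RunningStats`, and the `duration` field in the sample data are hypothetical. The function accepts any async iterable whose items are iterables of rows, and the `value` callback decides how a number is read from each row.

[source,ts]
----
// Running aggregate; memory use does not grow with the number of rows.
interface RunningStats {
  count: number
  sum: number
  min: number
  max: number
}

// Hypothetical helper: folds every row of every batch into running statistics.
async function aggregateBatches<R> (
  batches: AsyncIterable<Iterable<R>>,
  value: (row: R) => number
): Promise<RunningStats> {
  const stats: RunningStats = { count: 0, sum: 0, min: Infinity, max: -Infinity }
  for await (const batch of batches) {
    for (const row of batch) {
      const v = value(row)
      stats.count += 1
      stats.sum += v
      stats.min = Math.min(stats.min, v)
      stats.max = Math.max(stats.max, v)
    }
  }
  return stats
}

// Example input: two small batches from a local async generator.
async function * sampleBatches (): AsyncGenerator<Array<{ duration: number }>, void, undefined> {
  yield [{ duration: 120 }, { duration: 80 }]
  yield [{ duration: 100 }]
}

async function demo (): Promise<void> {
  const stats = await aggregateBatches(sampleBatches(), row => row.duration)
  console.log(stats, 'mean:', stats.sum / stats.count)
}
void demo()
----

To use this with the Arrow reader, you would pass `reader` as `batches` and read a numeric field in `value`, for example through `record.toJSON()`. The field name depends on your data and is not part of this sketch.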
[discrete]
===== `toArrowTable`
~Added~ ~in~ ~`v8.16.0`~
If you would like to pull the entire data set in Arrow format but without streaming, you can use the `toArrowTable` helper to get a https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html[Table] back instead.
[source,ts]
----
const table = await client.helpers
  .esql({ query: 'FROM sample_data' })
  .toArrowTable()

console.log(table.toArray())
----