[Backport 8.x] Add streaming support to Arrow helper (#2429)

Co-authored-by: Josh Mock <joshua.mock@elastic.co>
2024-11-04 15:48:30 -06:00
parent 0e98719d60
commit dd9b38b051
7 changed files with 328 additions and 102 deletions
--- a/docs/helpers.asciidoc
+++ b/docs/helpers.asciidoc
@ -1,10 +1,10 @@
 [[client-helpers]]
 == Client helpers

-The client comes with an handy collection of helpers to give you a more 
+The client comes with a handy collection of helpers to give you a more
 comfortable experience with some APIs.

-CAUTION: The client helpers are experimental, and the API may change in the next 
+CAUTION: The client helpers are experimental, and the API may change in the next
 minor releases. The helpers will not work in any Node.js version lower than 10.


@ -14,7 +14,7 @@ minor releases. The helpers will not work in any Node.js version lower than 10.

 ~Added~ ~in~ ~`v7.7.0`~

-Running bulk requests can be complex due to the shape of the API, this helper 
+Running bulk requests can be complex due to the shape of the API, this helper
 aims to provide a nicer developer experience around the Bulk API.


@ -52,7 +52,7 @@ console.log(result)
 // }
 ----

-To create a new instance of the Bulk helper, access it as shown in the example 
+To create a new instance of the Bulk helper, access it as shown in the example
 above, the configuration options are:
 [cols=2*]
 |===
@ -83,7 +83,7 @@ const b = client.helpers.bulk({
    return {
      index: { _index: 'my-index' }
    }
-  } 
+  }
 })
 ----

@ -94,7 +94,7 @@ a|A function that is called for everytime a document can't be indexed and it has
 const b = client.helpers.bulk({
  onDrop (doc) {
    console.log(doc)
-  } 
+  }
 })
 ----

@ -105,7 +105,7 @@ a|A function that is called for each successful operation in the bulk request, w
 const b = client.helpers.bulk({
  onSuccess ({ result, document }) {
    console.log(`SUCCESS: Document ${result.index._id} indexed to ${result.index._index}`)
-  } 
+  }
 })
 ----

@ -249,11 +249,11 @@ client.helpers.bulk({
 [discrete]
 ==== Abort a bulk operation

-If needed, you can abort a bulk operation at any time. The bulk helper returns a 
+If needed, you can abort a bulk operation at any time. The bulk helper returns a
 https://promisesaplus.com/[thenable], which has an `abort` method.

-NOTE: The abort method stops the execution of the bulk operation, but if you 
-are using a concurrency higher than one, the operations that are already running 
+NOTE: The abort method stops the execution of the bulk operation, but if you
+are using a concurrency higher than one, the operations that are already running
 will not be stopped.

 [source,js]
@ -275,7 +275,7 @@ const b = client.helpers.bulk({
  },
  onDrop (doc) {
    b.abort()
-  } 
+  }
 })

 console.log(await b)
@ -285,8 +285,8 @@ console.log(await b)
 [discrete]
 ==== Passing custom options to the Bulk API

-You can pass any option supported by the link: 
-{ref}/docs-bulk.html#docs-bulk-api-query-params[Bulk API] to the helper, and the 
+You can pass any option supported by the link:
+{ref}/docs-bulk.html#docs-bulk-api-query-params[Bulk API] to the helper, and the
 helper uses those options in conjunction with the Bulk API call.

 [source,js]
@ -371,10 +371,10 @@ console.log(result)

 ~Added~ ~in~ ~`v7.8.0`~

-If you send search request at a high rate, this helper might be useful 
-for you. It uses the multi search API under the hood to batch the requests 
-and improve the overall performances of your application. The `result` exposes a 
-`documents` property as well, which allows you to access directly the hits 
+If you send search requests at a high rate, this helper might be useful
+for you. It uses the multi search API under the hood to batch the requests
+and improve the overall performances of your application. The `result` exposes a
+`documents` property as well, which allows you to access directly the hits
 sources.


@ -399,7 +399,7 @@ m.search(
  .catch(err => console.error(err))
 ----

-To create a new instance of the multi search (msearch) helper, you should access 
+To create a new instance of the multi search (msearch) helper, you should access
 it as shown in the example above, the configuration options are:
 [cols=2*]
 |===
@ -459,18 +459,18 @@ const m = client.helpers.msearch({
 [discrete]
 ==== Stopping the msearch helper

-If needed, you can stop an msearch processor at any time. The msearch helper 
+If needed, you can stop an msearch processor at any time. The msearch helper
 returns a https://promisesaplus.com/[thenable], which has an `stop` method.

-If you are creating multiple msearch helpers instances and using them for a 
-limitied period of time, remember to always use the `stop` method once you have 
+If you are creating multiple msearch helpers instances and using them for a
+limited period of time, remember to always use the `stop` method once you have
 finished using them, otherwise your application will start leaking memory.

-The `stop` method accepts an optional error, that will be dispatched every 
+The `stop` method accepts an optional error, that will be dispatched every
 subsequent search request.

-NOTE: The stop method stops the execution of the msearch processor, but if 
-you are using a concurrency higher than one, the operations that are already 
+NOTE: The stop method stops the execution of the msearch processor, but if
+you are using a concurrency higher than one, the operations that are already
 running will not be stopped.

 [source,js]
@ -507,9 +507,9 @@ setImmediate(() => m.stop())

 ~Added~ ~in~ ~`v7.7.0`~

-A simple wrapper around the search API. Instead of returning the entire `result` 
-object it returns only the search documents source. For improving the 
-performances, this helper automatically adds `filter_path=hits.hits._source` to 
+A simple wrapper around the search API. Instead of returning the entire `result`
+object it returns only the search documents source. For improving the
+performances, this helper automatically adds `filter_path=hits.hits._source` to
 the query string.

 [source,js]
@ -535,10 +535,10 @@ for (const doc of documents) {

 ~Added~ ~in~ ~`v7.7.0`~

-This helpers offers a simple and intuitive way to use the scroll search API. 
-Once called, it returns an 
-https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function[async iterator] 
-which can be used in conjuction with a for-await...of. It handles automatically 
+This helper offers a simple and intuitive way to use the scroll search API.
+Once called, it returns an
+https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function[async iterator]
+which can be used in conjunction with a for-await...of. It handles automatically
 the `429` error and uses the `maxRetries` option of the client.

 [source,js]
@ -576,7 +576,7 @@ for await (const result of scrollSearch) {
 [discrete]
 ==== Quickly getting the documents

-If you only need the documents from the result of a scroll search, you can 
+If you only need the documents from the result of a scroll search, you can
 access them via `result.documents`:

 [source,js]
@ -593,9 +593,9 @@ for await (const result of scrollSearch) {

 ~Added~ ~in~ ~`v7.7.0`~

-It works in the same way as the scroll search helper, but it returns only the 
-documents instead. Note, every loop cycle returns a single document, and you 
-can't use the `clear` method. For improving the performances, this helper 
+It works in the same way as the scroll search helper, but it returns only the
+documents instead. Note, every loop cycle returns a single document, and you
+can't use the `clear` method. For improving the performances, this helper
 automatically adds `filter_path=hits.hits._source` to the query string.

 [source,js]
@ -707,3 +707,42 @@ const result = await client.helpers
  .esql({ query: 'FROM sample_data | LIMIT 2' })
  .toRecords<EventLog>()
 ----
+
+[discrete]
+===== `toArrowReader`
+
+~Added~ ~in~ ~`v8.16.0`~
+
+ES|QL can return results in multiple binary formats, including https://arrow.apache.org/[Apache Arrow]'s streaming format. Because Arrow is very efficient to read, it is well suited to high-performance in-memory analytics. Because the response is streamed as batches of records, it can also be used to compute aggregations and other calculations on larger-than-memory data sets.
+
+`toArrowReader` returns a https://arrow.apache.org/docs/js/classes/Arrow_dom.RecordBatchReader.html[`RecordBatchStreamReader`].
+
+[source,ts]
+----
+const reader = await client.helpers
+  .esql({ query: 'FROM sample_data' })
+  .toArrowReader()
+
+// print each record as JSON
+for await (const recordBatch of reader) {
+  for (const record of recordBatch) {
+    console.log(record.toJSON())
+  }
+}
+----
+
+[discrete]
+===== `toArrowTable`
+
+~Added~ ~in~ ~`v8.16.0`~
+
+If you would like to pull the entire data set in Arrow format but without streaming, you can use the `toArrowTable` helper to get a https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html[Table] back instead.
+
+[source,ts]
+----
+const table = await client.helpers
+  .esql({ query: 'FROM sample_data' })
+  .toArrowTable()
+
+console.log(table.toArray())
+----
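
The `toArrowReader` section says that streaming batches makes aggregations over larger-than-memory data sets possible. A minimal sketch of that idea follows. It is not taken from the client: `RowLike`, `BatchLike`, `sumField`, and `fakeReader` are hypothetical names, and the structural types only assume that each batch is iterable and each row exposes `toJSON()`, as in the documented example.

```typescript
// Minimal structural types standing in for Arrow record batches and rows.
// These are assumptions for this sketch, not the apache-arrow typings.
interface RowLike {
  toJSON: () => Record<string, unknown>
}
type BatchLike = Iterable<RowLike>

interface FieldStats {
  count: number
  sum: number
}

// Fold over the stream one batch at a time.
// Only the running totals are kept, never the full result set.
async function sumField (
  batches: AsyncIterable<BatchLike>,
  field: string
): Promise<FieldStats> {
  const stats: FieldStats = { count: 0, sum: 0 }
  for await (const batch of batches) {
    for (const row of batch) {
      const value = row.toJSON()[field]
      // Non-numeric and missing values are skipped.
      if (typeof value === 'number') {
        stats.count += 1
        stats.sum += value
      }
    }
  }
  return stats
}

// Hypothetical stand-in for the reader: yields two small batches.
async function * fakeReader (): AsyncGenerator<BatchLike> {
  const row = (duration: number): RowLike => ({ toJSON: () => ({ duration }) })
  yield [row(10), row(20)]
  yield [row(30)]
}
```

With the helper, the value returned by `toArrowReader()` would be passed in place of `fakeReader()`, assuming its batches and rows match these shapes. 64-bit integer columns may surface as `bigint` values in Arrow's JavaScript library and would need converting before being summed this way.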