Get Dataset

This task uses short polling to retrieve a dataset from Apify. If it receives an empty dataset, it retries with exponential back-off until the dataset becomes available or the timeout is reached. By default, the task times out after 300 seconds to prevent it from hanging. An empty dataset typically means that the actor run has not finished uploading its data.
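The polling behaviour described above can be sketched roughly as follows. This is an illustration only, not the plugin's implementation: the function name poll_until_non_empty, the fetch callable, and the initial and maximum delays are all assumptions. Only the 300-second default timeout comes from the text.

```python
import time
from typing import Callable, List, Optional


def poll_until_non_empty(
    fetch: Callable[[], List[dict]],       # hypothetical: returns the dataset items
    timeout_s: float = 300.0,              # default timeout stated in the text
    initial_delay_s: float = 1.0,          # assumed starting delay
    max_delay_s: float = 30.0,             # assumed cap on the delay
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[List[dict]]:
    """Call fetch() until it returns items or timeout_s elapses."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        items = fetch()
        if items:
            return items
        remaining = deadline - clock()
        if remaining <= 0:
            return None  # timed out while the dataset was still empty
        # Wait (never past the deadline), then double the delay up to the cap.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting.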

yaml
type: "io.kestra.plugin.apify.dataset.Get"

Get a dataset with a given ID.

yaml
id: apify_get_dataset_flow_required_properties
namespace: company.team

tasks:
  - id: get_dataset
    type: io.kestra.plugin.apify.dataset.Get
    apiToken: "{{ secret('APIFY_API_TOKEN') }}"
    datasetId: mecGriFjtDHRNtYOZ

Get a dataset with a given ID and specific options.

yaml
id: apify_get_dataset_flow
namespace: company.team

tasks:
  - id: get_dataset
    type: io.kestra.plugin.apify.dataset.Get
    apiToken: "{{ secret('APIFY_API_TOKEN') }}"
    datasetId: RNtYOZmecGriFjtDH
    clean: false
    offset: 1
    limit: 10
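    # Quote names that start with '#': an unquoted '#' after whitespace begins a YAML comment.
    fields:
      - userId
      - "#id"
      - "#createdAt"
      - postMeta
    omit:
      - "#id"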
    flatten: postMeta
    sort: ASC
    skipEmpty: false

Properties

Apify API token

API token for Apify. You can find it in your Apify account settings.

datasetId

Default true

Clean

If true then the task returns only non-empty items and skips hidden fields (i.e. fields starting with the # character). The default value is true.
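As a rough illustration of what Clean does to the returned items, the sketch below strips hidden fields and then drops items that are empty. The function name clean_items is hypothetical, and the order of the two steps (hidden fields first, then the empty check) is an assumption rather than documented behaviour.

```python
from typing import Any, Dict, List


def clean_items(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop hidden fields (names starting with '#'), then drop empty items."""
    cleaned = []
    for item in items:
        # Keep only fields whose name does not start with '#'.
        visible = {key: value for key, value in item.items() if not key.startswith("#")}
        # Assumption: an item left with no fields counts as empty and is skipped.
        if visible:
            cleaned.append(visible)
    return cleaned
```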

SubType string

Fields

List of fields to pick from the returned items. Only these fields remain in the resulting record objects.

Default false

Flatten

List of fields whose nested objects should be transformed into flat structures. For example, with flatten="foo" the object {"foo": {"bar": "hello"}} is turned into {"foo.bar": "hello"}.
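The flatten transformation in the example above can be sketched as below. The function names are hypothetical, and flattening objects nested more than one level deep into longer dotted keys is an assumption; the text only shows the one-level case.

```python
from typing import Any, Dict, List


def _flatten(prefix: str, obj: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a nested object into dotted keys under the given prefix."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict) and value:
            flat.update(_flatten(path, value))  # assumed: recurse into deeper levels
        else:
            flat[path] = value
    return flat


def flatten_fields(item: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Flatten only the listed top-level fields; leave every other field untouched."""
    result: Dict[str, Any] = {}
    for key, value in item.items():
        if key in fields and isinstance(value, dict) and value:
            result.update(_flatten(key, value))
        else:
            result[key] = value
    return result
```

Calling flatten_fields({"foo": {"bar": "hello"}}, ["foo"]) reproduces the {"foo.bar": "hello"} result from the description.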

Default 1000

Limit

Maximum number of items to return. The default value is 1000.

Default 0

Offset

Number of items that should be skipped at the start. The default value is 0.
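Offset and Limit together select a window of items, so a large dataset can be read in pages. The sketch below shows the window semantics and how the (offset, limit) pairs for successive pages could be computed. Both helper names are hypothetical, and the total item count is assumed to be known in advance.

```python
from typing import Any, List, Tuple


def window(items: List[Any], offset: int = 0, limit: int = 1000) -> List[Any]:
    """Skip `offset` items at the start, then return at most `limit` items."""
    return items[offset:offset + limit]


def page_windows(total_items: int, page_size: int = 1000) -> List[Tuple[int, int]]:
    """Return the (offset, limit) pairs needed to cover total_items."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [
        (offset, min(page_size, total_items - offset))
        for offset in range(0, total_items, page_size)
    ]
```

Each pair could then be supplied as the offset and limit of one Get task run.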

SubType string

Omit

List of fields which should be omitted from the returned items.

The HTTP client configuration.

Default false

Simplified

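If true, the items are returned in a simplified form: the Apify API applies fields=url,pageFunctionResult,errorInfo and unwind=pageFunctionResult, which emulates the simplified results of the legacy Apify Crawler product.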

Default true

SkipEmpty

If true then empty items are skipped from the output. Default value is true.

Default false

SkipFailedPages

If true, all items with an errorInfo property are skipped from the output. The default value is false.

Default false

SkipHidden

If true then hidden fields are skipped from the output, i.e. fields starting with the # character.

Default ASC
Possible Values
ASC, DESC

sort

Sort order of the returned items. With ASC, items are returned in the order in which they were stored; DESC reverses that order. Defaults to ASC.

SubType string

Unwind

A list of fields which should be unwound, in the order in which they should be processed. Each field should be either an array or an object. If the field is an array, every element of the array becomes a separate record and is merged with the parent object. If the unwound field is an object, it is merged with the parent object. If the unwound field is missing, or its value is neither an array nor an object and therefore cannot be merged with a parent object, the item is preserved as it is. Note that the unwound items ignore the desc parameter.
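The unwind rules above can be sketched as follows. This is an illustration under stated assumptions, not the Apify implementation: the function names are hypothetical, the unwound field is assumed to be removed from the parent before merging, array elements that are not objects are assumed to be kept under the original field name, and an empty array is assumed to yield no records.

```python
from typing import Any, Dict, List


def unwind_item(item: Dict[str, Any], field: str) -> List[Dict[str, Any]]:
    """Unwind one field of one item into zero or more records."""
    if field not in item:
        return [item]  # missing field: item preserved as it is
    value = item[field]
    parent = {key: val for key, val in item.items() if key != field}
    if isinstance(value, list):
        records = []
        for element in value:
            if isinstance(element, dict):
                records.append({**parent, **element})  # merge element into parent
            else:
                records.append({**parent, field: element})  # assumed handling
        return records  # assumed: an empty array produces no records
    if isinstance(value, dict):
        return [{**parent, **value}]  # object: merged with the parent
    return [item]  # neither array nor object: item preserved as it is


def unwind(items: List[Dict[str, Any]], fields: List[str]) -> List[Dict[str, Any]]:
    """Apply unwind_item for each field, in the order given."""
    for field in fields:
        items = [record for item in items for record in unwind_item(item, field)]
    return items
```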

View

Defines the view configuration for dataset items based on the schema definition. This parameter determines how the data will be filtered and presented. For complete specification details, see the dataset schema documentation in the Apify documentation.

Format duration

The time allowed to establish a connection to the server before failing.

Default PT5M
Format duration

The time allowed for a read connection to remain idle before closing it.

The password for HTTP basic authentication.

The username for HTTP basic authentication.

Default false

If true, allow a failed response code (response code >= 400).

SubType integer

List of response codes allowed for this request.

The authentication to use.

Default UTF-8

The default charset for the request.

Default true

Whether redirects should be followed automatically.

SubType string
Possible Values
REQUEST_HEADERS, REQUEST_BODY, RESPONSE_HEADERS, RESPONSE_BODY

The enabled log.

The proxy configuration.

The SSL request options

The timeout configuration.

The address of the proxy server.

The password for proxy authentication.

The port of the proxy server.

Default DIRECT
Possible Values
DIRECT, HTTP, SOCKS

The type of proxy to use.

The username for proxy authentication.

Whether to disable checking of the remote SSL certificate.

Only applies if no trust store is configured. Note: This makes the SSL connection insecure and should only be used for testing. If you are using a self-signed certificate, set up a trust store instead.

The token for bearer token authentication.