Agent Command
The Agent command is the first command that executes in an agent; all other commands are its sub-commands, so an agent can contain only one Agent command. The Agent command loads the start URL, which is the first point of data extraction, and it also contains all common agent properties (including the data export configuration).
The Agent command uses a data provider that provides one or more start URLs, and the command will execute once for each of these URLs.
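As a mental model only, the Agent command behaves like a loop over the start URLs supplied by its data provider: for each URL it loads the page and then runs its sub-commands. The JavaScript below is a hypothetical sketch; `runAgent`, `loadUrl`, and the stand-in commands are illustrative names, not part of the Sequentum Enterprise API.

```javascript
// Hypothetical model of the Agent command: load each start URL supplied by
// the data provider, then run every sub-command against the loaded page.
// None of these names are part of the Sequentum Enterprise API.
function runAgent(dataProvider, loadUrl, subCommands) {
  const results = [];
  for (const url of dataProvider()) {
    const page = loadUrl(url);           // the Agent command's own action
    for (const command of subCommands) { // all other commands are sub-commands
      results.push(command(page));
    }
  }
  return results;
}

// A simple data provider that returns a list of static start URLs.
const staticProvider = () => [
  "https://example.com/page1",
  "https://example.com/page2",
];

const output = runAgent(
  staticProvider,
  (url) => ({ url }),                 // stand-in for loading a page
  [(page) => `title of ${page.url}`]  // stand-in for an extraction command
);
console.log(output);
```

With two start URLs and one sub-command, the sub-command runs twice, once per URL.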
The figure below shows the Configure Agent Command panel in the Sequentum Enterprise Editor, where the Agent command uses a simple data provider to load a single static start URL.
NOTE: The Agent command derives from the Navigate URL command, which loads one or more URLs.
Command Configuration
The configuration screen for the Agent command has four tabs: Common, Action, Data, and Properties. See the Sequentum Enterprise Command Reference topic to learn about the Properties of the Agent command. In the Common tab, you can edit the command name and (optionally) customize the data provider properties.
CSV Data: If the Use Default Input box is checked, the command provides simple CSV data and uses that data as input. Simple CSV data consists of values that you enter directly into the command, so no external CSV file is necessary.
You can uncheck the Use Default Input box and choose the Data Provider that will provide the start URLs. The default data provider is a simple data provider that provides a list of static URLs. You can populate the data provider directly by entering the start URLs in the URLs input box.
Use the Action tab to control how the web browser loads the start URLs. See Action Configuration for more information.
Use the Data tab to set the data provider that provides the start URLs. Read more in Using Data Input.
Configuration of an Action Command
Explore the options and properties that you can configure for a command by taking these simple steps:
Click once on a web element in the browser panel.
Locate the New Command drop-down in the Configure Agent Command.
Choose a command from the drop-down.
Explore the tabs: Common, Action, Data, and Properties.
In the Common tab, uncheck the Use default input box to reveal more options.
In the Action tab, uncheck the Discover Action box to reveal more options.
In many web-scraping scenarios, the default functionality will be quite sufficient. Should you find the need for more flexibility and control, you can learn how to configure all aspects and properties in the Action Configuration section.
Action Configuration
By default, all action commands are set to Discover action settings (the Discover action checkbox on the Action tab is checked). The action settings will automatically be configured when you execute the command the first time. The default action settings are usually quite sufficient, so you will rarely need to worry about the Action configuration tab at all. However, you can fine-tune the action settings to achieve better performance, and you may need to adjust the action settings to get the command working correctly.
After choosing a New Command type from the drop-down in the Configure Agent Command panel, click the Action tab.
Several configuration tabs are available for action commands in the Configure Agent Command panel, including:
Action - Specifies the type of action: Fire Events, Load URL, or No Action. Not all action types are available for all action commands.
Browser - Specifies the web browser in which the new content should load.
Events - Applies only when the action type is Fire Events. Lists the events that will fire on the chosen web element.
Wait - Specifies the browser activities for which the command will wait before the action is complete.
Action
An action command can execute one of two action types, or no action at all:
Fire Events - The command fires events, such as a mouse click, on the selected web element.
Load URL - The command loads a new page into the web browser using a direct URL.
No Action - The command does not execute an action. This option is relevant only for form fields, which may execute an action when an input value is assigned but often execute none.
Wait Times
An action command uses Wait Time values to determine how long it should wait for activities, such as how long it should wait for a new page to start loading. If you decrease the Wait Time values, the agent will run faster, but it may not work correctly if the website is slow. If the website is very slow you may need to increase the Wait Time values to make sure the agent works correctly.
If the option Default wait times is checked, the command will use the same Wait Time values as the parent action command.
Scroll to End of Page
Some web pages load additional content when you scroll the page - either downward or to the right. To extract all content from such pages, you need to include an action that scrolls down to the end of the page, so all content is available to the agent.
When you set the option Scroll to the end of the page, you can limit the number of times the command scrolls to the end of a page to load new content. This limit is important because some pages keep loading new content indefinitely, until Sequentum Enterprise finally runs out of memory.
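The JavaScript sketch below illustrates why the scroll limit matters. It is not how Sequentum Enterprise implements the option: the `page` object and its `getScrollHeight` and `scrollToBottom` methods are hypothetical stand-ins, and a real page loads new content asynchronously, so each scroll would also need a wait.

```javascript
// Minimal sketch of "scroll to the end of the page, but at most maxScrolls
// times". In a real browser, getScrollHeight could return
// document.documentElement.scrollHeight and scrollToBottom could call
// window.scrollTo(0, document.documentElement.scrollHeight).
function scrollToEnd(page, maxScrolls) {
  let scrolls = 0;
  let lastHeight = page.getScrollHeight();
  while (scrolls < maxScrolls) {
    page.scrollToBottom();
    scrolls++;
    const newHeight = page.getScrollHeight();
    if (newHeight === lastHeight) {
      break; // the scroll loaded no new content, so this is the end
    }
    lastHeight = newHeight;
  }
  return scrolls;
}

// Fake page that adds `batches` blocks of content, one block per scroll.
function makeFakePage(batches) {
  let height = 1000;
  let remaining = batches;
  return {
    getScrollHeight: () => height,
    scrollToBottom: () => {
      if (remaining > 0) {
        height += 1000;
        remaining--;
      }
    },
  };
}

console.log(scrollToEnd(makeFakePage(3), 10));        // 4: three loads, then one empty scroll
console.log(scrollToEnd(makeFakePage(Infinity), 10)); // 10: stopped only by the limit
```

Without the `maxScrolls` limit, the second (endless) page would never stop scrolling.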
Browser
An action command often loads new content. On the Browser tab, you can configure whether content loads into a new web browser, the current web browser, or the parent web browser. You can also specify a different browser mode, as explained below.
This Navigate Link command will open a web page in a new web browser
Uncheck the Discover Action box, and then click the Browser tab to view these Target Browser options:
New. The action command loads content into a new web browser, and all sub-commands will operate in the new browser. To use this option, the action command must load a completely new page. Asynchronous actions, such as AJAX calls, cannot load content into a new web browser.
Current. The action command loads content into the current web browser, and all sub-commands will operate in the current browser. Asynchronous actions, such as AJAX calls, can only load content into the current web browser.
Parent. Some older websites may require child browsers to load content into the parent browser in order to function correctly, so this option should only be chosen in such cases. If an action command loads content into a new web browser, that new web browser becomes a child browser of the current web browser, and actions in the child browser can direct content into the parent browser. To use this option, the action command must load a completely new page.
Browser Mode
If you leave the default for the Target Browser (New), you can choose the browser mode:
Default - The new web browser will be exactly the same type as the parent browser.
Web Browser - The browser functions as a standard web browser, and it will download images and execute JavaScript.
HTML Parser - the command does not start a new browser. Instead, the web page is downloaded and processed by an HTML parser. The HTML parser does not execute JavaScript and does not load frames, so it is faster and more reliable than a web browser. However, the parser does not work on websites that rely on JavaScript, and the parser may also be unable to submit some web forms (even when they don't rely on JavaScript).
JSON Parser - the command does not start a new browser. Instead, it parses JSON content returned by a web server and lets you easily extract content elements from the JSON content.
XML Parser - the command does not start a new browser. Instead, it parses XML content returned by a web server and lets you easily extract content elements from the XML content.
Note:
You must reopen the browser tab for any change in this setting to take effect.
Events
If a command action type is set to fire events, then you can specify the events that should be fired on the selected web element.
In most cases, you can check the Use default events checkbox, which will fire all appropriate events that the web element supports. In special cases, you may want to remove some of the default events. For example, an action may try to open a drop-down box for an input form field.
Firing the focus or click event on the input form field may cause the drop-down box to open, but the blur event may cause the drop-down box to close. In that case, firing all the default events would open the drop-down box but quickly close it again. To prevent this, uncheck the Use default events box and remove the blur event from the list.
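To see why removing blur helps, here is a toy JavaScript model of the drop-down scenario. The field and the event list `["focus", "click", "blur"]` are made up for illustration; they are not Sequentum's actual default events. It runs in a browser console or in Node.js 15 or later, where `EventTarget` and `Event` are built in.

```javascript
// A toy form field whose drop-down opens on focus or click and closes on
// blur, mimicking the behavior described above.
function makeDropdownField() {
  const field = new EventTarget();
  field.open = false;
  field.addEventListener("focus", () => { field.open = true; });
  field.addEventListener("click", () => { field.open = true; });
  field.addEventListener("blur", () => { field.open = false; });
  return field;
}

// Fire a list of event names on a target, in order.
function fireEvents(target, eventNames) {
  for (const name of eventNames) {
    target.dispatchEvent(new Event(name));
  }
}

const withBlur = makeDropdownField();
fireEvents(withBlur, ["focus", "click", "blur"]);
console.log(withBlur.open); // false: blur closed the drop-down again

const withoutBlur = makeDropdownField();
fireEvents(withoutBlur, ["focus", "click"]);
console.log(withoutBlur.open); // true: the drop-down stays open
```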
Supported Events and Functions
Sequentum Enterprise supports all the default events for the chosen web element and some custom events and functions. The following list includes only the most common events and is not a complete list of all available events. Please see a JavaScript reference guide for all available events.
Event Name | Description |
---|---|
mousedown | Emulates the press of a mouse button onto the chosen web element without releasing the button. |
mouseup | Following a mousedown event, emulates releasing a mouse button onto the chosen web element. |
click | In immediate succession, emulates a press and a release of the left mouse button on the chosen web element. |
rightclick | In immediate succession, emulates a press and a release of the right mouse button on the chosen web element. |
keydown | Emulates pressing a key in relation to the chosen web element without releasing it. |
keyup | Emulates releasing a key in relation to the chosen web element. |
keypress | In immediate succession, emulates a press and a release of a key in relation to the chosen web element. |
focus | Emulates bringing input focus to the chosen web element. |
blur | Emulates removing input focus from the chosen web element. |
change | Emulates changing the input value of the chosen web element. |
contextmenu | Opens the context menu on the chosen web element. |
The following list includes custom functions that can be used along with the standard events:
Function | Description |
---|---|
exec(JavaScript) | Executes JavaScript code on the selected web element. The variable element is always defined as the selected web element. Example 1: exec($(element).unbind('blur')) This example removes all blur events from the chosen web element. It requires jQuery to be available on the web page, but the exec function works with non-jQuery JavaScript as well. Example 2: exec(element.click()) This example fires the click event on the selected web element. Example 3: exec(window.history.back()) This example moves the browser back to the previous page. |
unbind(Event) | Removes all events of type Event from the selected web element. This function requires jQuery to be available on the web page. Example: unbind(blur) This example is equivalent to calling exec($(element).unbind('blur')) or exec($(element).off('blur')). |
click() | Fires the following 3 events using the left mouse button: mousedown mouseup click |
rightclick() | Fires the following 3 events using the right mouse button: mousedown mouseup click |
simulateclick() | This function simulates a left mouse click. The function is different from click, since it scrolls the selected web element into view and then simulates a real mouse click in the browser window. |
simulaterightclick() | This function simulates a right mouse click. The function is different from rightclick, since it scrolls the selected web element into view and then simulates a real mouse click in the browser window. |
simulatemousemove() | This function scrolls the selected web element into view and then simulates a real mouse move in the browser window. |
simulatemousemoves() | This function scrolls the selected web element into view and then simulates a series of real mouse moves from the edge of the selected web element to a random location within the selected web element. |
delay(milliseconds) | Pauses execution of the command for a specific number of milliseconds. Example: delay(2000) This example inserts a delay of 2 seconds. Important note: The activity timeouts include any event delay. So, if you have a single activity that waits 500 milliseconds, and all events take longer than 500 milliseconds to fire, then the action will time out before all the events have had time to fire. |
removeCgAttributes() | Sequentum Enterprise adds custom attributes to DOM elements in order to keep track of these elements. Very rarely, this causes issues with the target website. This function simply removes these attributes before the action. |
setinputtext() setinputtext(text) | This function inserts text into the chosen form field and is only compatible with Form Field commands. If the form field is a select box, the function selects the option with the text attribute equal to the specified text. If this function call contains no text, the function inserts the input data for the Form Field command. Example: setinputtext(hotels) Typically, a Form Field command sets the value of the chosen form field and then fires the specified events. This function gives you the ability to fire events before setting the value of a form field. NOTE: Often, this function is used when the corresponding Form Field command property Set Value is set to false. |
simulateentertext() simulateentertext(text) | This function simulates entering text into the chosen form field and is only compatible with Form Field commands. The function is different from setinputtext, since it simulates entering text by focusing the form field and then sending key events to the browser. If this function call contains no text, the function inserts the input data for the Form Field command. Example: simulateentertext(hotels) Typically, a Form Field command sets the value of the chosen form field and then fires the specified events. This function gives you the ability to fire events before setting the value of a form field. NOTE: Often, this function is used when the corresponding Form Field command property Set Value is set to false. |
keycode(keycode) | Fires keydown, keyup and keypress with the specified key code. |
simulatekeycode(keycode) | This function simulates entering a key into the chosen form field and is only compatible with Form Field commands. The function is different from keycode, since it simulates entering a key by focusing the form field and then sending the key event to the browser. Example: simulatekeycode(13) The above example emulates pressing the enter key. |
simulatebackspace(count) | This function simulates entering a specified number of backspace keys into the chosen form field and is only compatible with Form Field commands. This function can be used to clear existing text in a form field. Example: simulatebackspace(5) The above example simulates entering the backspace key 5 times. |
key(key) | Fires keydown, keyup and keypress with the specified key. The key can be a character or one of the following: enter paste left right up down If a key is a character, it can be preceded by one of the following: ctrl+ shift+ alt+ Examples: key(a) key(ctrl+a) key(paste) key(enter) If a string enclosed in double quotes is specified, the keydown, keyup and keypress events are fired for each character in the string. Example: key("hotel") |
scroll() scroll(percentage) | This function scrolls downward on the page containing the chosen web element, to ensure adequate coverage for pages that load content dynamically. Often, the function call is used along with the option Repeat Link Action, which will repeat a downward scroll to the bottom until the scroll no longer loads new content. Example 1: scroll() This example will scroll all the way to the bottom of the content. Optionally, you can specify the amount to scroll in terms of pixels. Example 2: scroll(50) This example will scroll 50 pixels down through the content. |
scrolls(scrollCount) | This function scrolls downward a specified number of times on the page containing the chosen web element, to ensure adequate coverage for pages that load content dynamically. |
windowscroll() windowscroll(percentage) | This function scrolls down to the bottom of the page containing the chosen web element, to ensure adequate coverage for pages that load content dynamically. If the web browser contains multiple frames, you should choose a web element within the frame to which you want to scroll. The chosen web element has no other influence on this function. Often, the function call is used along with the option Repeat Link Action, which will repeat a downward scroll to the bottom until the scroll no longer loads new content. The Scroll to End of Page action option combines the windowscroll event with the action option Repeat Link Action, so this option can be used as an alternative to the windowscroll function. Example 1: windowscroll() This example will scroll all the way to the bottom of the web page. Optionally, you can specify the amount to scroll in terms of pixels. Example 2: windowscroll(50) This example will scroll 50 pixels down the web page. |
windowscrolls(scrollCount) | This function scrolls down a specified number of times, to the bottom of the page containing the chosen web element, to ensure adequate coverage for pages that load content dynamically. |
Back() | Move back to the previous page by calling the JavaScript function window.history.back() |
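The key(key) syntax in the table can be summarized as a small parser. This is a hypothetical JavaScript sketch of the documented rules (named keys, one optional ctrl+, shift+ or alt+ modifier before a single character, and double-quoted strings), not Sequentum's own parser; for example, it rejects stacked modifiers such as ctrl+shift+a because the table doesn't mention them.

```javascript
// Hypothetical parser for the argument of the key(...) event function.
// It returns the keystrokes for which the key events would be fired.
const NAMED_KEYS = new Set(["enter", "paste", "left", "right", "up", "down"]);

function parseKeyArgument(arg) {
  // A double-quoted string fires the events once per character.
  if (arg.length >= 2 && arg.startsWith('"') && arg.endsWith('"')) {
    return [...arg.slice(1, -1)].map((ch) => ({ key: ch, modifier: null }));
  }
  if (NAMED_KEYS.has(arg)) {
    return [{ key: arg, modifier: null }];
  }
  // A single character, optionally preceded by one modifier such as ctrl+.
  const match = /^(?:(ctrl|shift|alt)\+)?(.)$/u.exec(arg);
  if (match) {
    return [{ key: match[2], modifier: match[1] ?? null }];
  }
  throw new Error(`Unsupported key argument: ${arg}`);
}

// Expand each keystroke into the three events the table names for key(...).
function eventsFor(arg) {
  return parseKeyArgument(arg).flatMap(({ key, modifier }) =>
    ["keydown", "keyup", "keypress"].map((type) => ({ type, key, modifier }))
  );
}

console.log(parseKeyArgument("ctrl+a"));  // [ { key: 'a', modifier: 'ctrl' } ]
console.log(eventsFor('"hotel"').length); // 15 (5 characters x 3 events)
```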
If the action command is a Form Field command, the input value is available to all event functions through the variable [input]. For example, you can call the function setinputtext to set the input value of a form field:
setinputtext([input])
Or, you can set the input value using jQuery:
exec($(element).val('[input]'))
If you set a form field value using any of these event functions, you may want to uncheck the form field box Set Value, so that the value is set only by the event function and not automatically.
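How Sequentum Enterprise substitutes [input] internally is not described here, so the following JavaScript is only a hypothetical sketch of placeholder expansion; `expandInput` and `escapeForSingleQuotes` are illustrative names. It escapes backslashes and single quotes because the placeholder sits inside a single-quoted JavaScript literal in exec($(element).val('[input]')), where an unescaped value such as O'Hare would break the generated code.

```javascript
// Escape a value so it is safe inside a single-quoted JavaScript literal.
function escapeForSingleQuotes(value) {
  return value.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

// Hypothetical expansion of every [input] placeholder in an event function.
// split/join replaces all occurrences of the literal text "[input]".
function expandInput(eventFunction, inputValue) {
  return eventFunction.split("[input]").join(escapeForSingleQuotes(inputValue));
}

console.log(expandInput("exec($(element).val('[input]'))", "hotels"));
// exec($(element).val('hotels'))
console.log(expandInput("exec($(element).val('[input]'))", "O'Hare"));
// exec($(element).val('O\'Hare'))
```

This escaping only suits placeholders inside single-quoted literals; a placeholder used elsewhere would need different handling.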
Query Selectors & Google's Geolocation Search
JavaScript query selectors can be used to fire events on web elements other than the action's web element. Query selectors must be specified before the event name and must be enclosed in double quotes.
Example:
"#search".focus
The above event configuration fires the focus event on a web element with the ID search.
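The event configuration syntax can be summarized as: each line is either a plain event name, fired on the action's own web element, or a double-quoted query selector followed by a dot and an event name. The JavaScript below is a hypothetical parser for that syntax, written for illustration; it is not Sequentum's implementation.

```javascript
// Hypothetical parser for an event configuration. Each non-empty line is
// either a plain event name or "selector".eventName.
function parseEventConfig(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const match = /^"([^"]+)"\.(\w+)$/.exec(line);
      if (match) {
        return { selector: match[1], event: match[2] };
      }
      // No selector: the event fires on the action's own web element.
      return { selector: null, event: line };
    });
}

// In a browser, each step's target could then be resolved as
//   step.selector ? document.querySelector(step.selector) : element
// where element is the action's selected web element.
const steps = parseEventConfig(['"#search".focus', "mouseover"].join("\n"));
console.log(steps);
```

In this sketch, lines that don't match the quoted-selector form, including function calls such as delay(2000), are treated as plain event entries.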
One important application for query selectors is websites that use Google's Geolocation Search. The following event configuration is required to select a drop-down item in Google's Geolocation Search plugin:
"#search".focus
mouseover
"#search".blur
where search is the ID of the input field used by Google's Geolocation Search. This input field is custom defined, so you may need to use different query selectors for different websites. For example, if the Geolocation Search uses an input field with name city, the event configuration would look like this:
"input[name='city']".focus
mouseover
"input[name='city']".blur
Wait
When you browse the web yourself, you typically don't need to think about the sequence of complex activities that occur while a web page loads, since you simply wait for the content that you want to see. The most critical content on a web page will likely load well before you get around to viewing a specific part of the page, so features usually work correctly by the time you fill in web forms or click links.
The situation is very different for web-scraping agents, because they are very fast. An agent attempts to process a web page as quickly as possible and then continue to the next page, so it could easily start processing a page before all of the essential content has loaded. It's therefore important to configure an action command to wait for all important browser activities to complete, and for all important content to load, before processing of the web page begins.
When an action command executes, it waits for certain activities to complete in the web browser. For example, if a command executes a click on a link, it may wait for a page to load or an AJAX call to complete. Some actions may result in a very complex set of activities. An action may load a new page that then uses AJAX to load additional dynamic content onto the page.
Discovering Activities
Action commands automatically discover web browser activities. After a command fires the action events, it will monitor all activity in the web browser and wait for critical activities to complete. Once no new activities have started for a little while, it will consider the action to be complete.
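The length of the quiet period ("a little while") is not documented here, so the JavaScript below treats it as a parameter. It is a hypothetical model of the "wait until activity stops" idea, with a timeout cap, not Sequentum's algorithm. Times are milliseconds after the action events were fired.

```javascript
// The action is considered complete once all observed activities have
// finished and no new activity has started within idleMs. timeoutMs caps
// the total wait.
function actionCompleteAt(activities, idleMs, timeoutMs) {
  const sorted = [...activities].sort((a, b) => a.start - b.start);
  let busyUntil = 0; // time at which the last known activity finished
  for (const activity of sorted) {
    if (activity.start > busyUntil + idleMs) {
      break; // the quiet period elapsed before this activity started
    }
    busyUntil = Math.max(busyUntil, activity.end);
  }
  return Math.min(busyUntil + idleMs, timeoutMs);
}

const activities = [
  { start: 0, end: 800 },     // main page load
  { start: 850, end: 1200 },  // AJAX call started by the page
  { start: 5000, end: 5100 }, // late request, after the quiet period
];
console.log(actionCompleteAt(activities, 500, 30000)); // 1700
console.log(actionCompleteAt(activities, 500, 1000));  // 1000 (timeout)
```

The late request at 5000 ms falls after the action is considered complete, which is the kind of activity the Browser Activity screen shows after the completion marker.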
You can specify which activities an action command should wait for. The command can wait for activities in the main web page and in sub-pages that are loaded in web frames.
Page load activities can be optional or required. An error will be reported if a page load activity is required, but no page load occurs. If Wait for page load is set to None, the command will not wait for any page load to occur, which is slightly faster than setting Wait for page load to Optional.
An AJAX activity occurs when a web page loads content from the web server asynchronously. A Script activity occurs when a JavaScript file is loaded by the web page asynchronously. AJAX and Script activities are always optional, which means no error will be reported if a command is configured to wait for AJAX, but no AJAX activities occur.
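The rules in the two paragraphs above can be condensed into a small decision function. This JavaScript is a hypothetical summary for illustration; the setting values mirror the text, but the function and object shapes are invented.

```javascript
// settings.pageLoad is "Required", "Optional" or "None"; settings.ajax and
// settings.script are booleans. observed.pageLoad records whether a page
// load actually happened.
function evaluateWaits(settings, observed) {
  const waitedFor = [];
  let error = false;

  if (settings.pageLoad !== "None") {
    // Required and Optional both wait; None skips the wait entirely,
    // which is why None is slightly faster than Optional.
    waitedFor.push("page load");
    if (settings.pageLoad === "Required" && !observed.pageLoad) {
      error = true; // a required page load never occurred
    }
  }

  // AJAX and Script activities are always optional: waiting for them never
  // produces an error, even if no such activity occurs.
  if (settings.ajax) waitedFor.push("AJAX");
  if (settings.script) waitedFor.push("Script");

  return { waitedFor, error };
}

const noLoad = { pageLoad: false };
console.log(evaluateWaits({ pageLoad: "Required", ajax: true, script: false }, noLoad));
// { waitedFor: [ 'page load', 'AJAX' ], error: true }
console.log(evaluateWaits({ pageLoad: "Optional", ajax: true, script: false }, noLoad));
// { waitedFor: [ 'page load', 'AJAX' ], error: false }
```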
Complex Website Activities
Some websites have very complex activities. For example, many travel websites that provide hotel and flight search functionality will load a waiting page and after a while load the actual search result. An action command will often complete the action after the waiting page is loaded since it doesn't know that more content will be loaded later. If the website redirects from the waiting page to the search result page, then the Wait option Delayed redirect can often be used successfully, but sometimes websites use other techniques and it can be very difficult for the action command to tell when an action has completed.
Sometimes it's possible to determine that a website action has completed when a specific URL has been loaded. This URL could be from a full page load, a frame page load, or an asynchronous AJAX call. A Wait for Content sub-command can be used to wait for a URL that matches a Regular Expression.
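When writing such a Regular Expression, characters like `?` and `.` in URLs need escaping. The JavaScript below uses a made-up travel-site URL to show the difference; the escaping rules are common to JavaScript and most other regex flavors, and the `/.../` delimiters are JavaScript syntax rather than part of the pattern itself.

```javascript
// Made-up example of the AJAX call that delivers the search results.
const resultUrl = "https://www.example-travel.com/api/search/results?id=123";

// Unescaped "?" makes the preceding "s" optional instead of matching a
// literal question mark, so this pattern does NOT match the URL above.
const naivePattern = /\/api\/search\/results?id=/;

// Escaping "?" (and using \d+ for the numeric ID) matches as intended.
const waitUrlPattern = /\/api\/search\/results\?id=\d+/;

console.log(naivePattern.test(resultUrl));   // false
console.log(waitUrlPattern.test(resultUrl)); // true

// Among the URLs loaded after an action, the wait is satisfied at the first
// URL that matches the pattern.
const loadedUrls = [
  "https://www.example-travel.com/search/waiting",
  "https://www.example-travel.com/static/spinner.gif",
  resultUrl,
];
console.log(loadedUrls.findIndex((url) => waitUrlPattern.test(url))); // 2
```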
Sometimes the only reliable way to determine when an action has completed is to wait for certain web content to appear on the web page. A Wait for Content sub-command can be used to wait for web content.
Wait Timeouts
Action commands will wait for browser activities for a certain period of time before the wait times out, and the command either considers the action completed or reports the timeout as a page load error. The default timeout values are usually appropriate, but there will sometimes be situations where some timeout values should be modified. For example, timeout values may need to be increased for a very slow website in order for the agent to work properly, or timeout values could be decreased for a very fast website in order to increase agent performance.
Browser Activity Screen
This feature shows all browser activities that occur after the current action executes. You can use this information to determine potential issues with the configuration of the action. Use the Activity button on the Sequentum Enterprise status bar to open the Browser Activity screen, as shown in the figure below:
Critical activities have dark coloring and other activities have light coloring. A blue row appears in the sequence at the point where the command recognizes completion of the action. Activities that occur after the action completes may not necessarily indicate a problem. If the agent does not work as you expect, then you may need to reconfigure your action in such a way that it waits for some or all of those activities.
Methods to get the Default value of SE Properties
The following table lists the default values of Sequentum Enterprise agent properties, as returned by the GetDefault() method. Scripts that call these methods use the following namespace:
using Sequentum.ContentGrabber.Commands;
Method | Description |
---|---|
HttpVersion.GetDefault() | Returns Version20. Example: args.Agent.HttpVersion.GetDefault().ToString(); |
UserAgent.GetDefault() | Returns the User Agent of the current Chromium version. Example: args.Agent.UserAgent.GetDefault(); |
ParseNoscript.GetDefault() | Returns Always. Example: args.Agent.ParseNoscript.GetDefault().ToString(); |
ClearStorageMethod.GetDefault() | Returns AllCookiesAndStorage. Example: args.Agent.ClearStorageMethod.GetDefault().ToString(); |
LoadBrowserImages.GetDefault() | Returns Always. Example: args.Agent.LoadBrowserImages.GetDefault().ToString(); |
LoadStyleSheets.GetDefault() | Returns Always. Example: args.Agent.LoadStyleSheets.GetDefault().ToString(); |
Command Properties
Action
URL Action: The action configuration for the agent command.
Activities: Specifies how this action should wait for the browser activities to complete.
Wait for Content: The default value is set to Optional. Specifies whether the command waits for the web selections and URLs specified by this property. Possible values are Required, Optional, and No Wait.
Wait for External Sub-Page AJAX: The default value is set to False. Specifies if the command should wait for one or more AJAX requests to complete on sub-pages from different domains than the main page.
Wait for External Sub-Page Load: The default value is set to No Wait or Parse. Specifies if the command should wait for one or more page loads to occur that originate from different domains than the main page.
Wait for External Sub-Page Scripts: The default value is set to False. Specifies if the command should wait for one or more asynchronous script loads to complete on sub-pages from different domains than the main page.
Wait for Internal Sub-Page AJAX: The default value is set to True. Specifies if the command should wait for one or more AJAX requests to complete on sub-pages from the same domain as the main page.
Wait for Internal Sub-Page Load: The default value is set to Optional. Specifies if the command should wait for one or more page loads to occur in browser frames. Will only wait for pages from the same domain as the main page.
Wait for Internal Sub-Page Scripts: The default value is set to False. Specifies if the command should wait for one or more asynchronous script loads to complete on sub-pages from the same domain as the main page.
Wait for Main Page AJAX: The default value is set to True. Specifies if the command should wait for one or more AJAX requests to complete on the main page in the browser.
Wait for Main Page Load: The default value is set to Required. Specifies if the command should wait for one or more full page loads to occur in the browser.
Wait for Main Page Redirect: The default value is set to False. Specifies if the command should wait for the main page to redirect to another page.
Wait for Main Page Scripts: The default value is set to False. Specifies if the command should wait for one or more asynchronous script loads to complete on the main page in the browser.
Wait URL Regex: The action will not complete before a URL matching the specified Regular Expression has been loaded. This option is turned on by setting the option "Wait for URL" to a value other than None.
Wait XPaths: Waits for any of the specified XPaths to exist on the main page. Multiple XPaths should be specified on separate lines. This option is turned on by setting the option "Wait for XPaths" to a value other than None.
Add Force Refresh Header: Adds an "If-Modified-Since" header to the web request to make sure the web page is not retrieved from cache. The default value is set to false.
Block Known Ad Servers: The web page will not load content from known ad servers, such as "ad.doubleclick.net". This speeds up processing slightly.
Block Popup: The default value is set to False. When set to True, the command blocks any popup opened by the action.
Browser Mode: This property specifies the different types of browser that can be used to run an agent.
Default - The new web browser will be the same type as the parent browser.
Dynamic Browser - The browser functions as a standard web browser, and it downloads images and executes JavaScript.
HTML Parser - The HTML Parser doesn't execute JavaScript and does not load frames, so it is faster and more reliable than a Dynamic Browser. However, the parser doesn't work on websites that rely on JavaScript, and the parser may also be unable to submit some web forms (even when they don't rely on JavaScript).
JSON Parser - This mode does not start a new browser. Instead, it parses JSON content returned by a web server and lets you easily extract content elements from the JSON content.
XML Parser - This mode does not start a new browser. Instead, it parses XML content returned by a web server and lets you easily extract content elements from the XML content.
Capture Ajax Responses (Regex): A regular expression matching the URLs for which AJAX responses should be captured and made available for scripting. You can specify multiple regular expressions separated by line breaks.
Capture Requests (Regex): A regular expression matching the URLs whose requests should be captured and made available for scripting. You can specify multiple regular expressions separated by line breaks.
Clear Storage: This property allows you to clear the website's cookies and local storage. The Default value indicates that no storage will be cleared. This property applies to action commands such as Navigate URL and Navigate Link, not to the agent as a whole.
Default: Deletes all cookies and local storage if using the in-memory cache, but only site cookies and local storage if using a persistent cache.
None: Does not delete any cookies or storage.
All Cookies & Storage: Deletes all cookies and local storage if using the in-memory cache, but only site cookies and local storage if using a persistent cache.
Site Cookies & Storage: Deletes only site cookies and local storage, whether using the in-memory cache or a persistent cache.
All Cookies: Deletes all cookies only.
Site Cookies: Deletes site cookies only.
Site Session Cookies: Deletes site session cookies only.
Close Browser After Use: The default value is set to False. When set to True, the browser is closed after use. This option is only applicable if the action opens a new browser.
Command Returning Max Scrolls: Limit the number of scrolls to a number captured by the specified command. This option can be useful when a page can scroll indefinitely.
Custom Request Headers: Custom headers that are sent with all requests made by a browser. The headers must be set on each new browser.
Detect Encoding: This property specifies how a new parser should detect the encoding of the content received from a web server. If the option is set to Default, this action uses the same detection method as the parent parser. If there's no parent action, the default detection method is "Content & Server". You must reopen the browser window or the agent for this change to take effect.
Discover Action: When set to True, action properties are configured automatically when the command is first executed. The default value is False.
Editor Action: Specifies the web element or URL to use when performing the action in the design browser. The default value is "Default", which corresponds to Use Specific URL set to False. To load a specific URL instead, set Use Specific URL to True.
URL: Specifies the URL to load.
Use Specific URL: Specifies whether to load a direct URL. This value is set to False.
Error Handling: Specifies how the agent reacts when an error occurs while executing the command action. The default reaction is to exit the command. Use the option “No Error Handling” if you want the agent to continue executing sub-commands after an error; you can then handle the error in sub-commands by using the script parameter “IsParentActionError”. The error handling options are:
Exit Command - The agent exits the action command, skips all of its sub-commands, and continues with the next command.
Retry With No Error Handling - The agent retries the action command the specified number of times, and then continues with no error handling.
Restart and Resume Agent - The agent restarts and resumes where it left off. This option is useful if an error puts the website into a state where the agent cannot continue.
Restart Agent and Retry Command - The agent retries the action command the specified number of times, and if the action command still does not succeed, it restarts the agent.
Stop Agent - The agent stops.
Retry Command - The agent retries the action command the specified number of times, and if the action command still does not succeed, it skips all sub-commands of the action command and continues with the next command. Set the property Error Retry Count to specify the number of retries. If Error Retry Count is set to zero, the agent keeps retrying the command indefinitely.
Error Retry Clear On Success: Resets the retry counter when the action succeeds. The default value is True.
Error Retry Count: Specifies the number of times the agent retries the command when an error occurs while executing the command action. This property applies only if “Error Handling” is set to “Retry Command”. The default value is 5, so the agent retries the command up to 5 times.
Error Retry Delay: Specifies the number of milliseconds the agent waits before each retry. This property applies only if “Error Handling” is set to “Retry Command”. The default value is 5000, so the agent waits 5000 milliseconds (5 seconds) before retrying.
Error Retry Proxy Rotation: Specifies whether the proxy should rotate before retrying the action, and whether the current proxy should be removed. This property applies only if “Error Handling” is set to “Retry Command”. The default value is “Keep and Rotate”.
None - Does not rotate the proxy before or after retrying the action.
Remove and Rotate - Removes the current proxy and rotates to another proxy before retrying the action.
Remove and Rotate After Retry - Removes the current proxy and rotates to another proxy after retrying the action.
Keep and Rotate - Keeps the current proxy and rotates to another proxy before retrying the action.
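The Retry Command behavior described above can be pictured as a retry loop. The following is a minimal, hypothetical Python model, not Sequentum Enterprise code: the function name `run_with_retry`, the `action` callable (assumed to raise an exception on error), and the proxy list are illustrative assumptions. It models Error Retry Count (zero meaning retry indefinitely), Error Retry Delay, and three of the Error Retry Proxy Rotation options; Remove and Rotate After Retry is left out for brevity.

```python
import time
from typing import Callable, List


def run_with_retry(action: Callable[[str], None],
                   proxies: List[str],
                   retry_count: int = 5,
                   retry_delay_ms: int = 5000,
                   rotation: str = "Keep and Rotate",
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical model of Error Handling = "Retry Command".

    `action` raises an exception on error. Returns True on success, or False
    once the retries are used up (the caller would then skip the sub-commands
    and continue with the next command). retry_count == 0 retries forever.
    """
    pool = list(proxies)  # working copy, so removals don't touch the caller's list
    index = 0
    retries = 0
    while True:
        try:
            action(pool[index])
            return True
        except Exception:
            if retry_count and retries >= retry_count:
                return False  # Error Retry Count exhausted
            retries += 1
            sleep(retry_delay_ms / 1000)  # Error Retry Delay is in milliseconds
            if rotation == "Remove and Rotate" and len(pool) > 1:
                pool.pop(index)  # drop the failing proxy; the next one slides into `index`
                index %= len(pool)
            elif rotation == "Keep and Rotate":
                index = (index + 1) % len(pool)  # keep the proxy, move to the next one
            # rotation == "None": retry with the same proxy
```

In this model, a command that keeps failing with the defaults (5 retries, 5000 ms) is attempted six times in total: the original attempt plus five retries.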
Fixed Encoding: This property specifies the encoding when “Detect Encoding” is set to “Fixed”. You must reopen the browser window or the agent for this change to take effect.
Ignore Error Codes: Some websites return incorrect status codes, so by default this property ignores status codes unless the web server returns no content.
JSON Transformation: Regular expression used to transform JSON content loaded with a static parser.
Limit Number Of Scrolls: When set to True, limits the number of scrolls to the value of “Maximum Number Of Scrolls”. This can be useful when a page can scroll indefinitely. The default value is False.
Max AJAX Calls: Stops waiting for AJAX after the specified number of AJAX calls. The default value is 10.
Maximum Number Of Scrolls: Limits the number of scrolls to the specified number. This can be useful when a page can scroll indefinitely. The default value is 50. This property applies only if “Limit Number Of Scrolls” is set to True.
Never Open New Browser: When set to True, action discovery never configures an action to open a web page in a new web browser. Some web pages may not function correctly when a link is opened in a new browser. The default value is False.
No Parse: When set to True, web pages are not parsed if parsed pages already exist. The default value is False.
Page Not Found Handling: Specifies the action to take if a web page is not found. The default is to do nothing and let normal error handling deal with the error.
Continue - If a web page is not found, the agent continues executing.
Exit - If a web page is not found, the agent exits.
Redirect First Request: Redirects the first request to a new browser window when Target Browser is set to “New”, even if the first request comes from a frame within the current browser window. If this property is set to False, requests from frames within the current browser window are not redirected.
Rotate Proxies: When set to True, rotates the proxy before executing the action. The default value is False.
Scroll Steps: The number of pixels the page is scrolled in each step. The default value of 0 scrolls the page all the way to the bottom and all the way to the right in each step.
Scroll Until End of Page: When set to True, scrolls to the end of the web page after an action. The page is scrolled repeatedly until it cannot scroll any further, waiting for AJAX calls to complete if scrolling triggers them. The default value is False.
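The interaction between Scroll Steps, Maximum Number Of Scrolls, and Scroll Until End of Page can be sketched as a loop. This is a hypothetical Python model under simplifying assumptions, not the product's implementation: only vertical scrolling is modeled, `get_scroll_height` stands in for reading the page height after AJAX content loads, and `max_scrolls=None` stands for Limit Number Of Scrolls set to False.

```python
from typing import Callable, Optional


def scroll_until_end(get_scroll_height: Callable[[], int],
                     viewport_height: int,
                     scroll_step: int = 0,
                     max_scrolls: Optional[int] = 50) -> int:
    """Hypothetical model of Scroll Until End of Page.

    get_scroll_height() returns the current page height in pixels; it is called
    after every scroll so content added by AJAX can extend the page.
    scroll_step == 0 jumps straight to the bottom (the Scroll Steps default).
    max_scrolls=None models Limit Number Of Scrolls = False.
    Returns the number of scrolls performed.
    """
    position = 0
    scrolls = 0
    while max_scrolls is None or scrolls < max_scrolls:
        bottom = max(get_scroll_height() - viewport_height, 0)  # furthest scroll offset
        if position >= bottom:
            break  # unable to scroll any further: end of page
        position = bottom if scroll_step == 0 else min(position + scroll_step, bottom)
        scrolls += 1
        # A real agent would wait here for AJAX calls triggered by the scroll.
    return scrolls
```

On a page that keeps growing, only the scroll limit stops the loop, which is why limiting the number of scrolls is useful for pages that scroll indefinitely.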
Separate Parser Proxies: Creates a separate parser proxy loop if the action opens in a parser. The default value is set to “False”.
Target Browser: Specifies the web browser in which a new web page should be loaded. The options are:
New: Loads the web page in a new browser window. This is the default value.
Current: Loads the web page in the same browser window.
Parent: Loads the web page in the parent browser window.
Popup: Loads the web page in a popup window.
Timeouts: Specifies timeout values for the action. Timeouts specified in activities override these values.
Ajax Completed: The default number of milliseconds to wait for an AJAX call to complete.
Ajax Content Render Delay: The default number of milliseconds to wait for AJAX-loaded content to render on a web page.
Ajax Content Render Delay After Scroll: The default number of milliseconds to wait for AJAX-loaded content to render on a web page after triggering a scroll. The command stops waiting as soon as it can scroll the page further down. This differs from “Ajax Content Render Delay”, which always waits a fixed amount of time. A fixed delay is slower but is required on some websites.
Asynchronous Completed: The default number of milliseconds to wait for an asynchronous action to complete.
Discover First Activity: The default number of milliseconds to wait for the first activity when discovering new activities.
Discover First URL Activity: The default number of milliseconds to wait for the first URL to start loading.
Discover Next Activity: The default number of milliseconds to wait for the next activity when discovering new activities.
File Download Completed: The default number of milliseconds to wait for a file download to complete.
File Download Started: The default number of milliseconds to wait for a previous file download to complete before starting to download the next file.
Frame Completed: The default number of milliseconds to wait for frame content to complete loading. This timeout applies to internal frames, and only to external frames if the property WaitForExternalFrames is set to True.
JavaScript Parser Timeout: The number of milliseconds to wait for JavaScript DOM operations to complete.
Main Page Redirect: The number of milliseconds to wait for the main page to redirect to another page.
Page Completed: The default number of milliseconds to wait for a page load to complete.
Wait For Content Timeout: The default number of milliseconds to wait for web content to appear on a web page, or a URL to load that matches a specified Regex.
Wait Times: The default timeout values are multiplied by this value. This is a quick way to test whether problems with an action are caused by timeout values that are too short. Default timeouts are used when discovering activities and when scrolling a page.
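The Wait Times multiplier is simple arithmetic over the default timeouts. The sketch below is a hypothetical Python illustration: the function name `scale_timeouts` is invented, and the timeout values in `example_defaults` are made-up examples, not Sequentum Enterprise's actual defaults.

```python
from typing import Dict


def scale_timeouts(defaults_ms: Dict[str, int], wait_times: float) -> Dict[str, int]:
    """Multiply each default timeout (in milliseconds) by the Wait Times value."""
    return {name: round(ms * wait_times) for name, ms in defaults_ms.items()}


# Hypothetical example values, not Sequentum Enterprise's actual defaults.
example_defaults = {"Page Completed": 30000, "Ajax Completed": 2000}
print(scale_timeouts(example_defaults, 2))  # doubles every timeout
```

If an action starts working once Wait Times is raised to 2, the original timeouts were probably too short for that page.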
Agent
Activity Timeout Minutes: The agent restarts and continues if no activity has been recorded for the specified number of minutes. Set this value to 0 to allow an unlimited number of minutes. The default value is 10 minutes.
AgentID: Universally unique agent ID.
Agent Version: Shows the version of the agent. Each modification to the agent is reflected in a new agent version.
Create Website Images: Default settings used when saving a website version.
Create Date Stamped Folder - Saves the files in a folder named with the current date.
Location - The directory where website pages should be saved.
Save HTML - Saves web pages as HTML.
Save PDF - Saves web pages as PDF.
Save Screenshot - Saves web pages as screenshots.
Use Default Location - Saves the web page images in the default directory.
Custom Identifier: This property indicates a unique ID for each project which is referenced for MDS entries for the agent.
Custom Language: This property indicates the language used when Language is set to Custom.
Data Count Timeout Minutes: This property specifies the number of minutes allowed for an agent’s data count to increase before the run is considered failed. Set this value to 0 (zero) to allow an unlimited number of minutes. The default value is set to 30 minutes.
Directory: This property indicates the Default Directory for the agent.
Fail and Exit on Error Count: Fails and exits the agent when a specified number of errors have occurred. A value of zero means the agent will not fail and exit on error count.
Load Start URLs: When set to True (the default), the Agent command loads its start URLs, so a start URL must be added. When set to False, the Agent command loads no start URLs, which lets the agent start by loading inputs instead of loading a URL first.
Max Run Time: The default value is blank, which means the agent can run for an unlimited number of hours or days. The value must be in the format HH:mm:ss. If you want the agent to run for more than 23 hours, use the format d.HH:mm:ss (days.hours:minutes:seconds, for example 0.00:00:00). To run your agent for 23 hours, set the value to 23:00:00, and to run it for 2 days, set it to 2.00:00:00.
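The two Max Run Time formats can be checked with a small parser. This is a hypothetical Python sketch that mirrors the formats described above; the function name `parse_max_run_time` is invented, and rejecting hour values above 23 without a day prefix is an assumption based on the 23-hour limit mentioned in the text.

```python
import re
from datetime import timedelta
from typing import Optional

# Optional "d." day prefix, then hours:minutes:seconds.
_MAX_RUN_TIME = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})$")


def parse_max_run_time(value: str) -> Optional[timedelta]:
    """Parse a Max Run Time value; None means blank, i.e. unlimited run time."""
    value = value.strip()
    if not value:
        return None
    match = _MAX_RUN_TIME.match(value)
    if match is None:
        raise ValueError(f"expected HH:mm:ss or d.HH:mm:ss, got {value!r}")
    days, hours, minutes, seconds = match.groups()
    h, m, s = int(hours), int(minutes), int(seconds)
    if h > 23 or m > 59 or s > 59:
        raise ValueError(f"out-of-range time component in {value!r}")
    return timedelta(days=int(days or 0), hours=h, minutes=m, seconds=s)
```

For example, 23:00:00 parses to 23 hours, and 2.00:00:00 parses to 2 days; a value such as 24:00:00 must instead be written with a day prefix (1.00:00:00).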
Screenshot Logging: Default settings used for screenshot logging.
Date Stamp Folder - Saves logs in a date-stamped sub-folder when using default log paths.
Log Path - The directory where log screenshots should be saved.
Max Screenshot Height - The screenshot will be resized if it is taller than this value.
Max Screenshot Width - The screenshot will be resized if it is wider than this value.
Resize screenshot - Resizes the screenshot image file.
Screenshot Folder - The name of the folder where log screenshots should be saved.
Template Path - The path of the page template.
Use Default Paths - Use default paths. Default value is set to True.
Use Default Template - Use default template. Default value is set to True.
Anonymization
Clear Storage: Specifies when to delete cookies and local storage, as specified by “Clear Storage Method”.
Default: Deletes cookies and local storage, as specified by “Clear Storage Method”, when an agent starts.
On Agent Start: Deletes cookies and local storage, as specified by “Clear Storage Method”, when an agent starts.
On Every Page Load: Deletes cookies and local storage, as specified by “Clear Storage Method”, on every page load.
On Proxy Rotation: Deletes cookies and local storage, as specified by “Clear Storage Method”, when the proxy rotates.
Clear Storage Method: Specifies which types of storage to delete. The ‘Clear Storage’ and ‘Clear Storage Method’ properties are interlinked and work together, and when set, they apply to the whole agent.
Default: When using the in-memory cache, deletes all cookies but not local storage; when using a persistent cache, deletes only site cookies and local storage.
None: Does not delete cookies or storage.
All Cookies & Storage: Deletes all cookies and local storage when using the in-memory cache, but only site cookies and local storage when using a persistent cache.
Site Cookies & Storage: Deletes only site cookies and local storage, whether using the in-memory cache or a persistent cache.
All Cookies: Deletes all cookies only.
Site Cookies: Deletes site cookies only.
Site Session Cookies: Deletes site session cookies only.
Geolocation: The Geolocation API gives websites access to a high-level interface for querying physical location information such as latitude and longitude, which can break the user's anonymity. By default, websites are not allowed to access the location without explicit permission. This setting applies only to dynamic web browsers.
Allow Geolocation Access: Allows or disallows websites to access the geolocation through the browser. This setting applies only to dynamic web browsers. The default value is Default, which means websites are not allowed to access the geolocation. Select Allow to let websites access the geolocation.
Geo Accuracy: The accuracy value to send with the Geolocation request. For example, you can set it to 1 to get an accuracy of about 1 kilometer (0.62 miles).
Geo Latitude: The latitude value to send with the Geolocation request.
Geo Longitude: The longitude value to send with the Geolocation request.
Keep Connection Alive: Keeps the connection to the target website alive when using the HTML Parser. Set this option to False to allow a proxy rotation service to rotate properly when using an HTTPS connection.
Profile Rotation: An agent can be configured to use random web browser profiles while extracting data. The default value is No Rotation. This reflects the value specified in the “Rotate Web Browser Profile” field.
Fake Audio Context: Returns a fake audio fingerprint context with random noise. The default value is False.
Fake Browsing History: Overrides the browser history length with a random value. The default value is True.
Fake Canvas String: Returns fake but valid canvas strings, even when “Allow Canvas Reading” is turned off. Some websites use HTML5 canvas reading to fingerprint a browser. This property always generates a valid random (spoofed) canvas string, which hides your fingerprint signature and makes your canvas string unique on target websites. The default value is True.
Fake Connection Info: Returns fake information about the internet connection. The default value is False.
Fake CPU: Returns fake information about the CPU and memory. The default value is False.
Fake Fonts: Returns fake information about installed fonts. The default value is False.
Fake Language: Returns fake information about the language. The default value is False.
Fake Media Codecs Support: Emulates media codec support, so the browser claims to support certain media codecs when web pages or scripts query it. The default value is True.
Fake Media Devices: Returns fake information about media devices. The default value is False.
Fake Performance Timers: Returns fake timings from performance timers. The default value is False.
Fake Plugins: Returns fake information about installed plugins. The default value is False.
Fake Screen: Returns fake information about screen size and color. It is important to combine this feature with the agent option “Randomize Browser Size”. The default value is False.
Mask 'ToString' Prototype: The Mask 'ToString' Prototype generates a concise, human-readable string of an agent's masking properties, showing the state of each feature used to manipulate or emulate browser fingerprinting for anonymity. It helps with debugging, logging, and verifying configurations. The default value is set to True.
Mock Chrome Web Share: Mocks Chrome's Web Share API, so that Sequentum Enterprise imitates the functionality that API provides. Specifically, it mocks two main functions:
navigator.share: This is a method in the Web Share API that allows web pages to trigger the native sharing capabilities of the user's device.
navigator.canShare: This method checks if data can be shared using the Web Share API.
The default value is True.
Mock Chrome.App: Mocks the chrome.app API, so that Sequentum Enterprise replicates its functions and responses. This API is typically used in Chrome extensions and apps to interact with browser-level functionality and settings. The default value is False.
Mock Chrome.Runtime: Mocks the chrome.runtime API, so that Sequentum Enterprise mimics its functions and responses. This API is typically used in Chrome extensions and apps to interact with the Chrome browser environment, manage the extension lifecycle, and communicate between components. The default value is False.
Rotate Web Browser Profile: This property emulates different web browser profiles to stay anonymous when websites use web browser fingerprinting. This setting is only applicable to dynamic web browsers. This setting may cause some websites to work incorrectly. The default value is set to “No Rotation”.
No Rotation: The web browser profile does not change.
On Every Page Load: The web browser profile rotates on every page load.
New On Proxy Rotation: The web browser profile rotates only when the proxy rotates.
Web Driver: Sets the navigator's web driver property to true or false. This property is set to False by default.
Proxy Configuration: Specifies how proxies are configured for the agent. The default value is “Application”, which specifies that the agent uses the proxies added under “Application Settings” →Proxies. This reflects the value specified in “Proxy Type”.
Is Disallow Config: The default value is set to False. Setting this property to True disallows configuration files to overwrite the proxy properties.
Proxy Pools: This property specifies the proxy pools used for the agent. The default value is set to “None”. This property will display the proxy pool settings value which is specified in “Agent Settings→Proxy Pools→Proxy Pool Settings”.
Cycle Pools - The default value is False. Setting this property to True rotates back to the first proxy pool when there are no more available pools.
Proxy Pool: Specifies the number of proxy pools used in the agent. The default value is “0 (zero) Proxy pool”. This displays the count of proxy pools specified in “Agent Settings→Proxy Pools→Proxy Pool Settings”.
Rotate Error Count - The default value is 10. The proxy pool rotates after this number of page errors if "Rotate Pool on Errors" is enabled. Only the most recent page loads, as many as specified in "Rotate Error Set Size", are considered. For example, if "Rotate Error Count" is set to 10 and "Rotate Error Set Size" is set to 20, the proxy pool rotates if 10 or more of the last 20 page loads resulted in page errors.
Rotate Error Set Size - The default value is 20. The number of most recent page loads to evaluate when deciding whether the proxy pool should rotate.
Rotate Pool on Errors - Rotates the proxy pool automatically after a specified number of page errors.
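The Rotate Error Count and Rotate Error Set Size rule amounts to a sliding window over recent page loads. This is a minimal, hypothetical Python model; the class name `PoolRotationMonitor` is invented, and clearing the window after a rotation (so the new pool is judged on its own page loads) is an assumption, not documented behavior.

```python
from collections import deque


class PoolRotationMonitor:
    """Hypothetical model of Rotate Pool on Errors.

    Keeps the outcome of the last `error_set_size` page loads and signals a
    proxy pool rotation once at least `error_count` of them were errors.
    """

    def __init__(self, error_count: int = 10, error_set_size: int = 20) -> None:
        self.error_count = error_count
        self.window: deque = deque(maxlen=error_set_size)  # oldest results drop off

    def record(self, page_failed: bool) -> bool:
        """Record one page load; return True if the proxy pool should rotate."""
        self.window.append(page_failed)
        if sum(self.window) >= self.error_count:
            self.window.clear()  # assumption: evaluate the new pool from scratch
            return True
        return False
```

With the defaults, nine errors followed by eleven successes do not trigger a rotation, and an error arriving after that only replaces the oldest result in the window.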
Proxy transformation script: A script used to change the proxy properties at runtime. The script can also be used to perform special validation at runtime.
Example:
args.Proxy.GetTransformedProxy("149.36.10.0:60000", "uname", "pswd")
This allows use of certain proxy provider options, such as sticky proxy sessions where the session ID is specified as part of the proxy username. It also allows custom proxy verification before a proxy is used. If a proxy is not valid, the script should return a value indicating that the proxies should rotate and whether the proxy should be removed from the pool. Proxy transformation takes place just before any proxy is used, and it applies to the whole agent. Example use cases for proxy transformation:
1. A proxy may need custom verification at runtime, for example to avoid extracting bad data.
2. A proxy may need custom username parameters at runtime to specify a session.
3. Proxies may need to be disabled for specific URLs.
Proxy Type: The default value is "Application", which specifies that the agent uses the proxy pools added under "Application Settings" →Proxies.
Random Delays: The default value is No Random Delays, which corresponds to “Use Random Delays” set to False. To enable random delays, set “Use Random Delays” to True, which inserts a random delay every time a page is loaded.
Maximum Delays: Maximum number of milliseconds to wait after a page has loaded. The default value is 5000 milliseconds.
Minimum Delays: Minimum number of milliseconds to wait after a page has loaded. The default value is 1000 milliseconds.
Use Random Delays: Inserts a random delay every time a page is loaded. The default value is False, which means no random delays are inserted; set this property to True to use them.
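A random delay between Minimum Delays and Maximum Delays can be sketched as follows. This is a hypothetical Python illustration: the function names are invented, the uniform distribution is an assumption (the text does not say how the delay is drawn), and the guard against a minimum larger than the maximum is a sketch choice.

```python
import random
import time
from typing import Optional


def random_delay_ms(minimum_ms: int = 1000, maximum_ms: int = 5000,
                    rng: Optional[random.Random] = None) -> int:
    """Pick a delay between Minimum Delays and Maximum Delays (inclusive)."""
    if minimum_ms > maximum_ms:
        raise ValueError("Minimum Delays must not exceed Maximum Delays")
    return (rng or random).randint(minimum_ms, maximum_ms)


def wait_after_page_load(use_random_delays: bool = False) -> None:
    """Sleep for a random delay after a page load, if Use Random Delays is True."""
    if use_random_delays:
        time.sleep(random_delay_ms() / 1000)  # milliseconds to seconds
```

With the default values, each page load in this model is followed by a pause of between 1 and 5 seconds, which makes request timing less regular.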
Randomize Browser Size: When set to True, a new random size is set for the web browser every time proxies rotate. The default value is False.
Timezone Configuration: An agent can be configured to use a specific time zone ID when loading a web page. The default value is System Default. This reflects the value specified in the “Timezone” field.
Timezone: The "Time Zone" feature lets you set a time zone different from the one set on the Windows machine. It accepts a time zone ID in IANA format (for example, America/Los_Angeles). This property has two options: System Default and Override With Pool. If you choose Override With Pool, the agent uses the time zone pools and overrides the time zone according to the pool settings. If you choose System Default, the agent uses the time zone of the Windows system.
The default value is System Default.
Timezone Pool: Configures the time zone pool based on your requirements. Click the ‘...’ button at the right end of the property to open the Timezone pools dialog box.
Search Timezones: Searches for time zones based on specific requirements. There are two types of filter: a Text filter, where you type the name of a particular time zone ID or group of time zone IDs and select them, and a Value filter, where you specify conditions used to pick the time zone IDs to add to the pool.
Clear All: Clears all previously attached time zone IDs, leaving the pool empty.
Rotate Timezone: Emulates different time zone IDs. This setting can cause some websites to work incorrectly. The default value is “No Rotation”.
No Rotation: The time zone does not change.
On Every Page Load: The time zone rotates on every page load.
Rotate With Proxy Address: The time zone rotates with the proxy address.
New On Proxy Rotation: The time zone rotates when the proxy rotates.
Shuffle Time Zones: Shuffles the individual time zone IDs, so the agent receives them in random order at runtime instead of in sequence.
User Agent Configuration: Consolidates User-Agent-related settings, such as User-Agent override and rotation, into a single configuration category. It integrates a built-in database of modern, legitimate user-agents that you can query, and lets you customize and organize user-agents by browser type and version. This configuration includes:
User Agent Override: Configures User-Agent override, allowing options like Chromium Default or Override With Pools.
User Agent Pools: Configures the User-Agents pool, where end-users can customize their list.
User-Agent pools offer options like rotating user agents on every page load, rotating with proxy addresses, and shuffling the user agent list at runtime. The pools can include a built-in database and a custom list of user-agents added by end-users.
User Agent: Configures the User Agent as either Chromium Default or Override with Pool. To use the default Chromium User-Agent, select "Chromium Default" from the dropdown; to use the User Agent Pools, select "Override with Pool".
User Agent Pools: Configures the User-Agents pool based on your requirements. Click the ‘...’ button at the right end of the property to open the UA pools dialog box.
Add User-Agent Pool (List): The default pool is a built-in database pool, which is added when you create a new agent. At runtime, the user-agents from all enabled pools are combined into a single list, which is shuffled if the agent is configured to do so.
Built-in Database: SE3 provides a built-in database of user-agent lists, with Client Hints data populated automatically. SE3 cannot automatically parse Client Hints data from a user-agent string, because this information cannot always be found in the user-agent string, so you must populate these fields manually when using a Custom List pool.
The “Add As Custom List Pool” button in the Built-in database dialog allows the user to further customize the filtered user-agents.
Columns can be filtered by clicking on the filter icon at the right of the target column header.
Convert to Custom List Pool: With this option, users can generate a custom list pool from the filtered pool. Additionally, within the filtered pool, there are two options i.e. 'Import' and 'Export,' allowing users to import or export CSV/Excel files containing user agents.
Custom List: This option allows users to create custom User-Agent pools, enabling the customization of user agents to fit specific needs. It empowers users to customize and organize user-agents based on browser type and version according to their preferences.
In the User Agent Pool dialog box, you'll find a 'Rotate User Agent' option with the following four options available in the dropdown:
Rotate User Agent: Emulates different user agents. This setting can cause some websites to work incorrectly. The default value is “No Rotation”.
No Rotation: The user agent remains constant.
On Every Page Load: The user agent rotates with each page load.
Rotate With Proxy Address: The user agent rotates with the proxy address.
New On Proxy Rotation: The user agent rotates when the proxy rotates.
Shuffle User-Agents: Shuffles the individual User-Agents, so the agent receives them in random order at runtime instead of in sequence. This option is checked by default.
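The pool behavior described above (combine the enabled pools, optionally shuffle, then rotate) can be modeled in a few lines. This is a hypothetical Python sketch, not Sequentum Enterprise code: the class `UserAgentRotator` and its methods are invented, and "Rotate With Proxy Address" is omitted because its exact mapping from proxy to user agent is not described here.

```python
import random
from typing import List, Optional


class UserAgentRotator:
    """Hypothetical model of User Agent Pools with Rotate User Agent.

    Covers "No Rotation", "On Every Page Load" and "New On Proxy Rotation";
    "Rotate With Proxy Address" is omitted.
    """

    def __init__(self, pools: List[List[str]], mode: str = "No Rotation",
                 shuffle: bool = True, rng: Optional[random.Random] = None) -> None:
        # At runtime, user-agents from all enabled pools form one list.
        self.agents = [ua for pool in pools for ua in pool]
        if not self.agents:
            raise ValueError("the enabled pools contain no user-agents")
        if shuffle:  # Shuffle User-Agents
            (rng or random.Random()).shuffle(self.agents)
        self.mode = mode
        self.index = 0

    @property
    def current(self) -> str:
        return self.agents[self.index]

    def _advance(self) -> None:
        self.index = (self.index + 1) % len(self.agents)

    def on_page_load(self) -> str:
        if self.mode == "On Every Page Load":
            self._advance()
        return self.current

    def on_proxy_rotation(self) -> str:
        if self.mode == "New On Proxy Rotation":
            self._advance()
        return self.current
```

Without shuffling, the agents are used in pool order; with shuffling, the same set of user-agents is used, only in a random order.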
Basic Authentication
Some websites use basic Windows authentication, and they display a Windows login box. Sequentum Enterprise lets you set the Username and Password for basic Windows authentication by editing the Agent command and setting the Username and Password in the Basic Windows Authentication section of the Properties tab. After setting the basic Windows authentication properties, you must reload your agent for them to take effect. Basic Windows authentication does not work in HTML Parsers, JSON Parsers, or XML Parsers.
Username: A username to use when a website uses basic Windows authentication.
Password: A password to use when a website uses basic Windows authentication.
Command
Command Description: A custom description for the command. The default value is Empty.
Command Transformation Script: A script used to change command properties at runtime. The default value is disabled.
Disabled: When set to True, disables the command. A disabled command is ignored. The default value is False.
ID: This property indicates the internal ID of the command.
Increase Data Count: When set to True, the data count increases every time this command is processed. Set it to True if you want to count how many times a specific command is executed to get data. The data count increases during data extraction, so it is used to measure agent progress, and the agent uses this count to decide whether the run meets its success criteria. The default value is False.
Name: This property specifies the name of the command.
Notify On Critical Error: A notification email is sent at the end of an agent run if the command encounters a critical error, and the agent has been configured to notify on critical errors. Critical errors include page load errors and missing required web selections. The default value is set to False.
Container
Always Execute: Set this property to True if you want the container to always be executed when it is encountered on retry or continue. List commands are only executed if they process a single input. Always Execute can be used to always execute container commands when retrying, even when they have already been completed. The default value is False.
Command Link: Links to another container command where processing will continue. The targeted container command will be executed, so it’s normally best to link to a group command that does nothing, so it’s clear what happens after the link.
Dependent Command: The action of the dependent command will come into effect only when the agent has a supporting parent container command, in which case, it will be executed before the parent container command is processed.
Repeat While Selection is Valid: Set this property to TRUE if you want to repeatedly process the command while the command selection is valid. The default value is set to False.
Content Cache
Retention Days: This property specifies the number of days to keep the cache file.
Write Cache: The default value is False. When set to True, all downloaded content is stored in a cache, which can be used to rerun an agent without accessing the target website. This lets you make minor changes to XPath, Regex, or transformation scripts and test them without sending extra requests to the website, which also reduces execution time.
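The effect of Retention Days on cached content can be sketched with a small cache that expires old entries. This is a hypothetical, in-memory Python model; Sequentum Enterprise stores cache files, and the class `ContentCache`, its methods, and the injectable `clock` are invented for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SECONDS_PER_DAY = 24 * 60 * 60


class ContentCache:
    """Hypothetical in-memory model of Write Cache with Retention Days."""

    def __init__(self, retention_days: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.retention_seconds = retention_days * SECONDS_PER_DAY
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # url -> (saved_at, content)

    def write(self, url: str, content: str) -> None:
        self._entries[url] = (self.clock(), content)

    def read(self, url: str) -> Optional[str]:
        """Return cached content, or None so the caller fetches from the website."""
        self.purge()
        entry = self._entries.get(url)
        return entry[1] if entry is not None else None

    def purge(self) -> None:
        """Drop entries older than the retention period."""
        cutoff = self.clock() - self.retention_seconds
        self._entries = {url: entry for url, entry in self._entries.items()
                         if entry[0] >= cutoff}
```

In this model, a rerun within the retention period reads pages from the cache, and once the period has passed the content must be downloaded again.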
Data
Database Connections: A list of database connections stored in the agent.
Export Target: The default value is Excel. This property sets the export target format for the data, e.g. CSV, JSON, Parquet Export, Script Export, Email, Excel Export, etc. The value in the ‘Export Target’ field is set according to which export target option is enabled.
Details of the most commonly used export targets are as follows:
JSON- The data exports to a single JSON File.
CSV- The data exports to one or more CSV files. The Default character encoding is UTF-8, but you can specify another type of encoding.
Parquet- The data exports to one or more Parquet files.
AWS S3 Bucket Delivery: The default value is Disabled. This property lets you deliver the data to an S3 bucket. To do so, set Enabled to True and specify the S3 Bucket Name, Credentials File Path, and Folder Name.
Bucket Name: Specify the bucket name, e.g. “sequentum-test”.