Understanding Remote Agents
Welcome to our comprehensive guide on remote agents. In this article, you will learn what remote agents are, how they function, and the various features they offer. Whether you are a beginner or an experienced user, this guide will provide valuable insights to help you effectively manage and utilize remote agents in your system.
Prerequisites
If you don't have an Avalanchio account, create one before you begin. You must also have the following permissions for the agent feature:
1. Manage Agent
2. View Agent
Note
To install or update the agent, run the following commands. Because these commands need to be run from time to time to update the agent, you can save them in a bash file and set up a crontab task to update the agent automatically.
What is a remote agent?
A "remote agent" typically refers to a type of software or system component that operates from a location separate from the user or the primary system it interacts with.
How does a remote agent work?
The remote agent is an application that forwards data from an endpoint machine to the Avalanchio platform over HTTPS. It is designed to be installed and updated remotely, without direct access to the machine, and you can install many remote agents. Communication between the Remote Agent and Avalanchio Servers is initiated by the Remote Agent, which fetches the required libraries and connection properties from Avalanchio Servers over HTTPS. Here is an overview of how a remote agent works:
- Communication and Connectivity
  - Internet Connection: The remote agent relies on a stable internet connection to communicate with servers, clients, or control systems. This connection allows it to send and receive data, commands, and responses.
  - VPN or Secure Channels: For secure communication, remote agents often use Virtual Private Networks (VPNs) or other secure communication channels to protect data integrity and confidentiality.
- Software Components
  - Agent Software: This is the core software that runs on the remote system. It can be a lightweight daemon, service, or application designed to perform specific tasks.
  - Control Server: The central server or control center that the remote agent reports to. This server can send instructions, receive data, and manage multiple remote agents.
  - APIs and Protocols: The agent communicates using various APIs (Application Programming Interfaces) and protocols (HTTP/HTTPS, MQTT, WebSocket, etc.) to interact with the control server and other systems.
- Task Execution
  - Task Scheduling: The remote agent can schedule and execute tasks based on predefined intervals, triggers, or real-time instructions from the control server.
  - Data Collection and Reporting: It can collect data from the local system (e.g., performance metrics, logs, status updates) and report it back to the control server for analysis and monitoring.
  - Automation Scripts: The agent can run automation scripts or commands to perform maintenance tasks, updates, or any other required actions.
- Monitoring and Management
  - Real-Time Monitoring: The remote agent can continuously monitor system health, performance, and other metrics, providing real-time updates to the control server.
  - Alerts and Notifications: If the agent detects an anomaly, error, or any predefined condition, it can send alerts and notifications to administrators or automated systems.
  - Remote Control: Administrators can send commands to the remote agent to perform actions such as restarting services, updating software, or changing configurations.
- Security and Compliance
  - Authentication and Authorization: To ensure that only authorized users and systems can interact with the remote agent, it uses authentication (e.g., passwords, keys, certificates) and authorization mechanisms.
  - Encryption: Data transmitted between the remote agent and control server is often encrypted to prevent interception and tampering.
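As a rough illustration of the agent-initiated pattern described above, the sketch below has the agent pull its settings and then push collected records in a single cycle. The `InMemoryServer` class, its `fetch_config` and `push` methods, and the `enabled` setting are hypothetical stand-ins invented for this example. They are not part of the Avalanchio API, and a real agent would make these calls over HTTPS.

```python
from typing import Callable, Iterable, List


class InMemoryServer:
    """Hypothetical stand-in for the control server (no network involved)."""

    def __init__(self, config: dict) -> None:
        self.config = config
        self.received: List[str] = []

    def fetch_config(self) -> dict:
        # The agent asks for its settings; the server never calls the agent.
        return dict(self.config)

    def push(self, records: List[str]) -> None:
        # The agent sends collected records in one outbound call.
        self.received.extend(records)


def run_cycle(server: InMemoryServer, collect: Callable[[], Iterable[str]]) -> int:
    """Run one agent cycle and return the number of records sent."""
    config = server.fetch_config()
    if not config.get("enabled", True):
        return 0  # a disabled agent collects and sends nothing
    records = list(collect())
    if records:
        server.push(records)
    return len(records)
```

Both calls originate from the agent side, which is why an agent of this shape needs only outbound connectivity.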
Agent Flow
                +-----------------+
                |  Data Sources   |
                |  (File System,  |
                |  AWS S3, etc.)  |
                +--------+--------+
                         |
                         v
                +--------+--------+
                |  Remote Agent   |
                +--------+--------+
                         |
       +-----------------+-----------------+
       |                 |                 |
       v                 v                 v
+------+------+   +------+------+   +------+------+
| Data Import |   |   Scripts   |   |  Notebooks  |
|   Module    |   |   Module    |   |   Module    |
+------+------+   +------+------+   +------+------+
       |                 |                 |
       +-----------------+-----------------+
                         |
                         v
                +--------+--------+
                | Parsing Module  |
                +--------+--------+
                         |
                         v
                +--------+--------+
                |  Transmission   |
                |     Module      |
                +--------+--------+
                         |
                         v
                +--------+--------+
                |      Data       |
                |  Destinations   |
                |(Directory, etc.)|
                +-----------------+
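The flow in the diagram can be pictured as a chain of small stages. The following is only a sketch under simplifying assumptions: the function names, the dictionary record shape, and the use of a plain list as the destination are all hypothetical and do not reflect the agent's actual internals.

```python
from typing import Dict, Iterable, Iterator, List


def read_source(lines: Iterable[str]) -> Iterator[str]:
    """Source stage: yield raw lines without their trailing newline."""
    for line in lines:
        yield line.rstrip("\n")


def parse(records: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parsing stage: skip blank lines and wrap each line in a record."""
    for raw in records:
        if raw.strip():
            yield {"message": raw}


def transmit(records: Iterable[Dict[str, str]], destination: List[Dict[str, str]]) -> int:
    """Transmission stage: deliver records and return how many were sent."""
    count = 0
    for record in records:
        destination.append(record)  # assumption: a list stands in for the destination
        count += 1
    return count
```

Because each stage is a generator, records flow through one at a time, for example `transmit(parse(read_source(open_file)), destination)`.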
Remote Agent Features
Feature | Description |
---|---|
Live Data Ingestion | Remote agents help ingest data as a live data stream from file output of Syslog or any other similar application. |
Concurrent Load Tasks | A remote agent can perform multiple load tasks as a concurrent process, enhancing efficiency and speed. |
Lightweight Data Processing | The remote agent can perform lightweight parsing, filtering, and field extraction while sending data to the stream. Complex parsing is handled by the Parsing Framework. |
Secure API Authentication | The remote agent uses API authentication credentials. User account access is controlled by Permission Sets, ensuring secure and controlled access. |
Task Management | The remote agent or specific tasks can be enabled or disabled from the control UI, providing flexibility in task management. |
Session Heartbeat and Updates | Remote agents send a heartbeat to check session validity and receive any configuration updates from Avalanchio Servers, ensuring continuous operation. |
Automatic Configuration Reload | Remote agents automatically reload configuration changes for tasks, minimizing downtime and manual intervention. |
Secure Data Transmission | Remote agents send data using SSL/TLS encryption for secure transmission. Optionally, data can be sent directly to a Kafka topic if configured. |
File Offset Maintenance | While reading files, remote agents maintain a local offset to prevent duplicate data ingestion upon restart, ensuring data integrity. |
Processing Metrics | Remote agents keep counters for the number of records and bytes processed, allowing for detailed monitoring and reporting. |
Log Upload for Troubleshooting | Remote agents upload application logs to Avalanchio Servers, enabling administrators to troubleshoot potential issues without accessing the remote agent machine directly. |
Data Fetching from Various Sources | Remote agents can fetch data from various sources such as Active Directory or RDBMS databases (e.g., SQL Server, MySQL) using connectors, enhancing data integration. |
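To make the "Lightweight Data Processing" and "Processing Metrics" rows more concrete, here is a minimal sketch of line filtering, field extraction, and record/byte counting. The log format, the regular expression, and every name in the block are assumptions chosen for illustration. The agent's real parsing rules and counters are not documented here.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Metrics:
    """Counters for what has been processed (hypothetical structure)."""
    record_count: int = 0
    byte_count: int = 0


# Assumed line format: an upper-case level followed by a free-text message.
LINE_RE = re.compile(r"^(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def process(lines: Iterable[str], keep_levels: Set[str], metrics: Metrics) -> List[Dict[str, str]]:
    """Keep only lines whose level is wanted and extract their fields."""
    extracted = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or match.group("level") not in keep_levels:
            continue  # filtered out: not counted
        extracted.append(match.groupdict())
        metrics.record_count += 1
        metrics.byte_count += len(line.encode("utf-8"))
    return extracted
```

Only lines that pass the filter are counted in this sketch, so the counters describe what was actually forwarded.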
How to add a remote agent?
- In your Avalanchio account, on the sidebar menu, select Setup > Agent.
- On the Remote Agent page, click + Create Remote Agent.
- Enter a name for the new agent.
- Click Save.
What are the valid names for an agent?
There are no restrictions on agent names. Give each agent a meaningful name so that it is easy to identify.
What does an active agent signify?
When an agent is active, any task that the user creates and activates is executed. When the agent is not active, its tasks are not executed, even if the tasks themselves are active.
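The rule can be summarized as "a task runs only when both the agent and the task are active". The small sketch below expresses that rule. The function name and the dictionary of task flags are hypothetical and used only for illustration.

```python
from typing import Dict, List


def runnable_tasks(agent_active: bool, tasks: Dict[str, bool]) -> List[str]:
    """Return the names of tasks that would execute.

    `tasks` maps a task name to whether that task is active.
    """
    if not agent_active:
        return []  # an inactive agent runs nothing, whatever the task state
    return [name for name, task_active in tasks.items() if task_active]
```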
Who can activate/deactivate an agent?
The owner of the agent and users with the administrator role can activate or deactivate an agent.
What happens when an agent fails?
When an agent fails, a notification is sent to the user by email at the email address provided.
How to find out the time of the agent creation?
The Created On field displays the timestamp at which the agent was created.
How to find out when the agent got modified?
The Last Modified On field displays the timestamp of the agent's last modification.
What is the heartbeat of an agent?
The heartbeat of an agent is a periodic signal or message that the agent sends to indicate that it is operational and functioning normally.
- Periodic Signal: The agent sends a heartbeat signal at 3-second intervals.
- Indicator of Health: The presence of a heartbeat signal indicates that the agent is alive and running. It serves as an indicator of the agent's health and availability.
What is meant by last heartbeat of an agent?
The last heartbeat is a timestamp that indicates when the agent last sent a heartbeat, that is, when it was last known to be functioning.
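One way to reason about the last heartbeat is to compare it with the current time. The sketch below uses the 3-second interval stated above. The tolerance of three missed heartbeats and the function name are assumptions made for this example, not documented Avalanchio behavior.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(seconds=3)  # interval stated in the text


def is_alive(last_heartbeat: datetime, now: datetime, missed_allowed: int = 3) -> bool:
    """Treat the agent as alive if its last heartbeat is recent enough.

    `missed_allowed` is a hypothetical tolerance: how many intervals may
    pass without a heartbeat before the agent is considered unresponsive.
    """
    return now - last_heartbeat <= HEARTBEAT_INTERVAL * missed_allowed
```

With the default tolerance, an agent whose last heartbeat is more than 9 seconds old would be reported as not alive.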
How to remove an agent?
To remove an agent permanently, delete it.
Can an agent be deleted in the active state?
No, an agent cannot be deleted while it is in the active state.
Remote Agent Task
An agent can run multiple tasks, and these tasks can run in parallel. The execution or failure of one task does not affect the others. Tasks can include data processing, file management, system monitoring, and script execution, all controlled from a centralized location.
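The isolation between tasks can be illustrated with a thread pool in which each task's failure is caught separately. This is a sketch only: the function name and the result format are hypothetical, and it does not describe how the agent actually schedules its tasks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def run_tasks(tasks: Dict[str, Callable[[], object]]) -> Dict[str, Tuple[str, object]]:
    """Run every task in parallel and record each outcome independently."""
    results: Dict[str, Tuple[str, object]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, future in futures.items():
            try:
                results[name] = ("ok", future.result())
            except Exception as exc:
                # One task failing is recorded here and does not stop the others.
                results[name] = ("failed", str(exc))
    return results
```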
Prerequisites
1. You must first have an account in Avalanchio.
2. You need to create an agent.
What is remote agent task?
A "remote agent task" refers to a specific duty, assignment, or activity that is carried out by an agent remotely, meaning the agent is operating from a location separate from where the task is being performed or controlled.
For example:
- Task 1: Watch a file path and send any new files arriving at that path to an Avalanchio topic.
- Task 2: Execute a script to collect data from the user machine.
- Task 3: Connect to a database using credentials provided by the user, fetch data from it, and forward the output to an Avalanchio topic.
- Task 4: Execute a search on Avalanchio and send the output of the query to an internal system.
Who can manage an agent task?
The owner of the agent and users with the administrator role can manage agent tasks.
How to add a remote agent task?
- Select the agent for which you want to create the task.
- Click Add task.
- Fill out the details.
- Click Save.
- Add the Source Properties and Sink Properties.
- Save the task.
Why must a user choose a task type?
The task type specifies the category of the task and provides context about its nature.
The available task types are:
Task Types
Task Type | Description |
---|---|
Data Import | This task type is used to import data from a specified source. It typically involves defining the data source and data type. Data import tasks automate the process of loading data into the specified output location for further analysis, reporting, or processing. |
Execute Script | This task type allows users to execute custom scripts. Users can write and upload scripts in Python. Execute script tasks are commonly used for performing data transformations, calculations, or custom business logic as part of automated workflows or data processing pipelines. |
Execute Notebook | This task type enables users to execute Jupyter notebooks. Notebooks contain a combination of code, text, and visualizations, allowing users to analyze data, create reports, and share insights in an interactive environment. Execute notebook tasks provide a convenient way to incorporate data analysis and visualization workflows into automated processes or scheduled tasks within the system. |
What are the execution modes available?
Task Execution Modes
Execution Mode | Description |
---|---|
Continuous | In continuous execution mode, tasks are executed continuously or repeatedly without interruption. This mode is typically used for tasks that require real-time processing or monitoring, where data is constantly flowing, and actions need to be performed continuously as new data becomes available. |
Scheduled | In scheduled execution mode, tasks are executed at specific times or intervals according to a predefined schedule. Users can specify the frequency, timing, and recurrence pattern for task execution, allowing tasks to be automatically triggered at regular intervals (e.g., hourly, daily, weekly). |
On Demand | In on-demand execution mode, tasks are executed manually or on request by users. This mode allows users to trigger task execution as needed, providing flexibility and control over when tasks are performed. On-demand execution is often used for tasks that require user input or confirmation before execution. |
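The difference between the three modes comes down to what triggers a run. The sketch below encodes that decision. The enum, the function, and its parameters are hypothetical names introduced for illustration, and time is represented as a plain number for simplicity.

```python
from enum import Enum
from typing import Optional


class ExecutionMode(Enum):
    CONTINUOUS = "continuous"
    SCHEDULED = "scheduled"
    ON_DEMAND = "on_demand"


def should_run(mode: ExecutionMode, now: float,
               next_run: Optional[float] = None, requested: bool = False) -> bool:
    """Decide whether a task should execute at time `now`."""
    if mode is ExecutionMode.CONTINUOUS:
        return True  # always processing as new data arrives
    if mode is ExecutionMode.SCHEDULED:
        return next_run is not None and now >= next_run  # due per the schedule
    return requested  # on demand: only when a user asks for it
```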
What are source and sink types?
The user must first select the type of source. The available source types are:
Data Source Types
Data Source | Description |
---|---|
File System | Users can specify the path to the directory or file(s) containing the data to be imported. |
AWS CloudWatch | AWS CloudWatch is a monitoring and observability service provided by Amazon Web Services (AWS). This source type allows users to extract data from CloudWatch metrics, logs, and events for analysis, monitoring, or reporting purposes. |
AWS S3 | Amazon Simple Storage Service (S3) is a scalable object storage service provided by AWS. This source type enables users to access and retrieve data stored in S3 buckets for processing, transformation, or integration with other systems. |
MySQL | MySQL is an open-source relational database management system (RDBMS). This source type allows users to connect to MySQL databases to extract, query, or analyze data stored in MySQL tables. |
PostgreSQL | PostgreSQL is an open-source object-relational database system. This source type enables users to connect to PostgreSQL databases to retrieve, manipulate, or analyze data stored in PostgreSQL tables. |
SQL Server | SQL Server is a relational database management system developed by Microsoft. This source type allows users to access SQL Server databases to extract, transform, or load data for various purposes. |
Oracle DB | Oracle Database is a multi-model database management system developed by Oracle Corporation. This source type enables users to connect to Oracle databases to extract, query, or analyze data stored in Oracle tables. |
Kafka | Kafka is a distributed event streaming platform capable of handling trillions of events a day. This source type allows users to stream data from Kafka topics for processing and analysis. |
Elasticsearch | Elasticsearch is a distributed, RESTful search and analytics engine. This source type enables users to query Elasticsearch indices and retrieve data for search, analysis, or visualization purposes. |
Windows Active Directory | Active Directory is a directory service developed by Microsoft for Windows domain networks. This source type allows users to access and retrieve data from Active Directory for user management, authentication, or identity-related tasks. |
The available sink type is:
- Table
How to use a script in executing a task?
In the context of executing a task, a script serves as a set of instructions or code that defines the actions to be performed during task execution.
Data Manipulation: Scripts can manipulate data as part of the task execution process. Users can write code to extract, transform, filter, aggregate, or enrich data from various sources before loading it into the target destination or performing further processing.
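As an example of the kind of data manipulation a script might perform, the sketch below filters and aggregates a list of rows. How a script receives its input and returns its output in Avalanchio is not described here, so the `transform` function, its signature, and the `status` field are purely hypothetical.

```python
from collections import Counter
from typing import Dict, List


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Drop rows without a status, then count the remaining rows per status."""
    counts = Counter(row["status"] for row in rows if row.get("status"))
    # Sort by status so the output order is stable.
    return [{"status": status, "count": n} for status, n in sorted(counts.items())]
```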
Source properties
Property name | Description |
---|---|
PATH | If the source type is File System, the user must provide the path of the source file. |
MAX_BATCH_SIZE | |
ARCHIVE | |
ARCHIVE_DIRECTORY | |
ENABLE_TAILING | |
TAILING_TIMEOUT | |
LINE_FILTER | |
DELETE_SOURCE_AFTER_READ | Set this property if the source file should be deleted after it has been processed. |
How to reset the properties?
To restore the default values, use the reset property option.
Sink properties
Property name | Description |
---|---|
TOPIC | If the sink type is topic, the user must provide the topic name. |
DIRECTORY | If the output should be saved to a file, the path must be specified in this property. |
KEY_RANDOMIZE | |
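The requirements stated in the two property tables (PATH for a file system source, TOPIC for a topic sink, DIRECTORY for file output) can be checked before a task is saved. The following is a sketch only: the type identifiers `"file_system"`, `"topic"`, and `"file"` and the function itself are assumptions, not values taken from the product.

```python
from typing import Dict, List


def missing_properties(source_type: str, sink_type: str,
                       source_props: Dict[str, str],
                       sink_props: Dict[str, str]) -> List[str]:
    """Return the names of required properties that are absent or empty."""
    missing = []
    if source_type == "file_system" and not source_props.get("PATH"):
        missing.append("PATH")
    if sink_type == "topic" and not sink_props.get("TOPIC"):
        missing.append("TOPIC")
    if sink_type == "file" and not sink_props.get("DIRECTORY"):
        missing.append("DIRECTORY")
    return missing
```

An empty list means that the properties described as mandatory in the tables are all present.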
How to stop the task while it is in process?
The user can deactivate the task to stop it partway through.
What happens when the user stops the task partway through?
If the user deactivates a task partway through, the agent saves a checkpoint recording how far it has read. When the user activates the task again, the agent does not start from the beginning; it resumes after the checkpoint. After deactivating the task, the user can see the line number up to which data has been read.
How to know when the task is complete?
Once the task is complete, the agent state is displayed as complete, along with the total number of lines read.
Once the task is complete, can the user run it again?
To run a completed task again, the user needs to delete the checkpoint. The task then executes from the beginning.
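The checkpoint behavior described above (resume after the last line read, start over once the checkpoint is deleted) can be sketched with a small file-based checkpoint. The JSON checkpoint format, the file locations, and the function names are assumptions made for illustration. The agent's actual checkpoint storage is not documented here.

```python
import json
import os
from typing import List


def load_checkpoint(checkpoint_path: str) -> int:
    """Return the last line number read, or 0 if no checkpoint exists."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as f:
        return json.load(f)["line"]


def save_checkpoint(checkpoint_path: str, line: int) -> None:
    """Persist the last line number read."""
    with open(checkpoint_path, "w") as f:
        json.dump({"line": line}, f)


def read_new_lines(source_path: str, checkpoint_path: str) -> List[str]:
    """Read only the lines after the checkpoint, then advance the checkpoint."""
    start = load_checkpoint(checkpoint_path)
    new_lines = []
    with open(source_path) as f:
        for number, line in enumerate(f, start=1):
            if number > start:
                new_lines.append(line.rstrip("\n"))
    save_checkpoint(checkpoint_path, start + len(new_lines))
    return new_lines
```

In this sketch, deleting the checkpoint file (for example with `os.remove`) makes the next call read the source from the first line again.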
What is the latest checkpoint?
When the task is active and processing data, the user can load the checkpoint to see its progress; the checkpoint displays the agent's current position in the source.