Parser

What is a parser?

A parser is a software component or program that analyzes input data according to a specified syntax or grammar and processes it based on predefined rules or patterns. Parsers are commonly used in data processing.

Avalanchio parser

Parsers are the key components for data onboarding. Parsers are fully managed from the UI and are designed to handle the common pre-processing tasks needed while loading data into the target system.

# Feature
1 Parsers support various file formats, e.g. CSV, JSON, XML, and regex-based formats. They extract relevant fields from these formats and save them to a destination table or topic.
2 Parsers can infer the schema automatically from the file format, e.g. CSV, JSON, XML, and regex-based formats.
3 A built-in library of parsers covers source formats such as Windows Server logs, Windows AD logs and system logs, Cisco ASA VPN, Fortinet, etc.
4 Parsers can automatically parse timestamps in various date-time formats and support time zones. Administrators can also supply custom date-time formats based on their knowledge of the source system.
5 Parsers identify context IDs, such as user identity information or device entity details, and enrich each event with contextual data. For example, based on a user's username, a parser can add the user's department, AD group, or location to the event.
6 Parsing is robust because multiple regex patterns can be applied simultaneously. In some use cases a line may match any one of several possible patterns, and each pattern extracts different attributes from the line.
7 For advanced enrichment and field extraction, parsers allow scripts. Currently Python and JavaScript are supported.
8 Parsers support several types of filters to drop data that does not need to be forwarded to the subsequent step.
9 The output of a parser is saved to a table or a Kafka topic.
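
Timestamp parsing and contextual enrichment from the feature list can be illustrated with a minimal Python sketch. This is not Avalanchio's implementation: the format list, the `USER_DIRECTORY` lookup table, and the function names `parse_timestamp` and `enrich` are all hypothetical and exist only to show the idea.

```python
from datetime import datetime, timezone

# Hypothetical formats and lookup table, for illustration only.
TIMESTAMP_FORMATS = ["%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d %H:%M:%S"]
USER_DIRECTORY = {"alice": {"department": "Finance", "location": "Berlin"}}

def parse_timestamp(text, default_tz=timezone.utc):
    """Try each known format in turn; return an aware datetime or None."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not fit, try the next one
        if parsed.tzinfo is None:
            # The text carried no offset, so assume the default time zone.
            parsed = parsed.replace(tzinfo=default_tz)
        return parsed
    return None

def enrich(event):
    """Return a copy of the event with directory attributes for its user added."""
    context = USER_DIRECTORY.get(event.get("username"), {})
    return {**event, **context}
```

A custom date-time format supplied by an administrator would correspond to one more entry in the format list.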

What does an active parser mean?

If a user activates a parser, it is executed whenever a task is run.

What happens when a parser fails?

Users can deactivate a parser on failure. If an email address is given in the email notification field, a notification is sent when the parser fails.

Why must a parser format be chosen?

The choice of parser format is important because it determines how data is interpreted and processed by the parser. The available parser formats are:
- Text: Parsing text involves extracting structured data from unstructured or semi-structured text. This can include extracting information such as names, dates, and addresses from a block of text.
- JSON: JSON is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is commonly used for transmitting data between a web server and a client as an alternative to XML.
- CSV: CSV is a simple file format used to store tabular data, such as a spreadsheet or database. Each line in a CSV file represents a record, and the fields in each record are separated by commas.
- XML: XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is commonly used for storing and exchanging structured data on the web.
- Regex: Regex is a powerful tool for matching patterns in text. It allows you to define a search pattern, which can include wildcards, character classes, and quantifiers, to find matches within a string.
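
To see why the format matters, the following sketch parses the same event from JSON, CSV, and XML text using only the Python standard library. The sample records and variable names are made up for illustration and say nothing about how Avalanchio reads these formats internally.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical event, encoded in three formats.
json_text = '{"user": "alice", "action": "login"}'
csv_text = "user,action\nalice,login\n"
xml_text = "<event><user>alice</user><action>login</action></event>"

# Each format needs its own reader to recover the same fields.
from_json = json.loads(json_text)
from_csv = dict(next(csv.DictReader(io.StringIO(csv_text))))
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}
```

All three results hold the same two fields, but feeding the CSV text to the JSON reader (or the reverse) would fail, which is why the format has to be chosen up front.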

What is a data source?

How to provide parser input?

The user must select the type of input to use. Available input types are:
- Topic:
- File: This typically refers to a file containing data that needs to be parsed. The parser reads the contents of the file and processes it according to the specified format (e.g., JSON, CSV, XML).

How to select the output for a parser?

The user can choose where the output is stored. Avalanchio offers two options:
- Output to Topic: With this option, processed data is stored in a topic.
- Output to Table: With this option, processed data is stored in a table.
The user can select both options. To store the output in a table, the user must provide the table name. To store the output in a topic, the user must provide the topic name.

Significance of key value filter

What is the use of a sample file?

A sample file serves as an example of the input when the input type is file.

What is an output topic key?

When should a user add a regex pattern?

If the parser format type is Regex, the user must add the regex pattern details.

Regex patterns

In an Avalanchio parser, the user can specify regex patterns for data processing.

Which regex patterns are used during data processing?

A regex pattern is used only if the parser format is Regex and the pattern is in active mode.
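
A minimal Python sketch of applying only the active patterns is given here. The pattern definitions, the `active` flag layout, and the "first matching pattern wins" rule are assumptions made for illustration; the actual selection logic in Avalanchio is not described in this document.

```python
import re

# Hypothetical pattern definitions; "active" stands for the active mode of a pattern.
PATTERN_DEFS = [
    {"name": "login", "regex": r"^LOGIN user=(?P<user>\S+) ip=(?P<ip>\S+)$", "active": True},
    {"name": "logout", "regex": r"^LOGOUT user=(?P<user>\S+)$", "active": True},
    {"name": "debug", "regex": r"^DEBUG (?P<message>.*)$", "active": False},
]

# Inactive patterns are never compiled or tried.
ACTIVE_PATTERNS = [
    (definition["name"], re.compile(definition["regex"]))
    for definition in PATTERN_DEFS
    if definition["active"]
]

def parse_line(line):
    """Return (pattern name, extracted fields) for the first active pattern that matches."""
    for name, pattern in ACTIVE_PATTERNS:
        match = pattern.match(line)
        if match:
            return name, match.groupdict()
    return None  # no active pattern matched this line
```

Each pattern extracts its own set of attributes, so a login line yields `user` and `ip` while a logout line yields only `user`. A line that only the inactive `debug` pattern could match is left unparsed.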

How to set regex pattern position?

The user can specify the start position where the search should begin and the end position where the search should end.
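
The effect of a start and end position can be sketched with Python's `re` module, whose compiled patterns accept `pos` and `endpos` arguments. Whether Avalanchio counts positions from 0 or treats the end position as exclusive is not stated here, so the 0-based, end-exclusive convention below is Python's and only an assumption for the product. The helper name `search_between` is hypothetical.

```python
import re

pattern = re.compile(r"id=(\d+)")
text = "id=111 id=222 id=333"

def search_between(pattern, text, start, end):
    """Search only text[start:end] and return the captured group, or None."""
    match = pattern.search(text, start, end)
    return match.group(1) if match else None

whole = search_between(pattern, text, 0, len(text))  # first match in the full text
window = search_between(pattern, text, 7, 13)        # only the middle "id=222" is visible
```

Narrowing the window changes which occurrence is found, and a window that cuts into a token (for example starting at position 8) finds nothing at all.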

How do the case-sensitive, recursive, and use named groups properties affect regex patterns?

How to remove characters from text?

The user specifies a set of characters that are removed from the text before regex pattern matching is applied.
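
As a rough illustration, stripping characters before matching could look like the following Python sketch. The function name `remove_characters`, the sample line, and the pattern are hypothetical.

```python
import re

def remove_characters(text, characters):
    """Delete every occurrence of each character in `characters` from `text`."""
    return text.translate(str.maketrans("", "", characters))

raw = '"alice" [login]'
cleaned = remove_characters(raw, '"[]')  # quotes and brackets are dropped first

# The pattern can now stay simple because the noise characters are gone.
match = re.match(r"(?P<user>\w+) (?P<action>\w+)", cleaned)
fields = match.groupdict() if match else {}
```

Removing the quotes and brackets up front keeps the regex free of escaping for characters that carry no information.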

What is pattern in regex?

The pattern is the expression to be matched within the text. It consists of a sequence of characters and metacharacters that specify the rules for pattern matching.

How to provide a description on the regex pattern?

The user can provide a description of the regex pattern, its purpose, and how it should be used. The description helps other users understand the intent and functionality of the pattern.

What is infer schema?

Inferring a schema is the process of automatically determining the structure of a dataset based on its contents. This is particularly useful when working with data that may not have a predefined schema, such as CSV files where the first row is assumed to contain column names, or JSON files where the structure may vary between records. With this feature, the user can view the input columns.
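
A simplified sketch of schema inference for CSV input is given here in Python. It assumes the first row holds column names and picks the narrowest of three types (`int`, `float`, `str`) that fits every value in a column. The function names and the type set are illustrative assumptions, not Avalanchio's inference rules.

```python
import csv
import io

def infer_type(values):
    """Return the narrowest type name that fits every value: int, float, else str."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue  # at least one value does not fit, try the next wider type
    return "str"

def infer_schema(csv_text):
    """Map each column name from the header row to an inferred type name."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {column: infer_type([row[column] for row in rows]) for column in rows[0]}

sample = "user,age,score\nalice,30,9.5\nbob,41,7\n"
schema = infer_schema(sample)
```

Here `score` is inferred as `float` because one value has a decimal part, even though the other would fit an integer.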

How to apply fine tune?

How to preview parsed data?

The user can test the parser to preview the processed data.

When is output mapping done?

If the user selects output to table, the user can map the processed output data fields.

What is output mapping?

With output mapping, the user maps the output data fields to the columns of the output table.

Other features that output mapping offers are:

# Feature Description
1 Map Automatically
2 Remove Unmapped Column
3 Remove Mapped Columns
4 Remove Selected Columns
5 Remove Unselected Columns
6 Remove All Columns
7 Clear Mapping
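
Conceptually, output mapping renames parsed fields to table columns. The Python sketch below uses a hypothetical mapping and record. Dropping fields that have no mapping is an assumption made for this illustration and is not a statement about how the features listed above behave.

```python
# Hypothetical mapping from parsed output fields to output table columns.
mapping = {"user": "username", "ip": "source_ip"}

def apply_mapping(record, mapping):
    """Rename mapped fields to their column names; fields without a mapping are dropped."""
    return {mapping[field]: value for field, value in record.items() if field in mapping}

row = apply_mapping({"user": "alice", "ip": "10.0.0.1", "raw": "..."}, mapping)
```

The resulting row contains only `username` and `source_ip`, ready to be written to a table with those column names.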

What are task logs?

Task logs are records that contain information about the execution of tasks. They are essential for monitoring, troubleshooting, and auditing, as users can track the progress and status of tasks over time. A task log contains:
- Task: The name or identifier of the task being logged. It provides a unique reference to the task being executed.
- Status: The current state or outcome of the task execution. Common status values include "In Progress", "Completed", "Failed", "Cancelled", etc.
- Type: The type or category of the task. It may indicate the purpose or nature of the task.
- Created On: The timestamp when the task was created or initiated. It provides the start time of the task execution.
- Completed On: The timestamp when the task was completed or finished. It indicates the end time of the task execution.
- Duration: The elapsed time taken to complete the task execution. It is typically represented in units such as seconds, minutes, or hours.
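
The relationship between Created On, Completed On, and Duration can be shown with a short Python sketch. The timestamps and their ISO 8601 format are made up for illustration; the format Avalanchio uses in task logs is not specified here.

```python
from datetime import datetime

# Hypothetical task log timestamps.
created_on = datetime.fromisoformat("2024-01-01T10:00:00")
completed_on = datetime.fromisoformat("2024-01-01T10:02:30")

# Duration is the elapsed time between creation and completion.
duration_seconds = (completed_on - created_on).total_seconds()
duration_minutes = duration_seconds / 60
```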

Who can manage parsers?