Splunk Day-1a

Upload Tutorial Data

Download and Upload Data and Create Index

Searching

index="tutorial_data"
index="tutorial_data" clientip="*"
index="tutorial_data" sourcetype="vendor_sales"
index="tutorial_data" 5036
index="tutorial_data" 5036 Code
index="tutorial_data" 5036 AND Code
index="tutorial_data" 5036 OR Code
index="tutorial_data" 5036 NOT Failed
index="tutorial_data" sourcetype="access*" action="purchase" status=200

Regex Based Filtering

index="tutorial_data" sourcetype="access_combined_wcookie" | regex _raw!="Apple"

The regex command in Splunk filters events based on a regular expression: it keeps the events whose field (or raw text) matches the pattern, or drops them when used with !=. It does not extract anything by itself; field extraction is done with the related rex command. Regular expressions (regex) let you define a search pattern, which is useful for matching, excluding, or extracting specific values in log files or event data.

Here are various examples and use cases of the regex command:

1. Basic Filtering with `regex`

You can use the regex command to filter events based on a specific pattern. Only events matching the regex pattern will be returned.

Example (Filter Events by IP Address):

index=web sourcetype=access_logs | regex ip="^192\.168\."

Description: This command keeps only the events where the ip field starts with 192.168. (trailing dot included). The ^ anchors the match to the start of the string, and \. escapes the dot so that it matches a literal period.

2. Filtering Events with Exclusion

You can exclude events that match a certain pattern by using != instead of = in the regex command.

Example (Exclude IP Addresses from a Specific Subnet):

index=web sourcetype=access_logs | regex ip!="^192\.168\."

Description: This command excludes events where the ip field starts with 192.168. (trailing dot included).
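
To see what the two filters above keep and drop, the same pattern can be tried outside Splunk. This is a minimal Python sketch, assuming a few made-up ip values (they are not from the tutorial data). The regex command keeps an event when the pattern is found anywhere in the field, which re.search mirrors here.

```python
import re

# Hypothetical sample values for the ip field (not from the tutorial data).
ips = ["192.168.1.10", "10.0.0.5", "192.168.200.7", "172.192.168.1"]

pattern = re.compile(r"^192\.168\.")

# Rough equivalent of: | regex ip="^192\.168\."
kept = [ip for ip in ips if pattern.search(ip)]

# Rough equivalent of: | regex ip!="^192\.168\."
excluded = [ip for ip in ips if not pattern.search(ip)]

print(kept)
print(excluded)
```

Note that 172.192.168.1 lands in the excluded list: it contains 192.168. but not at the start, and the ^ anchor rules it out.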

Head/Tail

index="tutorial_data" VendorID=* | head
index="tutorial_data" VendorID=* | tail

Exercise-1: Top/Rare

index="tutorial_data" sourcetype="vendor_sales" | top VendorID
index="tutorial_data" sourcetype="vendor_sales" | top 5 VendorID
index="tutorial_data" sourcetype="vendor_sales" | rare VendorID
index="tutorial_data" sourcetype="vendor_sales" | rare limit=5 VendorID

Exercise-2:

index="tutorial_data" sourcetype="access_*"
index="tutorial_data" sourcetype="access_*" status=200 action=purchase

Exercise-3: Stat Sum

index="tutorial_data" sourcetype="access_*" | top limit=100 referer_domain
index="tutorial_data" sourcetype="access_*" | top limit=100 referer_domain | stats sum(count)
index="tutorial_data" sourcetype="access_*" | stats count by status, host

Exercise-4: Table

index="tutorial_data" sourcetype="access_*" status=200 action=purchase | top limit=1 clientip
index="tutorial_data" sourcetype="access_*" status=200 action=purchase | top limit=1 clientip | table clientip
index="tutorial_data" sourcetype="access_*" action=purchase status=200 | top limit=1 clientip showperc=false showcount=false

Exercise-5: Top Buyer/Subsearch

index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP we got from above> ===> will give me the purchases this customer has made
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP> | stats count(productId)
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP> | stats count by productId
OR
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP> | stats count, values(productId)
index="tutorial_data" sourcetype="access_*"  status=200  action=purchase [search index="tutorial_data" sourcetype="access_*"  status=200  action=purchase | top 1 clientip showperc=false showcount=false] | stats count by productId
OR
index="tutorial_data" sourcetype="access_*" status=200 action=purchase [search index="tutorial_data" sourcetype="access_*" status=200 action=purchase | top limit=1 clientip | table clientip] | stats count by productId

Exercise-6: Eval

index="tutorial_data" sourcetype="access_*" | eval error=if(status == 200, "OK", "Problem")
index=web-uf_index status=200 | eval A=status+100
index=web-uf_index status=200 | eval A=status*100
index="tutorial_data" VendorID=* | eval NEW_FIELD=VendorID."_".Code

Exercise-7: Chart

index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views"
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" by productId
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" by productId, action
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" count(eval(action="addtocart")) as "addtocart" by productId
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" count(eval(action="addtocart")) as "addtocart" count(eval(action="purchase")) as "purchase" by productId

Exercise-8: TimeChart

index="tutorial_data" sourcetype="access_*" action=purchase | timechart span=1d count by categoryId
index="tutorial_data" sourcetype="access_*" action=purchase | timechart span=1d count by categoryId usenull=false

Exercise-9: Rex (Regex to extract fields on the fly)

Regex Example: Extract Browser Name from Access Log
index="tutorial_data" sourcetype="access_combined_wcookie" | rex field=useragent "(?<browser>[a-zA-Z]+)/"


from (?P<IP>\d+\.\d+\.\d+\.\d+)

(?P<datetime_usSREE>GET)
(?P<datetime_usSREE>"GET)
(?P<datetime_usSREE>\[.+\]) ==> [21/Aug/2024:18:22:16]
"(?P<datetime_usSREE>.+)" ==> GET /oldlink?itemId=EST-14&JSESSIONID=S

\[(?P<datetime_usSREE>.+)] ==> 21/Aug/2024:18:22:16

? after a quantifier = NOT GREEDY (match as little as possible)
\[(?P<datetime_usSREE>.+)].*(?P<urlSree>http.*?)"\s
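
The difference between greedy and non-greedy matching is easier to see on a concrete line. This is a small Python sketch, assuming a made-up line with two bracketed parts (the [cache-hit] suffix is hypothetical and only there to make the effect visible). Python's re module accepts the same (?P<name>...) group syntax.

```python
import re

# Hypothetical line with two bracketed sections.
line = '[21/Aug/2024:18:22:16] "GET /oldlink?itemId=EST-14" [cache-hit]'

# Greedy: .+ runs on to the LAST closing bracket on the line.
greedy = re.search(r"\[(?P<datetime>.+)\]", line).group("datetime")

# Non-greedy: .+? stops at the FIRST closing bracket.
lazy = re.search(r"\[(?P<datetime>.+?)\]", line).group("datetime")

print(greedy)
print(lazy)
```

On a line with only one bracketed part both forms give the same result, which is why the greedy version works on the sample event above.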




index="tutorial_data" sourcetype="access_combined_wcookie" status=200 | rex field=EXISTING-FIELD "(?<browser>[a-zA-Z]+)/"
index="tutorial_data" sourcetype="access_combined_wcookie" status=200 | rex field=useragent "(?<browser>[a-zA-Z]+)/"
index="tutorial_data" sourcetype="access_combined_wcookie" status=200 | rex field=useragent "(?<useragent>[a-zA-Z]+)/"
index="tutorial_data" sourcetype="access_combined_wcookie" | rex field=useragent "(?<BROWSER>[a-zA-Z]+)/"

index="tutorial_data" sourcetype="access_combined_wcookie" status=200 | rex field=useragent mode=sed "s/^M/N/g"
index="tutorial_data" sourcetype="access_combined_wcookie" status=200 | rex field=useragent mode=sed "s/Mozilla/Godzilla/g"

index="tutorial_data" sourcetype="access_combined_wcookie" | regex _raw="Mozilla"

3. Extracting Data with `rex`

If you want to extract data from events using regex, use the rex command, which allows for field extraction.

Example (Extract Domain from URL):

index=web sourcetype=access_logs | rex field=url "(?<domain>https?://[^/]+)"

Description: This command extracts the domain from the url field and assigns it to a new field called domain. The (?<domain>...) syntax is used to create a named capture group.

4. Case-Insensitive Matching

You can perform case-insensitive matches by adding the (?i) flag to the regex pattern.

Example (Case-Insensitive Match for HTTP Methods):

index=web sourcetype=access_logs | regex method="(?i)post"

Description: This command filters events where the method field contains the value post, regardless of case (i.e., it matches POST, Post, or post).

5. Using Multiple Patterns

You can use multiple patterns in your regex to filter events that match any of the patterns.

Example (Match Multiple Status Codes):

index=web sourcetype=access_logs | regex status_code="(200|404|500)"

Description: This command returns events where the status_code is 200, 404, or 500.
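
Because the pattern is not anchored, it matches any value that contains one of the three codes. Real HTTP status codes are three digits, so this rarely matters, but the difference shows up on longer values. This is a minimal Python sketch with made-up status_code values; the anchored variant is an illustration, not part of the original example.

```python
import re

# Hypothetical status_code values, including longer strings that contain a match.
codes = ["200", "404", "500", "301", "5000", "1200"]

unanchored = re.compile(r"(200|404|500)")
anchored = re.compile(r"^(200|404|500)$")

# Unanchored: a match anywhere in the value is enough.
loose = [c for c in codes if unanchored.search(c)]

# Anchored: the whole value must be exactly one of the alternatives.
strict = [c for c in codes if anchored.search(c)]

print(loose)
print(strict)
```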

6. Extracting Multiple Fields with `rex`

You can extract multiple fields from a single event using rex with multiple capture groups.

Example (Extract IP and Port from a Log Entry):

index=web sourcetype=firewall_logs | rex field=_raw "(?<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?<src_port>\d+)"

Description: This command extracts both the source IP (src_ip) and source port (src_port) from the raw log data. It captures an IP address format (\d{1,3} for each octet) followed by a port number.
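
The same two-group pattern can be checked outside Splunk. This is a Python sketch, assuming a made-up firewall-style line (the real layout of firewall_logs is not given in these notes). Python only accepts the (?P<name>...) spelling of named groups, so the pattern is written that way here.

```python
import re

# Hypothetical firewall-style line; the real log layout is an assumption.
raw = "Aug 21 18:22:16 fw01 ACCEPT src=10.2.1.44:51532 dst=192.168.0.9:443"

# Same pattern as the rex example, in Python's (?P<name>...) spelling.
pattern = re.compile(
    r"(?P<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<src_port>\d+)"
)

# search() returns the first match only.
match = pattern.search(raw)
fields = match.groupdict() if match else {}
print(fields)
```

Only the first ip:port pair on the line is captured, so the dst pair is ignored. rex behaves the same way by default; its max_match option controls how many matches are extracted.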

7. Matching Entire Event (`regex _raw`)

You can apply a regex pattern to the entire event (_raw) to filter based on raw event data.

Example (Filter Events with URLs Containing ".com"):

index=web sourcetype=access_logs | regex _raw="\.com"

Description: This command filters events where the raw log data contains .com anywhere in the text.

8. Extracting Strings with Fixed Patterns

You can use rex to extract fixed-length or patterned strings from events.

Example (Extract Dates from Logs):

index=web sourcetype=access_logs | rex field=_raw "(?<date>\d{4}-\d{2}-\d{2})"

Description: This command extracts dates in the format YYYY-MM-DD from the raw log data and assigns them to a new field called date.

9. Replacing Text with `eval` and `replace()`

If you need to replace part of a field’s value based on a regex pattern, you can use eval with the replace() function.

Example (Anonymize IP Addresses):

index=web sourcetype=access_logs | eval ip=replace(ip, "^(\d{1,3}\.\d{1,3}\.\d{1,3}\.)\d{1,3}$", "\1xxx")

Description: This command keeps the first three octets of the ip field (captured as group 1 and re-inserted with \1) and replaces the last octet with xxx to anonymize the IP addresses in the logs.
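
Masking only the last octet depends on a capture group plus a backreference in the replacement string. This is a small Python sketch of that idea; the function name mask_last_octet is made up for illustration.

```python
import re

def mask_last_octet(ip: str) -> str:
    """Keep the first three octets and replace the last one with 'xxx'."""
    # \1 in the replacement re-inserts what the first group captured.
    return re.sub(r"^(\d{1,3}\.\d{1,3}\.\d{1,3}\.)\d{1,3}$", r"\1xxx", ip)

masked = mask_last_octet("190.113.128.150")
untouched = mask_last_octet("not-an-ip")

print(masked)
print(untouched)
```

Values that do not look like an IPv4 address are returned unchanged, because the pattern does not match them.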

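10. Extracting Part of a Field with a Named Capture Group in `rex`

Backreferences (like \1, \2, etc.) refer to previously matched groups and are mostly useful in replacement strings, for example in replace() or rex mode=sed. For a plain extraction no backreference is needed: a named capture group is enough.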

Example (Extract Usernames from Email Addresses):

index=web sourcetype=email_logs | rex field=email "(?<username>[^@]+)@"

Description: This command extracts the part of the email address before the @ symbol and assigns it to the username field.

11. Using `regex` to Filter by Number Range

You can use regex to filter numeric fields based on a range by crafting specific patterns.

Example (Filter IPs in 192.168.x.x Subnet):

index=web sourcetype=access_logs | regex ip="^192\.168\.\d{1,3}\.\d{1,3}$"

Description: This command filters the logs for IP addresses within the 192.168.x.x range.

Common `regex` Patterns and Use Cases:

Pattern	Description
`\d`	Matches any digit (`0-9`).
`\w`	Matches any word character (alphanumeric + underscore).
`.`	Matches any character except newline.
`^`	Matches the start of the string.
`$`	Matches the end of the string.
`\.`	Matches a literal period (dot).
`.*`	Matches any sequence of characters (wildcard).
`\d{1,3}`	Matches a run of 1 to 3 digits.
`[A-Za-z0-9]`	Matches any alphanumeric character.
`(?<fieldname>...)`	Named capture group to extract a specific field.
`(value1|value2|value3)`	Matches any of the values inside the parentheses (OR operator).

Summary:

Basic Filtering: Use regex to filter events based on a specific pattern in fields.
Exclusion: Use regex with != to exclude events based on a pattern.
Field Extraction: Use rex to extract parts of a field using regular expressions.
String Manipulation: Use regex patterns with eval and replace() to modify or anonymize data.
Named Capture Groups: Use named groups in rex to extract specific parts of a string and create new fields.

By mastering the regex and rex commands, you can perform advanced pattern matching, data extraction, and filtering in Splunk searches, making it easier to focus on key data points within your logs.

FIELD EXTRACTION

The following walks through a regular expression for extracting fields from an access log entry: a sample entry, the expression adjusted to that format, and a check of the values it captures.

Log Entry:

190.113.128.150 - - [17/Aug/2024:23:59:01] "POST /cart/success.do?JSESSIONID=SD8SL1FF2ADFF27563 HTTP 1.1" 200 2392 "http://www.buttercupgames.com/cart.do?action=purchase&itemId=EST-16" "Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3" 228

Adjusted Regular Expression:

Here’s a regex to match and extract the necessary fields from this specific log format:

^(?P<client_ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<datetime>[^\]]+)\] "(?P<http_method>\w+) (?P<http_uri>[^"]+)" (?P<http_status_code>\d+) \d+ "[^"]*" "(?P<http_user_agent>[^"]+)" (?P<http_response_time>\d+)

Explanation:

^(?P<client_ip>\d+\.\d+\.\d+\.\d+): Matches the client IP address at the start of the line.
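- - \[(?P<datetime>[^\]]+)\]: Matches the datetime field within square brackets.
"(?P<http_method>\w+) (?P<http_uri>[^"]+)": Matches the HTTP method, then everything up to the closing quote as http_uri. Because [^"]+ only stops at the quote, http_uri also includes the protocol part (HTTP 1.1).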
(?P<http_status_code>\d+) \d+: Matches the HTTP status code and skips the byte size field.
"[^"]*": Skips the referrer URL (not captured).
"(?P<http_user_agent>[^"]+)": Matches the user agent string.
(?P<http_response_time>\d+): Matches the response time at the end.

Verification with the Provided Data:

Applying the regular expression to the log entry:

client_ip: 190.113.128.150
datetime: 17/Aug/2024:23:59:01
http_method: POST
http_uri: /cart/success.do?JSESSIONID=SD8SL1FF2ADFF27563 HTTP 1.1
http_status_code: 200
http_user_agent: Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3
http_response_time: 228

This regular expression should successfully capture all relevant fields from the provided log entry in Splunk.
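
The check can be reproduced outside Splunk. This is a Python sketch that applies the expression to the sample entry as-is, since Python's re module understands the same (?P<name>...) groups. The URI_ONLY variant at the end is an assumption added for illustration, not part of the original expression: it stops http_uri at the first space.

```python
import re

LOG_ENTRY = (
    '190.113.128.150 - - [17/Aug/2024:23:59:01] '
    '"POST /cart/success.do?JSESSIONID=SD8SL1FF2ADFF27563 HTTP 1.1" 200 2392 '
    '"http://www.buttercupgames.com/cart.do?action=purchase&itemId=EST-16" '
    '"Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 '
    '(KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3" 228'
)

# The expression from the notes, split over several lines for readability.
PATTERN = re.compile(
    r'^(?P<client_ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<datetime>[^\]]+)\] '
    r'"(?P<http_method>\w+) (?P<http_uri>[^"]+)" '
    r'(?P<http_status_code>\d+) \d+ "[^"]*" '
    r'"(?P<http_user_agent>[^"]+)" (?P<http_response_time>\d+)'
)

match = PATTERN.search(LOG_ENTRY)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name}: {value}")

# Variant (an assumption, not from the notes): stop http_uri at the first space
# and let [^"]* swallow the protocol part.
URI_ONLY = re.compile(r'"(?P<http_method>\w+) (?P<http_uri>\S+) [^"]*"')
uri_only = URI_ONLY.search(LOG_ENTRY).group("http_uri")
print(uri_only)
```

With the original expression, http_uri runs to the closing quote and so carries the trailing HTTP 1.1. The variant captures only the path and query string.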

CLIENT IP FIELD EXTRACTION
^(?P<clientIP>\d+\.\d+\.\d+\.\d+)

access_combined_wcookie : EXTRACT-MYIP    Inline  ^(?P<MYIP>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
access_combined_wcookie : EXTRACT-clientIP    Inline  ^(?P<clientIP>\d+\.\d+\.\d+\.\d+)
secure-2 : EXTRACT-ACCESS_IP    Inline  from (?P<ACCESS_IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
splunk_web_service : EXTRACT-useragent  Inline  userAgent=(?P<browser>[^ (]+)