Splunk Day-1a

Upload Tutorial Data

Download and Upload Data and Create Index

Searching

index="tutorial_data"
index="tutorial_data" clientip="*"
index="tutorial_data" sourcetype="vendor_sales"
index="tutorial_data" 5036
index="tutorial_data" 5036 Code
index="tutorial_data" 5036 AND Code
index="tutorial_data" 5036 OR Code
index="tutorial_data" 5036 NOT Failed
index="tutorial_data" sourcetype="access*" action="purchase" status=200

Regex Based Filtering

index="tutorial_data" sourcetype="access_combined_wcookie" | regex _raw!="Apple"

The regex command in Splunk filters events: it keeps only the events whose field (or _raw, when no field is named) matches a regular expression, or, with !=, only the events that do not match. It does not extract anything; extracting fields is the job of the separate rex command. Regular expressions (regex) define a search pattern, which is useful for matching, excluding, or extracting specific values in log files and event data.

Here are examples and use cases of the regex command and the related rex command:

1. Basic Filtering with regex

You can use the regex command to filter events based on a specific pattern. Only events matching the regex pattern will be returned.

Example (Filter Events by IP Address):

index=web sourcetype=access_logs | regex ip="^192\.168\."

2. Filtering Events with Exclusion

You can exclude events that match a certain pattern by using != instead of = within the regex command.

Example (Exclude IP Addresses from a Specific Subnet):

index=web sourcetype=access_logs | regex ip!="^192\.168\."
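
The two filters above can be checked outside Splunk. The following is a minimal Python sketch, not Splunk itself: the event list is made up for illustration, and re.search stands in for the unanchored matching that the regex command performs.

```python
import re

# Hypothetical events; the field name "ip" follows the searches above.
events = [
    {"ip": "192.168.1.10"},
    {"ip": "10.0.0.5"},
    {"ip": "192.168.200.7"},
    {"ip": "172.16.192.168"},
]

subnet = re.compile(r"^192\.168\.")

# regex ip="^192\.168\."   keeps events whose ip matches the pattern
kept = [e["ip"] for e in events if subnet.search(e["ip"])]

# regex ip!="^192\.168\."  keeps events whose ip does not match
excluded = [e["ip"] for e in events if not subnet.search(e["ip"])]

print(kept)
print(excluded)
```

The ^ anchor matters here: without it, 172.16.192.168 would also count as a match because "192.168." appears later in the value.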

Head/Tail

index="tutorial_data" VendorID=* | head
index="tutorial_data" VendorID=* | tail

Exercise-1: Top/Rare

index="tutorial_data" sourcetype="vendor_sales" | top VendorID
index="tutorial_data" sourcetype="vendor_sales" | top 5 VendorID
index="tutorial_data" sourcetype="vendor_sales" | rare VendorID
index="tutorial_data" sourcetype="vendor_sales" | rare limit=5 VendorID

Exercise-2:

index="tutorial_data" sourcetype="access_*"
index="tutorial_data" sourcetype="access_*" status=200 action=purchase

Exercise-3: Stat Sum

index="tutorial_data" sourcetype="access_*" | top limit=100 referer_domain
index="tutorial_data" sourcetype="access_*" | top limit=100 referer_domain | stats sum(count)
index="tutorial_data" sourcetype="access_*" | stats count by status, host

Exercise-4: Table

index="tutorial_data" sourcetype="access_*" status=200 action=purchase | top limit=1 clientip
index="tutorial_data" sourcetype="access_*" status=200 action=purchase | top limit=1 clientip | table clientip
index="tutorial_data" sourcetype="access_*" action=purchase status=200 | top limit=1 clientip showperc=false showcount=false

Exercise-5: Top Buyer/Subsearch

Replace <IP> with the clientip returned in Exercise-4. The first search lists the purchases this customer has made.
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP>
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP> | stats count(productId)
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP> | stats count by productId
OR
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP> | stats count, values(productId)
index="tutorial_data" sourcetype="access_*"  status=200  action=purchase [search index="tutorial_data" sourcetype="access_*"  status=200  action=purchase | top 1 clientip showperc=false showcount=false] | stats count by productId
OR
index="tutorial_data" sourcetype="access_*" status=200 action=purchase [search index="tutorial_data" sourcetype="access_*" status=200 action=purchase | top limit=1 clientip | table clientip] | stats count by productId
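
What the subsearch does can be sketched in Python: the inner search runs first and returns the top clientip, and the outer search then filters on that value before counting by productId. The events below are hypothetical, and Counter is only a stand-in for top and stats count by.

```python
from collections import Counter

# Hypothetical purchase events (status=200 and action=purchase already applied).
purchases = [
    {"clientip": "10.1.1.1", "productId": "A-1"},
    {"clientip": "10.2.2.2", "productId": "B-2"},
    {"clientip": "10.1.1.1", "productId": "B-2"},
    {"clientip": "10.1.1.1", "productId": "A-1"},
]

# Inner search: ... | top limit=1 clientip | table clientip
top_ip, _ = Counter(e["clientip"] for e in purchases).most_common(1)[0]

# Outer search: filter on that clientip, then stats count by productId
by_product = Counter(
    e["productId"] for e in purchases if e["clientip"] == top_ip
)

print(top_ip, dict(by_product))
```

The table clientip (or showperc=false showcount=false) step matters in the real search because the subsearch result becomes a filter; any extra column such as count or percent would be added to that filter as well.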

Exercise-6: Eval

index="tutorial_data" sourcetype="access_*" | eval error=if(status == 200, "OK", "Problem")
index=web-uf_index status=200 | eval A=status+100
index=web-uf_index status=200 | eval A=status*100
index="tutorial_data" VendorID=* | eval NEW_FIELD=VendorID+"_"+Code

Exercise-7: Chart

index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views"
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" by productId
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" by productId, action
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" count(eval(action="addtocart")) as "addtocart" by productId
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" count(eval(action="addtocart")) as "addtocart" count(eval(action="purchase")) as "purchase" by productId

Exercise-8: TimeChart

index="tutorial_data" sourcetype="access_*" action=purchase | timechart span=1d count by categoryId
index="tutorial_data" sourcetype="access_*" action=purchase | timechart span=1d count by categoryId usenull=false

Exercise-9: Rex (Regex to extract fields on the fly)

Regex Example: Extract Browser Name from Access Log
index="tutorial_data" sourcetype="access_combined_wcookie" | rex field=useragent "(?<browser>[a-zA-Z]+)/"


from (?P<IP>\d+\.\d+\.\d+\.\d+)

(?P<datetime_usSREE>GET)
(?P<datetime_usSREE>"GET)
(?P<datetime_usSREE>\[.+\]) ==> [21/Aug/2024:18:22:16]
"(?P<datetime_usSREE>.+)" ==> GET /oldlink?itemId=EST-14&JSESSIONID=S

\[(?P<datetime_usSREE>.+)] ==> 21/Aug/2024:18:22:16

? = NOT GREEDY (after * or +, it makes the quantifier match as little as possible)
\[(?P<datetime_usSREE>.+)].*(?P<urlSree>http.*?)"\s
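
The effect of the non-greedy ? can be seen in a small Python check. This is only a sketch: the input line is hypothetical (a second bracketed part is added to make the difference visible), and the group name value is chosen for illustration.

```python
import re

# Hypothetical line with two bracketed parts.
line = "[21/Aug/2024:18:22:16] GET /oldlink [cached]"

# Greedy: .+ runs as far as possible, up to the LAST closing bracket.
greedy = re.search(r"\[(?P<value>.+)\]", line).group("value")

# Non-greedy: .+? stops at the FIRST closing bracket.
lazy = re.search(r"\[(?P<value>.+?)\]", line).group("value")

print(greedy)
print(lazy)
```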


Full-line pattern suggested by ChatGPT (explained under FIELD EXTRACTION):
^(?P<client_ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<datetime>[^\]]+)\] "(?P<http_method>\w+) (?P<http_uri>[^"]+)" (?P<http_status_code>\d+) \d+ "[^"]*" "(?P<http_user_agent>[^"]+)" (?P<http_response_time>\d+)



index="tutorial_data" sourcetype="access_combined_wcookie" status=200 | rex field=<EXISTING-FIELD> "(?<browser>[a-zA-Z]+)/"
index="tutorial_data" sourcetype="access_combined_wcookie" status=200 | rex field=useragent "(?<browser>[a-zA-Z]+)/"
index="tutorial_data" sourcetype="access_combined_wcookie" status=200 | rex field=useragent "(?<useragent>[a-zA-Z]+)/"
index="tutorial_data" sourcetype="access_combined_wcookie" | rex field=useragent "(?<BROWSER>[a-zA-Z]+)/"

index="tutorial_data" sourcetype="access_combined_wcookie" status=200 | rex field=useragent mode=sed "s/^M/N/g"
index="tutorial_data" sourcetype="access_combined_wcookie" status=200 | rex field=useragent mode=sed "s/Mozilla/Godzilla/g"

index="tutorial_data" sourcetype="access_combined_wcookie" | regex _raw="Mozilla"

3. Extracting Data with rex

If you want to extract data from events using regex, use the rex command, which allows for field extraction.

Example (Extract Domain from URL):

index=web sourcetype=access_logs | rex field=url "(?<domain>https?://[^/]+)"
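
The domain pattern can be tried in Python before using it in a search. A minimal sketch, assuming the url field holds a full URL; note that Python's re module needs the (?P<name>...) spelling, while Splunk accepts (?<name>...) as well.

```python
import re

# Same pattern as the rex example, written with Python's (?P<name>...) syntax.
pattern = re.compile(r"(?P<domain>https?://[^/]+)")

# Referrer URL taken from the sample access-log entry in this document.
url = "http://www.buttercupgames.com/cart.do?action=purchase&itemId=EST-16"

match = pattern.search(url)
domain = match.group("domain") if match else None

print(domain)
```

As written, the captured value includes the scheme (http://), because the capture group starts before it.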

4. Case-Insensitive Matching

You can perform case-insensitive matches by adding the (?i) flag to the regex pattern.

Example (Case-Insensitive Match for HTTP Methods):

index=web sourcetype=access_logs | regex method="(?i)post"
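
A quick Python check of the (?i) flag, using a made-up list of method values:

```python
import re

# (?i) at the start of the pattern turns on case-insensitive matching.
pattern = re.compile(r"(?i)post")

# Hypothetical values of the method field.
methods = ["POST", "post", "Post", "GET"]
matched = [m for m in methods if pattern.search(m)]

print(matched)
```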

5. Using Multiple Patterns

You can use multiple patterns in your regex to filter events that match any of the patterns.

Example (Match Multiple Status Codes):

index=web sourcetype=access_logs | regex status_code="^(200|404|500)$"
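
Because a regex matches anywhere in the field value, it matters whether an alternation is anchored with ^ and $. The Python sketch below compares both forms on a few hypothetical values.

```python
import re

unanchored = re.compile(r"(200|404|500)")
anchored = re.compile(r"^(200|404|500)$")

# Hypothetical field values, including longer numbers that merely contain a code.
values = ["200", "404", "503", "2004", "15000"]

loose = [v for v in values if unanchored.search(v)]
strict = [v for v in values if anchored.search(v)]

print(loose)
print(strict)
```

For three-digit status codes the difference rarely shows up in practice, but the anchored form states the intent exactly.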

6. Extracting Multiple Fields with rex

You can extract multiple fields from a single event using rex with multiple capture groups.

Example (Extract IP and Port from a Log Entry):

index=web sourcetype=firewall_logs | rex field=_raw "(?<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?<src_port>\d+)"
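
The two capture groups can be exercised in Python. This is a sketch under assumptions: the firewall line is invented, and the group names are written in Python's (?P<name>...) form.

```python
import re

pattern = re.compile(
    r"(?P<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<src_port>\d+)"
)

# Hypothetical firewall event containing two ip:port pairs.
raw = "action=allow src=10.20.30.40:51514 dst=192.168.1.5:443"

# First match only, which is what rex returns by default (max_match=1).
first = pattern.search(raw).groupdict()

# Every match in the event.
all_pairs = [m.groupdict() for m in pattern.finditer(raw)]

print(first)
print(all_pairs)
```

The field names src_ip and src_port describe only the first pair here; the second match in this made-up line is actually the destination, which is why the default of one match is often the safer choice.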

7. Matching Entire Event (regex _raw)

You can apply a regex pattern to the entire event (_raw) to filter based on raw event data.

Example (Filter Events with URLs Containing ".com"):

index=web sourcetype=access_logs | regex _raw="\.com"

8. Extracting Strings with Fixed Patterns

You can use rex to extract fixed-length or patterned strings from events.

Example (Extract Dates from Logs):

index=web sourcetype=access_logs | rex field=_raw "(?<date>\d{4}-\d{2}-\d{2})"
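
A fixed pattern only extracts something when the data actually uses that format. The Python sketch below applies the YYYY-MM-DD pattern to a shortened form of the sample access-log entry from this document, where the date is written as 17/Aug/2024; the second pattern (clf_date) is a hypothetical alternative for that format.

```python
import re

iso_date = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2})")

# Hypothetical alternative for dates such as [17/Aug/2024:...
clf_date = re.compile(r"\[(?P<date>\d{2}/[A-Za-z]{3}/\d{4})")

# Shortened form of the sample log entry (request and trailing fields trimmed).
raw = '190.113.128.150 - - [17/Aug/2024:23:59:01] "POST /cart/success.do HTTP 1.1" 200 2392'

iso = iso_date.search(raw)
clf = clf_date.search(raw)

print(iso)
print(clf.group("date"))
```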

9. Replacing Text with eval and replace()

If you need to replace part of a field’s value based on a regex pattern, you can use eval with the replace() function.

Example (Anonymize IP Addresses):

index=web sourcetype=access_logs | eval ip=replace(ip, "\d{1,3}\.\d{1,3}\.\d{1,3}\.(\d{1,3})", "xxx.xxx.xxx.xxx")
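
The same substitution can be tried with Python's re.sub. The first call mirrors the eval above; the second is a hypothetical variant that uses the capture group through the backreference \1 to keep the last octet.

```python
import re

ip = "190.113.128.150"
pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.(\d{1,3})"

# Mask the whole address, as in the eval example.
masked = re.sub(pattern, "xxx.xxx.xxx.xxx", ip)

# Hypothetical variant: keep the last octet via the backreference \1.
partial = re.sub(pattern, r"xxx.xxx.xxx.\1", ip)

print(masked)
print(partial)
```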

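10. Extracting Field Values Using Named Capture Groups in rex

A named capture group in rex stores the matched text in a new field. Backreferences (like \1, \2, etc.) refer to previously matched groups and are mainly useful in replacement strings, for example with replace() or rex mode=sed; the example here needs only a capture group.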

Example (Extract Usernames from Email Addresses):

index=web sourcetype=email_logs | rex field=email "(?<username>[^@]+)@"

11. Using regex to Filter by Number Range

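The regex command compares text, not numeric values, so a numeric range has to be expressed as a pattern of digits. The example restricts the ip field to addresses of the form 192.168.x.x. Note that \d{1,3} accepts any one to three digits, including values above 255, so the pattern checks the shape of the address rather than the value of each octet.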

Example (Filter IPs in 192.168.x.x Subnet):

index=web sourcetype=access_logs | regex ip="^192\.168\.\d{1,3}\.\d{1,3}$"
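
The difference between matching the shape of an address and checking its value can be shown in Python. This is a sketch with made-up candidates; the ipaddress module from the standard library does the value-based check. In Splunk itself, the eval function cidrmatch() is the usual tool for a value-based subnet test.

```python
import ipaddress
import re

shape = re.compile(r"^192\.168\.\d{1,3}\.\d{1,3}$")
subnet = ipaddress.ip_network("192.168.0.0/16")


def in_subnet(value):
    """Return True only if value is a valid address inside the subnet."""
    try:
        return ipaddress.ip_address(value) in subnet
    except ValueError:
        return False


# Hypothetical candidates; the second has an impossible octet.
candidates = ["192.168.1.10", "192.168.999.1", "10.0.0.1"]

by_shape = [c for c in candidates if shape.search(c)]
by_value = [c for c in candidates if in_subnet(c)]

print(by_shape)
print(by_value)
```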

Common regex Patterns and Use Cases:

Pattern                  Description
\d                       Matches any digit (0-9).
\w                       Matches any word character (alphanumeric or underscore).
.                        Matches any character except newline.
^                        Matches the start of the string.
$                        Matches the end of the string.
\.                       Matches a literal period (dot).
.*                       Matches any sequence of zero or more characters.
\d{1,3}                  Matches one to three digits.
[A-Za-z0-9]              Matches any single alphanumeric character.
(?<fieldname>...)        Named capture group that extracts a field.
(value1|value2|value3)   Matches any one of the alternatives (OR).

Summary:

With the regex command for filtering, the rex command for field extraction, and eval's replace() function for rewriting values, you can perform advanced pattern matching, data extraction, and filtering in Splunk searches, making it easier to focus on key data points within your logs.

FIELD EXTRACTION

The regular expression in this section is adjusted to match the access-log format of the tutorial data, using the log entry that follows as the reference.

Log Entry:

190.113.128.150 - - [17/Aug/2024:23:59:01] "POST /cart/success.do?JSESSIONID=SD8SL1FF2ADFF27563 HTTP 1.1" 200 2392 "http://www.buttercupgames.com/cart.do?action=purchase&itemId=EST-16" "Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3" 228

Adjusted Regular Expression:

Here’s a regex to match and extract the necessary fields from this specific log format:

^(?P<client_ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<datetime>[^\]]+)\] "(?P<http_method>\w+) (?P<http_uri>[^"]+)" (?P<http_status_code>\d+) \d+ "[^"]*" "(?P<http_user_agent>[^"]+)" (?P<http_response_time>\d+)

Explanation:

  1. ^(?P<client_ip>\d+\.\d+\.\d+\.\d+): Matches the client IP address at the start of the line.
  2. - - \[(?P<datetime>[^\]]+)\]: Matches the datetime field within square brackets.
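  3. "(?P<http_method>\w+) (?P<http_uri>[^"]+)": Matches the HTTP method and the rest of the quoted request. Because [^"]+ runs up to the closing quote, http_uri also captures the trailing protocol (" HTTP 1.1" in the entry above).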
  4. (?P<http_status_code>\d+) \d+: Matches the HTTP status code and skips the byte size field.
  5. "[^"]*": Skips the referrer URL (not captured).
  6. "(?P<http_user_agent>[^"]+)": Matches the user agent string.
  7. (?P<http_response_time>\d+): Matches the response time at the end.

Verification with the Provided Data:

Applied to the log entry above, the regular expression should capture client_ip, datetime, http_method, http_uri, http_status_code, http_user_agent, and http_response_time.
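
One way to check this outside Splunk is a short Python script that applies the same pattern to the same log entry. It is a sketch, not part of the Splunk configuration: the variable names are illustrative, and the pattern is only split across lines for readability.

```python
import re

# The adjusted regular expression, split across lines but otherwise unchanged.
LOG_PATTERN = re.compile(
    r'^(?P<client_ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<datetime>[^\]]+)\] '
    r'"(?P<http_method>\w+) (?P<http_uri>[^"]+)" '
    r'(?P<http_status_code>\d+) \d+ "[^"]*" '
    r'"(?P<http_user_agent>[^"]+)" (?P<http_response_time>\d+)'
)

# The log entry from this section.
entry = (
    '190.113.128.150 - - [17/Aug/2024:23:59:01] '
    '"POST /cart/success.do?JSESSIONID=SD8SL1FF2ADFF27563 HTTP 1.1" 200 2392 '
    '"http://www.buttercupgames.com/cart.do?action=purchase&itemId=EST-16" '
    '"Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 '
    '(KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3" 228'
)

match = LOG_PATTERN.match(entry)
fields = match.groupdict() if match else {}

for name, value in fields.items():
    print(f"{name} = {value}")
```

The output shows that http_uri carries the protocol along with the path. If only the path is wanted, the pattern would need a separate group or a narrower character class for the URI.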

CLIENT IP FIELD EXTRACTION

^(?P<clientIP>\d+\.\d+\.\d+\.\d+)

Inline field extractions as listed under Settings > Fields > Field extractions:

Name                                        Type    Extraction
access_combined_wcookie : EXTRACT-MYIP      Inline  ^(?P<MYIP>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
access_combined_wcookie : EXTRACT-clientIP  Inline  ^(?P<clientIP>\d+\.\d+\.\d+\.\d+)
secure-2 : EXTRACT-ACCESS_IP                Inline  from (?P<ACCESS_IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
splunk_web_service : EXTRACT-useragent      Inline  userAgent=(?P<browser>[^ (]+)

The two access_combined_wcookie extractions are owned by admin in the search app, set to Private, and Enabled.
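
When writing an IP extraction, each octet needs its own \. separator and at least one digit. The Python sketch below shows what happens otherwise: a pattern that drops the last \. captures only three octets, and \d{0,3} accepts empty octets. The second input string is hypothetical.

```python
import re

line = "190.113.128.150 - - [17/Aug/2024:23:59:01]"

# Last "\." missing: the final \d+\d+ can only consume digits of the third octet.
missing_dot = re.match(r"^(?P<clientIP>\d+\.\d+\.\d+\d+)", line).group("clientIP")

# All three separators present: the full address is captured.
four_octets = re.match(r"^(?P<clientIP>\d+\.\d+\.\d+\.\d+)", line).group("clientIP")

# \d{0,3} allows zero digits, so three bare dots are accepted as an "address".
empty_octets = re.search(
    r"from (?P<ACCESS_IP>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})",
    "Accepted from ... (hypothetical truncated line)",
).group("ACCESS_IP")

print(missing_dot)
print(four_octets)
print(empty_octets)
```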