Download and Upload Data and Create Index
index="tutorial_data"
index="tutorial_data" clientip="*"
index="tutorial_data" sourcetype="vendor_sales"
index="tutorial_data" 5036
index="tutorial_data" 5036 Code
index="tutorial_data" 5036 AND Code
index="tutorial_data" 5036 OR Code
index="tutorial_data" 5036 NOT Failed
index="tutorial_data" sourcetype="access*" action="purchase" status=200
index="tutorial_data" sourcetype="access_combined_wcookie" | regex _raw!="Apple"
The regex command in Splunk filters events by matching a field against a regular expression; its companion, the rex command, extracts values into new fields. Regular expressions (regex) let you define a search pattern, which is useful for extracting, matching, or excluding specific values in log files or event data.
Here are various examples and use cases of the regex command:
You can use the regex command to filter events based on a specific pattern. Only events matching the regex pattern will be returned.
index=web sourcetype=access_logs | regex ip="^192\.168\."
This returns only events where the ip field starts with 192.168. The ^ indicates the start of the string, and \. escapes the dot character to match a literal period.

You can exclude events that match a certain pattern using the != operator within the regex command.
index=web sourcetype=access_logs | regex ip!="^192\.168\."
This excludes events where the ip field starts with 192.168.

index="tutorial_data" VendorID=* | head
index="tutorial_data" VendorID=* | tail
index="tutorial_data" sourcetype="vendor_sales" | top VendorID
index="tutorial_data" sourcetype="vendor_sales" | top 5 VendorID
index="tutorial_data" sourcetype="vendor_sales" | rare VendorID
index="tutorial_data" sourcetype="vendor_sales" | rare limit=5 VendorID
index="tutorial_data" sourcetype="access_*"
index="tutorial_data" sourcetype="access_*" status=200 action=purchase
index="tutorial_data" sourcetype="access_*" | top limit=100 referer_domain
index="tutorial_data" sourcetype="access_*" | top limit=100 referer_domain | stats sum(count)
index="tutorial_data" sourcetype="access_*" | stats count by status, host
index="tutorial_data" sourcetype="access_*" status=200 action=purchase | top limit=1 clientip
index="tutorial_data" sourcetype="access_*" status=200 action=purchase | top limit=1 clientip | table clientip
index="tutorial_data" sourcetype="access_*" action=purchase status=200 | top limit=1 clientip showperc=false showcount=false
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP we got from above> ===> will give me the purchases this customer has made
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP> | stats count(productId)
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP> | stats count by productId
OR
index="tutorial_data" sourcetype="access_*" status=200 action=purchase clientip=<IP> | stats count, values(productId)
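The two-step lookup above (find the client IP with the most successful purchases, then count that client's purchases per product) can be sketched outside Splunk as well. The Python below is only an illustration: the event dictionaries and their values are made up, and only the field names clientip and productId come from the searches.

```python
from collections import Counter

# Hypothetical, hand-written events standing in for search results;
# only the field names mirror the tutorial data.
events = [
    {"clientip": "10.0.0.1", "productId": "A-1"},
    {"clientip": "10.0.0.2", "productId": "B-2"},
    {"clientip": "10.0.0.1", "productId": "A-1"},
    {"clientip": "10.0.0.1", "productId": "C-3"},
]

# Step 1: roughly "top limit=1 clientip" -> the most frequent client IP.
top_ip, top_count = Counter(e["clientip"] for e in events).most_common(1)[0]

# Step 2: roughly "stats count by productId", restricted to that client.
by_product = Counter(e["productId"] for e in events if e["clientip"] == top_ip)

print(top_ip, top_count, dict(by_product))
```

Feeding the result of step 1 into step 2 by hand is what the clientip=<IP> searches do; a subsearch does the same hand-off inside a single search.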
index="tutorial_data" sourcetype="access_*" status=200 action=purchase [search index="tutorial_data" sourcetype="access_*" status=200 action=purchase | top 1 clientip showperc=false showcount=false] | stats count by productId
OR
index="tutorial_data" sourcetype="access_*" status=200 action=purchase [search index="tutorial_data" sourcetype="access_*" status=200 action=purchase | top limit=1 clientip | table clientip] | stats count by productId
index="tutorial_data" sourcetype="access_*" | eval error=if(status == 200, "OK", "Problem")
index=web-uf_index status=200 | eval A=status+100
index=web-uf_index status=200 | eval A=status*100
index="tutorial_data" VendorID=* | eval NEW_FIELD=VendorID+"_"+Code
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views"
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" by productId
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" by productId, action
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" count(eval(action="addtocart")) as "addtocart" by productId
index="tutorial_data" sourcetype="access_*" status=200 | chart count as "views" count(eval(action="addtocart")) as "addtocart" count(eval(action="purchase")) as "purchase" by productId
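The chart search above counts every event as a view and, in separate columns, counts only the events whose action matches. A rough Python sketch of that idea follows; the events are invented for illustration, and the column names simply reuse the aliases from the search.

```python
from collections import defaultdict

# Hypothetical events; only the field names come from the searches above.
events = [
    {"productId": "A-1", "action": "view"},
    {"productId": "A-1", "action": "addtocart"},
    {"productId": "A-1", "action": "purchase"},
    {"productId": "B-2", "action": "view"},
    {"productId": "B-2", "action": "addtocart"},
]

# One row per productId, one column per aggregate.
rows = defaultdict(lambda: {"views": 0, "addtocart": 0, "purchase": 0})
for event in events:
    row = rows[event["productId"]]
    row["views"] += 1  # plain "count": every event is counted
    if event["action"] in ("addtocart", "purchase"):
        row[event["action"]] += 1  # count(eval(action="...")): matching events only

for product_id, row in sorted(rows.items()):
    print(product_id, row)
```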
index="tutorial_data" sourcetype="access_*" action=purchase | timechart span=1d count by categoryId
index="tutorial_data" sourcetype="access_*" action=purchase | timechart span=1d count by categoryId usenull=false
Regex Example: Extract Browser Name from Access Log
index="tutorial_data" sourcetype="access_combined_wcookie" | rex field=useragent "(?<browser>[a-zA-Z]+)/"
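A quick way to try the browser pattern before running it in Splunk is Python's re module. This is a sketch under two assumptions: Python requires the (?P<name>...) spelling for named groups, and the sample user agent is the one from the log entry quoted further down in these notes.

```python
import re

# Sample user agent copied from the access log entry in these notes.
useragent = (
    "Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 "
    "(KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3"
)

# Same pattern as the rex search, written with Python's (?P<name>...) syntax.
pattern = re.compile(r"(?P<browser>[a-zA-Z]+)/")

# search() returns the first match, scanning left to right.
match = pattern.search(useragent)
browser = match.group("browser") if match else None
print(browser)
```

Because the first run of letters followed by a slash wins, the captured value here is the leading product token of the user agent rather than the actual browser family.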
from (?P<IP>\d+\.\d+\.\d+\.\d+)
(?P<datetime_usSREE>GET)
(?P<datetime_usSREE>"GET)
(?P<datetime_usSREE>\[.+\]) ==> [21/Aug/2024:18:22:16]
"(?P<datetime_usSREE>.+)" ==> GET /oldlink?itemId=EST-14&JSESSIONID=S
\[(?P<datetime_usSREE>.+)] => 21/Aug/2024:18:22:16
? = NOT GREEDY
\[(?P<datetime_usSREE>.+)].*(?P<urlSree>http.*?)"\s
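The "? = NOT GREEDY" note matters as soon as a line contains more than one closing bracket or quote. The Python sketch below shows the difference on a made-up line (the trailing [extra] is added only to force the two quantifiers apart).

```python
import re

# Hypothetical line with two bracketed sections, so greedy and lazy differ.
line = '[21/Aug/2024:18:22:16] "GET /oldlink?itemId=EST-14" [extra]'

# Greedy: .+ runs to the LAST closing bracket on the line.
greedy = re.search(r"\[(?P<datetime>.+)\]", line).group("datetime")

# Lazy: .+? stops at the FIRST closing bracket.
lazy = re.search(r"\[(?P<datetime>.+?)\]", line).group("datetime")

# A negated character class gives the same result as the lazy form here.
negated = re.search(r"\[(?P<datetime>[^\]]+)\]", line).group("datetime")

print(greedy)
print(lazy)
print(negated)
```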
CHATGPT
^(?P<client_ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<datetime>[^\]]+)\] "(?P<http_method>\w+) (?P<http_uri>[^"]+)" (?P<http_status_code>\d+) \d+ "[^"]*" "(?P<http_user_agent>[^"]+)" (?P<http_response_time>\d+)
index="tutorial_data" sourcetype="access_combined_wcookie" status=200| rex field=EXISTING-FIELD "(?<browser>[a-zA-Z]+)/"
index="tutorial_data" sourcetype="access_combined_wcookie" status=200| rex field=useragent "(?<browser>[a-zA-Z]+)/"
index="tutorial_data" sourcetype="access_combined_wcookie" status=200| rex field=useragent "(?<useragent>[a-zA-Z]+)/"
index="tutorial_data" sourcetype="access_combined_wcookie" | rex field=useragent "(?<BROWSER>[a-zA-Z]+)/"
index="tutorial_data" sourcetype="access_combined_wcookie" status=200| rex field=useragent mode=sed "s/^M/N/g"
index="tutorial_data" sourcetype="access_combined_wcookie" status=200| rex field=useragent mode=sed "s/Mozilla/Godzilla/g"
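The two sed-mode searches above are substitutions of the form s/pattern/replacement/g. As a rough scratchpad equivalent, Python's re.sub does the same kind of replace-all; the sample string below is invented for illustration.

```python
import re

# Hypothetical user agent value containing "Mozilla" twice.
useragent = "Mozilla/5.0 (Macintosh) Mozilla-compatible"

# Like s/Mozilla/Godzilla/g: every occurrence is replaced.
replaced_all = re.sub(r"Mozilla", "Godzilla", useragent)

# Like s/^M/N/g: the ^ anchor limits the match to the start of the value,
# so only the first character can change even with the g flag.
replaced_start = re.sub(r"^M", "N", useragent)

print(replaced_all)
print(replaced_start)
```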
index="tutorial_data" sourcetype="access_combined_wcookie" | regex _raw="Mozilla"
If you want to extract data from events using regex, use the rex command, which allows for field extraction.
index=web sourcetype=access_logs | rex field=url "(?<domain>https?://[^/]+)"
This extracts the scheme and domain from the url field and assigns it to a new field called domain. The (?<domain>...) syntax is used to create a named capture group.

You can perform case-insensitive matches by adding the (?i) flag to the regex pattern.
index=web sourcetype=access_logs | regex method="(?i)post"
This matches events where the method field contains the value post, regardless of case (i.e., it matches POST, Post, or post).

You can use multiple patterns in your regex to filter events that match any of the patterns.
index=web sourcetype=access_logs | regex status_code="(200|404|500)"
This matches events where status_code is 200, 404, or 500.

You can extract multiple fields from a single event using rex with multiple capture groups.
index=web sourcetype=firewall_logs | rex field=_raw "(?<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?<src_port>\d+)"
This extracts the source IP (src_ip) and source port (src_port) from the raw log data. It captures an IP address format (\d{1,3} for each octet) followed by a port number.

You can apply a regex pattern to the entire event (_raw) to filter based on raw event data.
index=web sourcetype=access_logs | regex _raw="\.com"
This matches events that contain .com anywhere in the text.

You can use rex to extract fixed-length or patterned strings from events.
index=web sourcetype=access_logs | rex field=_raw "(?<date>\d{4}-\d{2}-\d{2})"
This extracts dates in the format YYYY-MM-DD from the raw log data and assigns them to a new field called date.

If you need to replace part of a field's value based on a regex pattern, you can use eval with the replace() function.
index=web sourcetype=access_logs | eval ip=replace(ip, "\d{1,3}\.\d{1,3}\.\d{1,3}\.(\d{1,3})", "xxx.xxx.xxx.xxx")
This replaces each octet of the ip field with xxx to anonymize the IP addresses in the logs.

You can use capture groups to pull out part of a field; backreferences (like \1, \2, etc.) refer to previously matched groups in your regex pattern.
index=web sourcetype=email_logs | rex field=email "(?<username>[^@]+)@"
This extracts the part of the address before the @ symbol and assigns it to the username field.

regex to Filter by Number Range
You can use regex to filter numeric fields based on a range by crafting specific patterns.
index=web sourcetype=access_logs | regex ip="^192\.168\.\d{1,3}\.\d{1,3}$"
This matches IP addresses in the 192.168.x.x range.

regex Patterns and Use Cases:
| Pattern | Description |
|---|---|
| `\d` | Matches any digit (0-9). |
| `\w` | Matches any word character (alphanumeric + underscore). |
| `.` | Matches any character except newline. |
| `^` | Matches the start of the string. |
| `$` | Matches the end of the string. |
| `\.` | Matches a literal period (dot). |
| `.*` | Matches any sequence of characters (wildcard). |
| `\d{1,3}` | Matches one to three digits. |
| `[A-Za-z0-9]` | Matches any alphanumeric character. |
| `(?<fieldname>...)` | Named capture group to extract a specific field. |
| `(value1\|value2\|value3)` | Matches any of the values inside the parentheses (OR operator). |
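Several of the extraction patterns above can be tried in Python's re module before they go into a rex search. This is a scratchpad sketch, not Splunk itself: Python needs the (?P<name>...) spelling for named groups, and every sample string below is invented for illustration.

```python
import re

# Domain from a URL (rex field=url example).
domain = re.search(
    r"(?P<domain>https?://[^/]+)",
    "http://www.buttercupgames.com/cart.do?action=purchase",
).group("domain")

# Source IP and port from a raw firewall-style line (hypothetical line).
src = re.search(
    r"(?P<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<src_port>\d+)",
    "DENY tcp 192.168.1.20:51515 -> 10.0.0.5:443",
).groupdict()

# Date in YYYY-MM-DD form from a raw line (hypothetical line).
date = re.search(
    r"(?P<date>\d{4}-\d{2}-\d{2})", "2024-08-21 18:22:16 GET /"
).group("date")

# Part of an email address before the @ symbol.
username = re.search(r"(?P<username>[^@]+)@", "alice@example.com").group("username")

print(domain, src, date, username)
```

Note that the IP:port pattern captures the first pair it finds on the line, which is why only the source side is extracted.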
Use regex to filter events based on a specific pattern in fields.
Use regex with != to exclude events based on a pattern.
Use rex to extract parts of a field using regular expressions.
Use eval and replace() to modify or anonymize data.
Use rex to extract specific parts of a string and create new fields.
By mastering the regex command and the related rex and replace() functions, you can perform advanced pattern matching, data extraction, and filtering in Splunk searches, making it easier to focus on key data points within your logs.
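The filtering and replacing techniques summarized here can also be sanity-checked in Python. The sketch assumes that matching is unanchored unless the pattern says otherwise, which is how re.search behaves; the sample values are made up.

```python
import re

ips = ["192.168.1.5", "10.0.0.8", "192.168.20.1"]

# Keep matches (like regex ip="^192\.168\.") and drop matches (like ip!=...).
internal = [ip for ip in ips if re.search(r"^192\.168\.", ip)]
external = [ip for ip in ips if not re.search(r"^192\.168\.", ip)]

# Case-insensitive match with the inline (?i) flag.
is_post = bool(re.search(r"(?i)post", "Post"))

# Alternation without anchors matches anywhere, so "2004" also passes;
# anchoring with ^ and $ restricts it to the exact values.
loose = bool(re.search(r"(200|404|500)", "2004"))
exact = bool(re.search(r"^(200|404|500)$", "2004"))

# Replace a whole dotted quad with a fixed mask (like eval replace()).
masked = re.sub(
    r"\d{1,3}\.\d{1,3}\.\d{1,3}\.(\d{1,3})", "xxx.xxx.xxx.xxx", "client 10.0.0.8 ok"
)

print(internal, external, is_post, loose, exact, masked)
```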
The log entry below has a slightly different format from the earlier examples, so the regular expression is adapted to match it and then checked against the entry.
190.113.128.150 - - [17/Aug/2024:23:59:01] "POST /cart/success.do?JSESSIONID=SD8SL1FF2ADFF27563 HTTP 1.1" 200 2392 "http://www.buttercupgames.com/cart.do?action=purchase&itemId=EST-16" "Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3" 228
Here's a regex to match and extract the necessary fields from this specific log format:
^(?P<client_ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<datetime>[^\]]+)\] "(?P<http_method>\w+) (?P<http_uri>[^"]+)" (?P<http_status_code>\d+) \d+ "[^"]*" "(?P<http_user_agent>[^"]+)" (?P<http_response_time>\d+)
^(?P<client_ip>\d+\.\d+\.\d+\.\d+): Matches the client IP address at the start of the line.
- - \[(?P<datetime>[^\]]+)\]: Matches the datetime field within square brackets.
"(?P<http_method>\w+) (?P<http_uri>[^"]+)": Matches the HTTP method and URI.
(?P<http_status_code>\d+) \d+: Matches the HTTP status code and skips the byte size field.
"[^"]*": Skips the referrer URL (not captured).
"(?P<http_user_agent>[^"]+)": Matches the user agent string.
(?P<http_response_time>\d+): Matches the response time at the end.
Applying the regular expression to the log entry:
client_ip: 190.113.128.150
datetime: 17/Aug/2024:23:59:01
http_method: POST
http_uri: /cart/success.do?JSESSIONID=SD8SL1FF2ADFF27563 (as written, [^"]+ runs to the closing quote, so the captured value also includes the trailing " HTTP 1.1")
http_status_code: 200
http_user_agent: Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3
http_response_time: 228
This regular expression should capture all relevant fields from the provided log entry in Splunk.
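The full pattern can be checked against the sample entry in Python, which accepts the same (?P<name>...) syntax. The sketch below uses the log line and regex exactly as given; the second pattern, STRICT_URI, is a hypothetical variation (not part of the original notes) that stops http_uri at the first space so the protocol text is left out.

```python
import re

line = (
    '190.113.128.150 - - [17/Aug/2024:23:59:01] '
    '"POST /cart/success.do?JSESSIONID=SD8SL1FF2ADFF27563 HTTP 1.1" 200 2392 '
    '"http://www.buttercupgames.com/cart.do?action=purchase&itemId=EST-16" '
    '"Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 '
    '(KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3" 228'
)

# Pattern as written in the notes.
ORIGINAL = re.compile(
    r'^(?P<client_ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<datetime>[^\]]+)\] '
    r'"(?P<http_method>\w+) (?P<http_uri>[^"]+)" (?P<http_status_code>\d+) \d+ '
    r'"[^"]*" "(?P<http_user_agent>[^"]+)" (?P<http_response_time>\d+)'
)

# Hypothetical variation: \S+ ends the URI at the first space, and [^"]*
# skips whatever follows (here " HTTP 1.1") up to the closing quote.
STRICT_URI = re.compile(
    r'^(?P<client_ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<datetime>[^\]]+)\] '
    r'"(?P<http_method>\w+) (?P<http_uri>\S+)[^"]*" (?P<http_status_code>\d+) \d+ '
    r'"[^"]*" "(?P<http_user_agent>[^"]+)" (?P<http_response_time>\d+)'
)

fields = ORIGINAL.match(line).groupdict()
strict_fields = STRICT_URI.match(line).groupdict()

for name, value in fields.items():
    print(f"{name}: {value}")
print("strict http_uri:", strict_fields["http_uri"])
```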
CLIENT IP FIELD EXTRACTION
^(?P<clientIP>\d+\.\d+\.\d+\.\d+)
access_combined_wcookie : EXTRACT-MYIP Inline ^(?P<MYIP>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})
access_combined_wcookie : EXTRACT-clientIP Inline ^(?P<clientIP>\d+\.\d+\.\d+\.\d+)
secure-2 : EXTRACT-ACCESS_IP Inline from (?P<ACCESS_IP>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})
splunk_web_service : EXTRACT-useragent Inline userAgent=(?P<browser>[^ (]+)
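Before saving an inline extraction, it can help to run the pattern against a sample line and look at what is actually captured. The Python sketch below compares three ways of writing an IPv4 pattern; the group name ip and the sample strings are chosen for illustration, and the comparison shows how a single missing \. or a {0,3} quantifier changes the result.

```python
import re

line = "190.113.128.150 - - [17/Aug/2024:23:59:01]"

# Four octets of one to three digits, each separated by a literal dot.
strict = re.search(r"^(?P<ip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})", line)

# Same idea with the last \. left out: the pattern is satisfied after
# three octets, so the fourth one is never captured.
missing_dot = re.search(r"^(?P<ip>\d+\.\d+\.\d+\d+)", line)

# {0,3} allows empty octets, so three bare dots are enough to match.
loose = re.search(r"(?P<ip>\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3})", "from ... port 22")

print(strict.group("ip"))
print(missing_dot.group("ip"))
print(repr(loose.group("ip")))
```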