Regex for Log Parsing: Extract Timestamps, IPs & Codes (2026)
Learn regex for log parsing: copy-ready patterns to extract timestamps, IPv4 addresses, and HTTP status codes from access logs, with named capture groups.
Regex for Log Parsing: Extract Timestamps, IPs & Codes
Logs are dense, semi-structured text, and split(' ') breaks the moment a field contains a space or a quoted string. Regex is the right tool: one pattern can pull the timestamp, client IP, HTTP method, and status code out of a single access-log line in one pass. This guide gives you copy-ready patterns for each field, plus a single named-group regex that parses a full combined-format line.
TL;DR
- Use regex, not split(), because log fields contain spaces and quotes.
- Match ISO 8601 timestamps with \d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}.
- Validate IPv4 octets with 25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|\d, not \d{1,3}.
- Capture HTTP codes with \b[1-5]\d{2}\b near the request.
- Named groups (?<name>...) make the whole line self-documenting.
Why parse logs with regex instead of splitting on spaces
The first instinct is to split each line on whitespace and index into the array. It works on toy examples and fails in production within an hour.
The quoting problem
The Apache combined format wraps the request line and user agent in double quotes: "GET /api?q=a b HTTP/1.1". That single field contains spaces, so a naive split shatters it into three or four pieces and every index after it shifts. Regex sidesteps the problem by matching the quotes explicitly with "([^"]*)" and treating everything between them as one unit.
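As a rough illustration of that index shift, the sketch below splits two sample lines and then pulls the quoted request out with the pattern above. Both log lines are invented for this example, and the helper names statusBySplit and requestField are hypothetical.

```javascript
// Two illustrative access-log lines (sample data, not from a real server).
const plain = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /api HTTP/1.1" 200 512';
const spaced = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /api?q=a b HTTP/1.1" 200 512';

// Naive approach: the status sits at index 8 only when the path has no space.
const statusBySplit = (line) => line.split(' ')[8];

// Regex approach: everything between the first pair of double quotes is one field.
function requestField(line) {
  const m = line.match(/"([^"]*)"/);
  return m ? m[1] : null;
}

console.log(statusBySplit(plain));  // 200
console.log(statusBySplit(spaced)); // HTTP/1.1" (the index shifted by one)
console.log(requestField(spaced));  // GET /api?q=a b HTTP/1.1
```

Note that "([^"]*)" stops at the first closing quote, so a field containing an escaped quote would need a more careful pattern.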
When split() falls apart
Real log lines mix delimiters: spaces between top-level fields, colons inside timestamps, brackets around the date, and quotes around free text. A delimiter-based parser needs a different rule for each, which is just a regex written badly. One well-formed pattern handles all of them at once.
Regex as a single source of truth
When the pattern lives in one place, changing the log format means editing one string. You paste a sample line and the pattern into a browser-based regex tester and watch the groups light up before you wire it into a script. Because the match runs locally in JavaScript, log lines with IPs and tokens never leave your machine — which matters, since access logs frequently count as personal data under privacy rules and should not be pasted into a random server-side tool.
There can also be a performance angle. A single anchored regex scans each line once, while a chain of splits, slices, and conditionals may walk the same string several times. On a multi-gigabyte log those extra passes add up, so it is worth measuring both approaches on your own data.
How to extract a timestamp from a log line with regex
Timestamps are the field most worth getting right, because nearly every downstream query filters on time.
ISO 8601 / RFC 3339 timestamps
Most modern services log in ISO 8601, which RFC 3339 profiles for internet use. A typical value is 1994-11-05T13:15:30Z: the T separates date from time and the trailing Z (Zulu) means UTC. A pattern that covers the common variants:
(?<ts>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?)
Read as one continuous pattern, that is a date, a T or space, a time, an optional fractional second, and an optional Z or ±hh:mm offset. The offset matters: the same instant logged with +00:00 and with -05:00 produces two different strings, so normalise to UTC before comparing.
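One way to do that normalisation is sketched below, using the pattern above and the built-in Date parser. The function name extractUtc is hypothetical, and treating a timestamp with no offset as UTC is an assumption you should check against your own services.

```javascript
const ISO_TS = /(?<ts>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?)/;

// Returns the first timestamp in the line as a UTC ISO string, or null.
function extractUtc(line) {
  const m = line.match(ISO_TS);
  if (!m) return null;
  // Date parsing is only specified for the "T" separator, so swap a space for T.
  let ts = m.groups.ts.replace(' ', 'T');
  // Assumption: a timestamp without Z or an offset is already in UTC.
  if (!/(?:Z|[+-]\d{2}:\d{2})$/.test(ts)) ts += 'Z';
  const d = new Date(ts);
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}

// The same instant written with two different offsets normalises to one value.
console.log(extractUtc('2024-10-10T18:55:36+00:00 INFO started')); // 2024-10-10T18:55:36.000Z
console.log(extractUtc('2024-10-10T13:55:36-05:00 INFO started')); // 2024-10-10T18:55:36.000Z
```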
Apache / Common Log Format timestamps
Apache's default does not use ISO 8601. It writes [10/Oct/2024:13:55:36 +0000] with a dd/Mon/yyyy date inside square brackets. Match it with:
\[(\d{2})/(\w{3})/(\d{4}):(\d{2}:\d{2}:\d{2})\s([+-]\d{4})\]
The month is a three-letter abbreviation, so you map Oct to 10 in code rather than in the pattern.
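A minimal sketch of that month mapping follows. It assumes English month abbreviations, and the names MONTHS and clfToIso are hypothetical. The forward slashes are escaped only because the pattern sits inside a JavaScript regex literal.

```javascript
const CLF_TS = /\[(\d{2})\/(\w{3})\/(\d{4}):(\d{2}:\d{2}:\d{2})\s([+-]\d{4})\]/;
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Converts an Apache-style bracketed timestamp to a UTC ISO string, or null.
function clfToIso(line) {
  const m = line.match(CLF_TS);
  if (!m) return null;
  const [, day, mon, year, time, offset] = m;
  const monthIndex = MONTHS.indexOf(mon);
  if (monthIndex === -1) return null; // \w{3} matched something that is not a month
  const month = String(monthIndex + 1).padStart(2, '0');
  // Rewrite +0000 as +00:00 so the result is a valid ISO 8601 string.
  const iso = `${year}-${month}-${day}T${time}${offset.slice(0, 3)}:${offset.slice(3)}`;
  const d = new Date(iso);
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}

console.log(clfToIso('[10/Oct/2024:13:55:36 +0000]')); // 2024-10-10T13:55:36.000Z
```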
Syslog's yearless timestamp
Classic syslog is harder: Oct 10 13:55:36 has no year, and single-digit days are padded with an extra space, so the 9th is written as Oct, two spaces, then 9. Use \w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2} and supply the year from context. Once extracted, you can convert any of these to an epoch value with a Unix timestamp converter to make ranges sortable as plain integers.
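The sketch below shows one way to supply the year and get an epoch value. It adds capture groups to the pattern above, takes the year as a caller-provided argument, and assumes the host logs in UTC; the function name syslogToEpochSeconds is hypothetical.

```javascript
const SYSLOG_TS = /(\w{3})\s+(\d{1,2})\s(\d{2}):(\d{2}):(\d{2})/;
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Returns seconds since the Unix epoch, or null if no timestamp is found.
// Assumption: the log is written in UTC and `year` comes from context
// (for example the file's modification date).
function syslogToEpochSeconds(line, year) {
  const m = line.match(SYSLOG_TS);
  if (!m) return null;
  const monthIndex = MONTHS.indexOf(m[1]);
  if (monthIndex === -1) return null;
  const [day, hh, mm, ss] = m.slice(2).map(Number);
  return Date.UTC(year, monthIndex, day, hh, mm, ss) / 1000;
}

// The double space before a single-digit day is handled by \s+.
const seconds = syslogToEpochSeconds('Oct  9 13:55:36 host sshd[1]: ok', 2024);
console.log(new Date(seconds * 1000).toISOString()); // 2024-10-09T13:55:36.000Z
```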
How to match an IP address in a log file with regex
The client IP is usually the first field, but matching it correctly is where most patterns quietly go wrong.
The naive pattern that accepts 999.999.999.999
The everywhere-copied pattern is:
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
It matches the shape of an IPv4 address and nothing more. It happily accepts 999.999.999.999 and 300.1.1.1, neither of which is a real address. For grepping a log you already trust, shape-matching is fine and fast. For validation, it is wrong.
A correct IPv4 octet pattern
To enforce the 0–255 range, match each octet with an alternation that spells out the valid ranges:
(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|\d)
Each branch covers a slice: 25[0-5] for 250–255, 2[0-4]\d for 200–249, 1\d{2} for 100–199, [1-9]\d for 10–99, and \d for 0–9. The full address repeats it four times:
\b(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|\d)){3}\b
No branch allows a zero followed by more digits, so the pattern also rejects leading zeros: 01.02.03.04 will not match.
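To see the difference between the two patterns, here is a small comparison sketch. The constant names are hypothetical, and building the strict pattern from an OCTET string is just one way to avoid typing the alternation four times.

```javascript
// Shape-only pattern: any one to three digits per octet.
const LOOSE_IPV4 = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;

// Range-checked octet, reused to assemble the full strict pattern.
const OCTET = String.raw`(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|\d)`;
const STRICT_IPV4 = new RegExp(String.raw`\b${OCTET}(?:\.${OCTET}){3}\b`);

for (const candidate of ['203.0.113.7', '999.999.999.999', '300.1.1.1', '01.02.03.04']) {
  console.log(candidate, LOOSE_IPV4.test(candidate), STRICT_IPV4.test(candidate));
}
// 203.0.113.7 true true
// 999.999.999.999 true false
// 300.1.1.1 true false
// 01.02.03.04 true false
```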
A note on IPv6
IPv6 is a different problem: eight hex groups with :: compression that can appear once anywhere. A correct IPv6 regex is long and error-prone, so for IPv6-heavy logs it is usually better to match a loose [0-9a-fA-F:]+ candidate and validate it with a real parser. Do not hand-write a "perfect" IPv6 regex under deadline.
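A sketch of the loose-match-then-validate approach is below. Using the built-in URL parser as the "real parser" is an assumption of this example, chosen because it is available in both browsers and Node.js; in Node.js, net.isIPv6 is another option. The function name isIPv6 is hypothetical, and the candidate is assumed to be a single field already isolated from the line.

```javascript
const LOOSE_IPV6 = /^[0-9a-fA-F:]+$/;

// Returns true only if the candidate passes the loose shape check AND the
// URL parser accepts it as a bracketed IPv6 host.
function isIPv6(candidate) {
  if (!LOOSE_IPV6.test(candidate)) return false;
  try {
    new URL(`http://[${candidate}]/`); // throws TypeError on an invalid host
    return true;
  } catch {
    return false;
  }
}

console.log(isIPv6('2001:db8::1'));  // true
console.log(isIPv6('13:55:36'));     // false: fits the loose shape, but is a time
console.log(isIPv6('2001:db8:::1')); // false: malformed compression
```

The loose character class excludes dots, so IPv4-mapped forms such as ::ffff:203.0.113.7 are rejected by this sketch.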
How to extract HTTP status and error codes with regex
Status codes are how you find the failures, so this field drives most ad-hoc log queries.
Matching 4xx and 5xx errors
An HTTP status is always three digits starting 1–5. To find only errors, anchor on the leading digit:
\b[45]\d{2}\b
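That matches 400 through 599 and ignores 200 and 301. On its own it also matches stray three-digit numbers such as a byte count of 512, so combine it with position context, for example by requiring it to follow the closing quote of the request.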
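The sketch below shows what position context can look like for an access log: the status is only accepted when it sits between the request's closing quote and the byte count. The sample lines are invented, and the names NAIVE_ERROR, STATUS_AFTER_REQUEST, and errorStatus are hypothetical.

```javascript
const NAIVE_ERROR = /\b[45]\d{2}\b/;
// Closing quote, space, 4xx/5xx status, space, then the byte count or "-".
const STATUS_AFTER_REQUEST = /" (?<status>[45]\d{2}) (?:\d+|-)/;

function errorStatus(line) {
  const m = line.match(STATUS_AFTER_REQUEST);
  return m ? m.groups.status : null;
}

// Illustrative lines: a successful request that sent 512 bytes, and a 404.
const okLine = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512';
const errLine = '203.0.113.7 - - [10/Oct/2024:13:55:37 +0000] "GET /missing HTTP/1.1" 404 153';

console.log(NAIVE_ERROR.test(okLine)); // true: a false positive on the byte count 512
console.log(errorStatus(okLine));      // null
console.log(errorStatus(errLine));     // 404
```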
Capturing the status into a named group
In a full line you want the status tied to a name, not a magic index. Named capturing groups, written (?<name>...), have been supported across all major browsers since July 2020 per MDN, and you read the result off the groups object:
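// Illustrative sample line and pattern; substitute your own.
const line = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /missing HTTP/1.1" 404 153';
const re = /^(?<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?<status>\d{3})/;
const m = line.match(re);
if (m) {
  console.log(m.groups.status); // "404"
  console.log(m.groups.ip); // "203.0.113.7"
}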
Application log levels: ERROR, WARN, INFO
App logs use words, not numbers. A level matcher (add the i flag if your logger mixes case):
\b(?<level>ERROR|WARN(?:ING)?|INFO|DEBUG|TRACE|FATAL)\b
Anchoring with \b stops INFORMATION from matching as INFO. Pair the level with the timestamp group and you can slice a log down to "every ERROR after 14:00" with two captures.
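Here is a rough sketch of that "every ERROR after 14:00" filter. It assumes ISO-style timestamps that all share one time zone, so zero-padded HH:MM:SS strings compare correctly as text. The sample lines and the function name errorsAfter are hypothetical.

```javascript
const TIME = /\d{4}-\d{2}-\d{2}[T ](?<time>\d{2}:\d{2}:\d{2})/;
const LEVEL = /\b(?<level>ERROR|WARN(?:ING)?|INFO|DEBUG|TRACE|FATAL)\b/;

// Keeps lines whose level is ERROR and whose time is at or after the cutoff.
function errorsAfter(lines, cutoff) {
  return lines.filter((line) => {
    const t = line.match(TIME);
    const l = line.match(LEVEL);
    return Boolean(t && l) && l.groups.level === 'ERROR' && t.groups.time >= cutoff;
  });
}

const sample = [
  '2024-10-10T13:59:59Z ERROR early failure',
  '2024-10-10T14:00:01Z INFO INFORMATION only',
  '2024-10-10T14:05:00Z ERROR late failure',
];

console.log(errorsAfter(sample, '14:00:00'));
// [ '2024-10-10T14:05:00Z ERROR late failure' ]
```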
Putting it together: one regex for a full access-log line
Individual patterns are useful, but the real win is one regex that names every field of a line at once.
Named groups for every field
For an Apache/nginx combined line, a readable pattern looks like this:
^(?<ip>\S+) \S+ \S+ \[(?<ts>[^\]]+)\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<status>\d{3}) (?<bytes>\d+|-)
Each field is a named group, so the extraction reads like the format spec instead of a row of numbered indexes.
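A minimal sketch of using that pattern follows, with the fields joined by single spaces as they appear in a combined-format line. The sample line and the function name parseAccessLine are hypothetical, and converting a - byte count to 0 is a choice made for this example.

```javascript
const COMBINED = /^(?<ip>\S+) \S+ \S+ \[(?<ts>[^\]]+)\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<status>\d{3}) (?<bytes>\d+|-)/;

// Returns a plain object of named fields, or null when the line does not match.
function parseAccessLine(line) {
  const m = COMBINED.exec(line);
  if (!m) return null;
  const { ip, ts, method, path, status, bytes } = m.groups;
  return {
    ip,
    ts,
    method,
    path,
    status: Number(status),
    bytes: bytes === '-' ? 0 : Number(bytes), // "-" means no body was sent
  };
}

const entry = parseAccessLine(
  '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"'
);
console.log(entry.ip, entry.method, entry.path, entry.status, entry.bytes);
// 203.0.113.7 GET /missing 404 153
```

A request logged as a bare "-" has no method or path, so this pattern returns null for it; handle that case separately if it appears in your logs.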
Apache combined vs nginx combined
The two servers log almost the same line, which is why one pattern fits both. The field order is identical; only the source tokens differ. Apache's mod_log_config and nginx's ngx_http_log_module define them:
| Field | Apache token | nginx variable |
|---|---|---|
| Client IP | %h | $remote_addr |
| Timestamp | %t | $time_local |
| Request line | %r | $request |
| Status code | %>s | $status |
| Bytes sent | %b | $body_bytes_sent |
Because the combined format uses the same field sequence in both servers, the named-group pattern above parses either without changes.
Testing before you deploy
Never ship a log regex you have not run against real lines. A quick checklist before it goes into a script:
- Test against a normal line, an error line, and a malformed line.
- Confirm every named group captures the expected substring.
- Check that an empty field (a - for bytes) still matches.
- Verify the timestamp branch covers your server's exact format.
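The checklist can also be turned into a small script, sketched here under the assumption that you keep a few representative lines next to the pattern. The helper name checkPattern, the case objects, and the sample lines are all hypothetical.

```javascript
const COMBINED = /^(?<ip>\S+) \S+ \S+ \[(?<ts>[^\]]+)\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<status>\d{3}) (?<bytes>\d+|-)/;

// Runs each case and returns a list of failure messages (empty means all passed).
// `expect` maps group names to expected substrings, or is null for "must not match".
function checkPattern(re, cases) {
  const failures = [];
  for (const { name, line, expect } of cases) {
    const m = re.exec(line);
    if (expect === null) {
      if (m) failures.push(`${name}: expected no match`);
      continue;
    }
    if (!m) {
      failures.push(`${name}: no match`);
      continue;
    }
    for (const [group, value] of Object.entries(expect)) {
      if (m.groups[group] !== value) {
        failures.push(`${name}: ${group} was ${m.groups[group]}, expected ${value}`);
      }
    }
  }
  return failures;
}

const prefix = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] ';
const cases = [
  { name: 'normal', line: prefix + '"GET / HTTP/1.1" 200 512', expect: { status: '200', bytes: '512' } },
  { name: 'error', line: prefix + '"GET /missing HTTP/1.1" 404 153', expect: { status: '404', path: '/missing' } },
  { name: 'empty bytes', line: prefix + '"GET /cached HTTP/1.1" 304 -', expect: { bytes: '-' } },
  { name: 'malformed', line: 'not a log line', expect: null },
];

console.log(checkPattern(COMBINED, cases)); // []
```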
Paste a handful of real lines into the regex tester and confirm each group before automating. Once fields are clean, you can pipe them into structured output — a CSV ↔ JSON converter turns the extracted rows into JSON for a dashboard without a server round-trip.
References
- Named capturing group - JavaScript | MDN. Confirmed (?<name>...) syntax, the groups accessor, and cross-browser support since July 2020.
- RFC 3339, Date and Time on the Internet: Timestamps. Internet profile of ISO 8601 used for the timestamp pattern and the Z / offset rules.
- Apache Module mod_log_config. Source of the combined-log field tokens (%h, %t, %r, %>s, %b).
- Module ngx_http_log_module. nginx combined-format variables used in the field comparison table.