NexusCS

AWK

CLI
AWK is a powerful text processing language for pattern scanning and processing. Named after its creators Aho, Weinberger, and Kernighan.
text-processing
unix
scripting

Getting started

Introduction

AWK processes text files line by line, splitting each line into fields. It's ideal for data extraction, reporting, and transformation tasks.

Basic Syntax

awk 'pattern { action }' file.txt
# Print entire file
awk '{ print }' file.txt

# Print specific fields
awk '{ print $1, $3 }' file.txt

# With pattern matching
awk '/error/ { print $0 }' log.txt

Quick Examples

# Sum first column
awk '{ sum += $1 } END { print sum }' data.txt

# Print line numbers
awk '{ print NR, $0 }' file.txt

# Filter by condition
awk '$3 > 100 { print $1, $3 }' data.txt

# Change field separator
awk -F: '{ print $1, $7 }' /etc/passwd

Command Line Options

Option Description
-F fs Set field separator
-v var=val Set variable
-f file Read program from file
-- Signal end of options
# Custom field separator
awk -F, '{ print $1 }' data.csv

# Set variable
awk -v name="John" '$1 == name' people.txt

# Read from script file
awk -f script.awk data.txt

Built-in Variables

Field Variables

Variable Description
$0 Entire line
$1, $2, ... Field 1, field 2, etc.
$NF Last field
$(NF-1) Second to last field
# Print first and last fields
awk '{ print $1, $NF }' file.txt

# Print entire line
awk '{ print $0 }' file.txt

# Print second to last
awk '{ print $(NF-1) }' file.txt

Record Variables

Variable Description
NR Current record number
NF Number of fields
FNR Record number in current file
FILENAME Current filename
# Print line numbers
awk '{ print NR, $0 }' file.txt

# Print number of fields per line
awk '{ print NF }' file.txt

# Print filename and line
awk '{ print FILENAME, FNR, $0 }' *.txt

Separator Variables

Variable Description
FS Input field separator (default: space)
OFS Output field separator (default: space)
RS Input record separator (default: newline)
ORS Output record separator (default: newline)
# Set field separator to colon
awk 'BEGIN { FS=":" } { print $1 }' /etc/passwd

# Set output separator to comma
awk 'BEGIN { OFS="," } { print $1, $2, $3 }' data.txt

# Multiple character separator
awk 'BEGIN { FS=" : " } { print $1 }' file.txt

Patterns

Pattern Types

# Regular expression
awk '/pattern/ { print }' file.txt

# Comparison
awk '$3 > 100 { print }' data.txt

# Range pattern
awk '/start/,/end/ { print }' file.txt

# Compound pattern
awk '$1 == "error" && $3 > 5 { print }' log.txt

BEGIN and END

# BEGIN: executed before processing
awk 'BEGIN { print "Starting..." }
     { print $0 }
     END { print "Done!" }' file.txt

# Initialize variables
awk 'BEGIN { sum=0 }
     { sum+=$1 }
     END { print "Total:", sum }' data.txt

# Print header
awk 'BEGIN { print "Name\tScore" }
     { print $1, $2 }' scores.txt

Regular Expressions

# Match pattern
awk '/error|warning/ { print }' log.txt

# Case insensitive (GNU awk)
awk 'tolower($0) ~ /error/ { print }' log.txt

# Field regex match
awk '$2 ~ /^[0-9]+$/ { print }' file.txt

# Negation
awk '$1 !~ /test/ { print }' file.txt

Comparison Operators

Operator Description
== Equal
!= Not equal
< Less than
<= Less or equal
> Greater than
>= Greater or equal
~ Matches regex
!~ Does not match regex
# Numeric comparison
awk '$3 >= 100 { print $1, $3 }' data.txt

# String comparison
awk '$2 == "active" { print }' status.txt

# Regex match
awk '$1 ~ /^A/ { print }' names.txt

Actions & Functions

String Functions

Function Description
length(s) String length
substr(s,i,n) Substring from position i, length n
index(s,t) Position of t in s
split(s,a,fs) Split s into array a
tolower(s) Convert to lowercase
toupper(s) Convert to uppercase
gsub(r,s,t) Global substitute
sub(r,s,t) Substitute first match
# String length
awk '{ print length($1) }' file.txt

# Substring
awk '{ print substr($1, 1, 3) }' file.txt

# Find position
awk '{ print index($0, "error") }' log.txt

# Convert case
awk '{ print toupper($1) }' file.txt

# Replace text
awk '{ gsub(/old/, "new"); print }' file.txt

Numeric Functions

Function Description
int(x) Truncate to integer
sqrt(x) Square root
exp(x) Exponential
log(x) Natural logarithm
sin(x) Sine
cos(x) Cosine
atan2(y,x) Arctangent
rand() Random number [0,1)
srand(x) Seed random generator
# Round numbers
awk '{ print int($1 + 0.5) }' data.txt

# Square root
awk '{ print sqrt($1) }' numbers.txt

# Random numbers
awk 'BEGIN { srand(); print rand() }'

I/O Functions

# Print with newline
awk '{ print $1, $2 }' file.txt

# Print without newline
awk '{ printf "%s ", $1 } END { printf "\n" }' file.txt

# Formatted output
awk '{ printf "%-10s %5d\n", $1, $2 }' data.txt

# Print to file
awk '{ print $1 > "output.txt" }' input.txt

# Append to file
awk '{ print $1 >> "output.txt" }' input.txt

Printf Formatting

Format Description
%s String
%d Integer
%f Floating point
%e Scientific notation
%x Hexadecimal
%o Octal
%% Literal %
# Width and precision
awk '{ printf "%10s %5.2f\n", $1, $2 }' data.txt

# Left align with minus
awk '{ printf "%-10s %d\n", $1, $2 }' data.txt

# Zero padding
awk '{ printf "%05d\n", $1 }' numbers.txt

Control Structures

If-Else

# Simple if
awk '{ if ($3 > 100) print $1 }' data.txt

# If-else
awk '{
  if ($3 > 100)
    print $1, "high"
  else
    print $1, "low"
}' data.txt

# If-else-if
awk '{
  if ($3 > 100)
    status="high"
  else if ($3 > 50)
    status="medium"
  else
    status="low"
  print $1, status
}' data.txt

Loops

# For loop over fields
awk '{
  for (i=1; i<=NF; i++)
    print $i
}' file.txt

# While loop
awk '{
  i=1
  while (i<=NF) {
    print $i
    i++
  }
}' file.txt

# Do-while loop
awk '{
  i=1
  do {
    print $i
    i++
  } while (i<=NF)
}' file.txt

Loop Control

# Break
awk '{
  for (i=1; i<=NF; i++) {
    if ($i == "stop") break
    print $i
  }
}' file.txt

# Continue
awk '{
  for (i=1; i<=NF; i++) {
    if ($i == "") continue
    print $i
  }
}' file.txt

# Next (skip to next record)
awk '{
  if ($1 == "skip") next
  print $0
}' file.txt

Ternary Operator

# condition ? true_value : false_value
awk '{ print ($3 > 100 ? "high" : "low") }' data.txt

# Nested ternary
awk '{
  print ($3 > 100 ? "high" : $3 > 50 ? "medium" : "low")
}' data.txt

# With assignment
awk '{
  status = ($3 > 100 ? "PASS" : "FAIL")
  print $1, status
}' scores.txt

Arrays

Associative Arrays

# Create and access
awk '{
  count[$1]++
}
END {
  for (name in count)
    print name, count[name]
}' data.txt

# Multi-dimensional arrays
awk '{
  arr[$1,$2] = $3
}
END {
  for (key in arr)
    print key, arr[key]
}' data.txt

Array Operations

# Check if key exists
awk '{
  if ($1 in seen)
    print "Duplicate:", $1
  seen[$1] = 1
}' file.txt

# Delete element
awk '{
  arr[$1] = $2
  if ($3 == "remove")
    delete arr[$1]
}' data.txt

# Array length (GNU awk)
awk 'END { print length(array) }' file.txt

Array Iteration

# For-in loop
awk '{
  count[$1]++
}
END {
  for (key in count)
    print key, count[key]
}' data.txt

# Sorted iteration (GNU awk)
awk '{
  sum[$1] += $2
}
END {
  PROCINFO["sorted_in"] = "@ind_str_asc"
  for (key in sum)
    print key, sum[key]
}' data.txt

Common Array Patterns

# Count occurrences
awk '{ count[$1]++ }
     END { for (k in count) print k, count[k] }' file.txt

# Sum by key
awk '{ sum[$1] += $2 }
     END { for (k in sum) print k, sum[k] }' data.txt

# Collect unique values
awk '{ unique[$1] = 1 }
     END { for (k in unique) print k }' file.txt

# Find duplicates
awk '{
  if (seen[$1]++)
    print "Duplicate:", $1
}' file.txt

Common One-Liners

Statistics

# Sum column
awk '{ sum += $1 } END { print sum }' data.txt

# Average
awk '{ sum += $1; n++ } END { print sum/n }' data.txt

# Min and max
awk 'NR==1 { min=max=$1 }
     { if ($1<min) min=$1; if ($1>max) max=$1 }
     END { print min, max }' data.txt

# Count lines
awk 'END { print NR }' file.txt

# Count non-empty lines
awk 'NF > 0 { count++ } END { print count }' file.txt

Filtering

# Print lines matching pattern
awk '/error/' log.txt

# Print lines NOT matching pattern
awk '!/debug/' log.txt

# Print lines where field matches
awk '$3 > 100' data.txt

# Print specific columns
awk '{ print $1, $3 }' file.txt

# Print lines between patterns
awk '/START/,/END/' file.txt

Transformation

# Swap columns
awk '{ print $2, $1 }' file.txt

# Add line numbers
awk '{ print NR, $0 }' file.txt

# Remove duplicates (keep first)
awk '!seen[$0]++' file.txt

# Replace field separator
awk 'BEGIN { OFS="\t" } { print $1, $2, $3 }' file.txt

# Convert to uppercase
awk '{ print toupper($0) }' file.txt

CSV Processing

# Parse CSV
awk -F, '{ print $1, $3 }' data.csv

# CSV to TSV
awk -F, 'BEGIN { OFS="\t" } { print $1, $2, $3 }' data.csv

# Remove quotes
awk -F, '{ gsub(/"/, ""); print }' data.csv

# Add quotes
awk -F, '{
  for (i=1; i<=NF; i++)
    printf "\"%s\"%s", $i, (i<NF ? "," : "\n")
}' data.csv

Log Processing

# Count by status code
awk '{ count[$9]++ }
     END { for (c in count) print c, count[c] }' access.log

# Filter by date
awk '$4 ~ /06\/Feb\/2026/' access.log

# Sum response sizes
awk '{ sum += $10 } END { print sum }' access.log

# Top IP addresses
awk '{ ip[$1]++ }
     END { for (i in ip) print ip[i], i }' access.log |
     sort -rn | head -10

Advanced Topics

Multiple Files

# Process multiple files
awk '{ print FILENAME, FNR, $0 }' file1.txt file2.txt

# Different action per file
awk 'FNR==1 { print "File:", FILENAME }
     { print $0 }' *.txt

# Merge files by key
awk 'NR==FNR { a[$1]=$2; next }
     { print $0, a[$1] }' lookup.txt data.txt

User-Defined Functions

# Define function
awk '
function max(a, b) {
  return (a > b) ? a : b
}
{
  print max($1, $2)
}' data.txt

# Function with local variables
awk '
function factorial(n,    i, result) {
  result = 1
  for (i=2; i<=n; i++)
    result *= i
  return result
}
{
  print factorial($1)
}' numbers.txt

External Commands

# Execute system command
awk '{
  system("echo Processing: " $1)
}' file.txt

# Pipe to command
awk '{
  print $0 | "sort -r"
}' file.txt

# Get command output
awk 'BEGIN {
  "date" | getline today
  print "Today is", today
}'

Field Manipulation

# Add fields
awk '{ $(NF+1) = "new" } { print }' file.txt

# Remove fields
awk '{ $3 = ""; print }' file.txt

# Reorder fields
awk '{ temp=$1; $1=$2; $2=temp } { print }' file.txt

# Modify field
awk '{ $2 = toupper($2) } { print }' file.txt

Operators

Arithmetic Operators

Operator Description Example
+ Addition $1 + $2
- Subtraction $1 - $2
* Multiplication $1 * $2
/ Division $1 / $2
% Modulo $1 % $2
^ Exponentiation $1 ^ 2
++ Increment count++
-- Decrement count--
# Calculate
awk '{ print $1 + $2 }' data.txt

# Percentage
awk '{ print ($1/$2)*100 "%" }' data.txt

# Compound assignment
awk '{ sum += $1 } END { print sum }' data.txt

Logical Operators

Operator Description
&& Logical AND
|| Logical OR
! Logical NOT
# AND condition
awk '$1 > 10 && $2 < 100' data.txt

# OR condition
awk '$1 == "error" || $1 == "warning"' log.txt

# NOT condition
awk '!($1 ~ /test/)' file.txt

Assignment Operators

Operator Description
= Assign
+= Add and assign
-= Subtract and assign
*= Multiply and assign
/= Divide and assign
%= Modulo and assign
^= Exponent and assign
# Compound assignment
awk '{ sum += $1; count++ }
     END { print sum/count }' data.txt

Gotchas

Common Mistakes

Uninitialized Variables

# Wrong: prints nothing for first line
awk '{ print sum; sum += $1 }' data.txt

# Correct: initialize in BEGIN
awk 'BEGIN { sum=0 } { sum += $1; print sum }' data.txt

Field Modification Side Effects

# Modifying any field rebuilds $0
awk '{ $2 = toupper($2); print }' file.txt
# This also reformats spacing!

# To preserve formatting
awk '{
  original = $0
  $2 = toupper($2)
  print $0
}' file.txt

Floating Point Comparison

# Wrong: floating point comparison
awk '$3 == 0.1' data.txt

# Correct: use threshold
awk 'function abs(x){return x<0?-x:x}
     abs($3 - 0.1) < 0.0001' data.txt

Performance Tips

# Slow: reading file multiple times
awk '{ getline x < "lookup.txt"; print x, $0 }' data.txt

# Fast: read lookup file once
awk 'NR==FNR { lookup[FNR]=$0; next }
     { print lookup[FNR], $0 }' lookup.txt data.txt

# Avoid unnecessary regex
# Slow
awk '$0 ~ /pattern/' file.txt
# Fast
awk '/pattern/' file.txt

Portability

# GNU awk specific (won't work in POSIX awk)
awk '{ print tolower($0) }' file.txt  # tolower is POSIX
awk 'BEGIN { IGNORECASE=1 }' file.txt  # IGNORECASE is GNU

# POSIX portable
awk '{ print tolower($0) }' file.txt

# Use explicit path for reproducibility
/usr/bin/awk  # POSIX awk
/usr/bin/gawk # GNU awk

AWK vs Others

AWK vs sed

# AWK: field-oriented
awk '{ print $1, $3 }' file.txt

# sed: line-oriented
sed 's/old/new/g' file.txt

# Use AWK for:
# - Column/field manipulation
# - Calculations
# - Complex conditions

# Use sed for:
# - Simple substitutions
# - Line deletions
# - Stream editing

AWK vs grep

# grep: pattern matching only
grep 'error' log.txt

# AWK: pattern matching + processing
awk '/error/ { count++ } END { print count }' log.txt

# Use grep for:
# - Simple pattern matching
# - Binary file search
# - Recursive search

# Use AWK for:
# - Counting, summing
# - Field extraction
# - Formatted output

AWK vs cut/paste

# cut: fixed fields only
cut -d: -f1,3 /etc/passwd

# AWK: flexible field handling
awk -F: '{ print $1, $3 }' /etc/passwd

# AWK advantages:
# - Multiple delimiters
# - Computed fields
# - Conditional output

Also see