Semgrep Notes Intro

These are my Semgrep notes, gathered while working through the tutorial and while running Semgrep in day-to-day operations, kept here for future reference.

Semgrep

Semgrep is a powerful static analysis tool designed to scan code for patterns that may indicate potential vulnerabilities.

  • It’s fairly easy to use and get up to speed with.
  • It’s an excellent tool to integrate into your SAST process for AppSec engagements.

Rule Syntax

Docs: https://semgrep.dev/docs/writing-rules/overview

[!INFO] Semgrep rules are written in YAML.
The pipe (|) after a key such as pattern-inside is YAML syntax for a multi-line string.

Ellipsis operator (...)

- Skip over

The ellipsis lets you skip over anything you don’t care about, similar to .* in RegEx.

func(get_user(...) + "is my userid")
  • Example code match:
    func(get_user(1) + "is my userid")
    

- Constant strings

Inside quotation marks, "..." , Semgrep matches any constant string.

# Rule
func(arg, "...")
  • Example code match:
    # Match
    func(arg, "my string")
    

- Ignore in-between code

It can also be used to skip any code (expressions or statements) between two lines you care about. Example:

function_A(param)
...
function_Z(param)

- Ordered lists

The ellipses can be used to match a pattern in an ordered list such as method/function arguments.

  • It can be used to find the first expression in a list, like: open("r", ...)
  • It can be used to find the last expression in a list, like: open(..., "r")
  • It can be used to find an expression anywhere in a list by using ellipses on both sides, like: open(..., "r", ...)
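
The three positions above can be mimicked in plain Python to make the difference concrete. This is only a rough analogy built on Python's ast module, not how Semgrep works internally; the helper names are hypothetical, and the sketch looks at positional arguments only.

```python
import ast


def _is_const(node, value):
    # True when the argument is a literal equal to `value`.
    return isinstance(node, ast.Constant) and node.value == value


def match_positions(call_source, value):
    """Report where `value` appears among the positional arguments of one call."""
    call = ast.parse(call_source, mode="eval").body
    args = call.args
    return {
        "first": bool(args) and _is_const(args[0], value),       # like open("r", ...)
        "last": bool(args) and _is_const(args[-1], value),       # like open(..., "r")
        "anywhere": any(_is_const(a, value) for a in args),      # like open(..., "r", ...)
    }


for source in ['open("r", path)', 'open(path, "r")', 'open(path, "r", 1)']:
    print(source, match_positions(source, "r"))
```

The source strings are only parsed, never executed, so the undefined name path is harmless here.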

Metavariables

Metavariables let you match something when you don’t know its exact value ahead of time, similar to capture groups in RegEx.

  • Must start with $ dollar sign
  • Can only contain:
    • Uppercase chars
    • Digits
    • Underscores
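
As a quick sanity check on the naming rules above, here is a small Python sketch that validates candidate names. The regular expression is my reading of those bullets; requiring the first character after the dollar sign to be a letter or underscore (not a digit) is an extra assumption, and is_valid_metavariable is a hypothetical helper, not part of Semgrep.

```python
import re

# Dollar sign, then uppercase letters, digits, or underscores.
# Assumption: the first character after "$" is not a digit.
METAVARIABLE_NAME = re.compile(r"\$[A-Z_][A-Z0-9_]*")


def is_valid_metavariable(name: str) -> bool:
    """Return True when `name` follows the naming rules listed above."""
    return METAVARIABLE_NAME.fullmatch(name) is not None


for candidate in ["$X", "$FUNC", "$USERS_2", "$x", "$some_value", "FUNC"]:
    print(candidate, is_valid_metavariable(candidate))
```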

Metavariables Matching a function

Metavariables can be used to match a function:

  • In this example “$FUNC” will match any function name.
    def $FUNC(...):
      ...
    
  • In this example, the pattern matches any function “$FUNC” whose body makes a call to a requests method.
    def $FUNC(...):
        ...
        requests.$METHOD(...)
    

Metavariables matching regular variables

Metavariables can be used to match regular variables as well.

  • Example of catching a bug where a file is opened for reading and then written to:
    $FD = open($FILENAME, 'r', ...)
    ...
    $FD.write(...)
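
To see why this pattern is worth flagging, here is a small self-contained Python sketch of the bug itself. The function name and the temporary file are made up for illustration; the comments map each line back to the pattern.

```python
import io
import os
import tempfile


def write_to_read_only_handle(path: str) -> str:
    """Open `path` read-only, then try to write to it (the bug the pattern targets)."""
    fd = open(path, 'r')        # plays the role of $FD = open($FILENAME, 'r', ...)
    try:
        fd.write("new data")    # plays the role of $FD.write(...)
        return "write succeeded"
    except io.UnsupportedOperation as exc:
        return f"write failed: {exc}"
    finally:
        fd.close()


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w") as handle:
        handle.write("original")
    result = write_to_read_only_handle(demo_path)
    print(result)
```

The write only fails at runtime, which is why catching it statically is useful.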
    

Metavariables vs Ellipses

[!QUOTE] Using a metavariable tells Semgrep, “something is here, but I don’t know what it is.” Using an ellipsis tells Semgrep, “I don’t care what is between here and there.”


🔥 Tip for Starting a Rule

If you are having trouble matching a pattern starting from scratch, simply:

  1. Copy and paste the original code as a pattern.
  2. Generalize the pattern by replacing unimportant lines, and places where other code could appear, with ellipses.
  3. Replace variables with metavariables.

Pattern Composition

[!IMPORTANT] Pattern-Examples

1. Either / Or (Semgrep Tutorial)

If you want to match either pattern1 OR pattern2, use pattern-either.

rules:
  - id: use-string-equals
    message: In Java, do not use == with strings. Use String.equals() instead.
    languages: [java]
    severity: ERROR
    pattern-either:
      - pattern: if ($X == "...") ...
      - pattern: if ("..." == $X) ...

Example Code that matches this rule:

public class Example {
  public int foo(String a, int b) {
      if (a == "hello") return 1;
      // Match here too by adding another pattern clause.
      if ("hello" == a) return 2;
      // Do not match here
      System.out.println("hello");
      return 0;
  }
}

2. Pattern is NOT (Semgrep Tutorial)

You can use pattern-not to filter out patterns you do not want to match.

rules:
  - id: subprocess-call
    patterns:
    - pattern: subprocess.call(...)
    # Exclude calls whose first argument is a constant string
    - pattern-not: subprocess.call("...", ...)

Example Code that matches this rule:

import subprocess

subprocess.call("ls -a .")  # Try not to match here.

dir = "/tmp"
subprocess.call("ls -a " + dir) # or here

subprocess.call(dir, shell=True) # or here!

subprocess.call(nonstring) # MATCH THIS

subprocess.call(nonstring, shell=True) # and this!

3. Pattern is inside - (Semgrep Tutorial)

As the name implies, pattern-inside lets you search for patterns inside the pattern specified by pattern-inside.

rules:
  - id: Go_http-responsewriter-write
    patterns:
    - pattern-inside: |
        func $FUNC(..., $WRITER http.ResponseWriter, ...) {
          ...
        }
    - pattern: $WRITER.Write(...)

Example Go Code that matches this rule:

package main

import (
  "bytes"
  "fmt"
  "log"
  "net/http"
)

func writeMessage(msg string, buf *bytes.Buffer) {
  // Try not to match here by using pattern-inside.
  buf.Write([]byte(msg))
}

func errorPage(hw http.ResponseWriter, r *http.Request) {
  hw.WriteHeader(http.StatusAccepted)
  hw.Write([]byte("Error!"))
}

func indexPage(w http.ResponseWriter, r *http.Request) {
  const template = `
  <html>
  <body>
    <h1>Greetings!</h1>
    <h2>%s</h2>
  </body>
  </html>`

  w.WriteHeader(http.StatusAccepted)
  w.Write([]byte(fmt.Sprintf(template, "Semgrep!")))
}

func main() {
  http.HandleFunc("/", indexPage)
  http.HandleFunc("/error", errorPage)
  http.ListenAndServe(":8080", nil)
}

4. Pattern is NOT inside (Semgrep Tutorial)

pattern-not-inside filters out any matches inside the range defined by the pattern.

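[!NOTE] Both rules below give the same result on the example code. Rule 2 is more general because $COOKIE is bound to whichever variable is passed to addCookie, while Rule 1 only works when that variable is literally named cookie.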

Rule 1

rules:
  - id: secure-flag-not-set
    patterns:
    - pattern: $RESPONSE.addCookie($COOKIE);
    - pattern-not-inside: |
        cookie.setSecure(true);
        ...

Rule 2

rules:
  - id: secure-flag-not-set
    patterns:
    - pattern: $RESPONSE.addCookie($COOKIE);
    - pattern-not-inside: |
        $COOKIE.setSecure(true);
        ...

Example Code that matches this rule:

@Controller
public class CookieController {

    @RequestMapping(value = "/cookie1", method = "GET")
    public void setCookie(@RequestParam String value, HttpServletResponse response) {
        Cookie cookie = new Cookie("cookie", value);
        response.addCookie(cookie);
    }

    @RequestMapping(value = "/cookie2", method = "GET")
    public void setSecureCookie(@RequestParam String value, HttpServletResponse response) {
        Cookie cookie = new Cookie("cookie", value);
        cookie.setSecure(true);
        String mystr = "this line should not interfere";
        response.addCookie(cookie); // Try not to match here.
    }
}

5. Metavariable Regex (Semgrep Tutorial)

One final Semgrep pattern type that is very useful is called metavariable-regex.
It allows you to specify that certain metavariables only match variables whose names fit a specified regular expression.

rules:
  - id: use-decimalfield-for-money
    patterns:
    - pattern-inside: |
        class $M(...):
          ...
    - pattern: $F = django.db.models.FloatField(...)
    - metavariable-regex:
        metavariable: '$F'
        regex: '.*(price|fee|salary).*'
    message: Found a FloatField used for variable $F. Use DecimalField for currency fields to avoid float-rounding errors.
    languages: [python]
    severity: ERROR

Example Code that matches this rule:

from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=64)
    description = models.TextField()
    # this is ok
    price = models.DecimalField(max_digits=6, decimal_places=2)
    # this is also ok
    return_rate = models.FloatField()
    # Semgrep finds this because old_fee ends in the word fee, which is in the regex above
    old_fee = models.FloatField()
    # match this
    price_inc = models.FloatField()
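
The filtering step can be reproduced outside Semgrep to check which field names the regex lets through. This is a minimal sketch that uses Python's re module as a stand-in for Semgrep's regex engine, on the assumption that both agree on an expression this simple.

```python
import re

# Regex copied from the rule above.
FIELD_NAME_REGEX = re.compile(r".*(price|fee|salary).*")

# Names that the FloatField pattern would bind to $F in the example model.
# `price` is not listed because it is a DecimalField and never reaches the regex check.
float_field_names = ["return_rate", "old_fee", "price_inc"]

flagged = [name for name in float_field_names if FIELD_NAME_REGEX.match(name)]
print(flagged)  # ['old_fee', 'price_inc']
```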

Advanced

To catch any case where $VAR is used somewhere inside a larger expression, use the deep expression operator: <... $VAR ...>

Rule:

resp.write(<... $VAR ...>);

This would match all of these:

resp.write(resp);
resp.write('Response</br>' + resp);
resp.write('Response</br>' + resp + 'foo');
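
As a loose analogy only, the idea of "this name appears somewhere inside the argument" can be sketched with Python's ast module. This is not how Semgrep implements the operator. The sample lines above happen to parse as Python once the trailing semicolon is removed, and argument_mentions is a hypothetical helper.

```python
import ast


def argument_mentions(call_source: str, name: str) -> bool:
    """Return True if `name` appears anywhere inside the call's positional arguments."""
    call = ast.parse(call_source.rstrip(";"), mode="eval").body
    if not isinstance(call, ast.Call):
        return False
    # Walk every sub-expression of every argument, however deeply nested.
    return any(
        isinstance(node, ast.Name) and node.id == name
        for arg in call.args
        for node in ast.walk(arg)
    )


samples = [
    "resp.write(resp);",
    "resp.write('Response</br>' + resp);",
    "resp.write('Response</br>' + resp + 'foo');",
    "resp.write('Response</br>');",
]
for line in samples:
    print(line, argument_mentions(line, "resp"))
```

Only the arguments are walked, so the resp in resp.write itself does not count as a use.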

Running Semgrep

semgrep --config [path to directory w/ rules or a rule.yml] [directory with files or a file to scan]
  • No metrics, quiet (-q) output with only results, and output in JSON format
    semgrep --config=[path to directory w/ rules or a rule.yml] --metrics=off -q --json --output=myScanOutput.json [directory with files or a file to scan]
    
  • --config auto uses rules from the Semgrep Registry that are selected for your project
    semgrep --config auto [directory to scan]
    
  • You can pass multiple configs. Here I’m combining the auto config with my own rules in the My-Rules directory
    semgrep --config auto --config ./My-Rules [directory to scan]
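
Once a scan has been written out with --json, the results are easy to post-process. Below is a minimal Python sketch that turns findings into one-line summaries. The embedded sample is hand-written to resemble the shape of Semgrep's JSON output (a top-level results list with check_id, path, start, and extra); the rule id, path, and message in it are made up, and summarize is a hypothetical helper.

```python
import json

# Hand-written sample shaped like Semgrep's --json output.
# The rule id, path, and message are invented for illustration.
SAMPLE_OUTPUT = """
{
  "results": [
    {
      "check_id": "My-Rules.subprocess-call",
      "path": "app/run.py",
      "start": {"line": 12, "col": 1},
      "end": {"line": 12, "col": 28},
      "extra": {
        "message": "Non-constant command passed to subprocess.call",
        "severity": "WARNING"
      }
    }
  ],
  "errors": []
}
"""


def summarize(report: dict) -> list[str]:
    """Build one 'path:line [severity] rule: message' string per finding."""
    lines = []
    for finding in report.get("results", []):
        extra = finding.get("extra", {})
        lines.append(
            f'{finding["path"]}:{finding["start"]["line"]} '
            f'[{extra.get("severity", "?")}] '
            f'{finding["check_id"]}: {extra.get("message", "")}'
        )
    return lines


summary = summarize(json.loads(SAMPLE_OUTPUT))
for line in summary:
    print(line)
```

To use it on a real scan, load the file produced by --output (myScanOutput.json in the command above) with json.load and pass the resulting dictionary to summarize.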
    

Presentations

  • Semgrep: a lightweight static analysis tool for security consultant and hackers by TrailOfBits: https://youtu.be/O5mh8j7-An8?si=V2-Y9EdlkSgMvOAx