uutils AWK progress report #16

@Alonely0

I will be updating this issue as we make progress. Please feel free to take on any of the good first issues.

TODO

  • Lexer: Numeric escapes (\u, \x). Good first issue.
  • Parser: Extend spans during Pratt parsing for better error messages (trivial-ish?).
  • Parser: The preprocessor is TBD (not complicated, but will tangle up pretty printing).
  • Parser: It would be nice to reduce LOC.
  • Parser: Start running gawk parsing tests at some point (especially when we get a basic interpreter and nail down --pretty-print).
  • Interpreter: We plan to build a basic tree-walking interpreter first, both to get integration testing going and to serve as a baseline for future iterations. Ideally, those later iterations would be a bytecode machine or a JIT. The design sketch is a cooperative I/O machine, probably built with smol; if we want to better support AWK's long-forgotten number-crunching intent, we could easily extend this to parallel computations.
  • Lexer & parser: We must add built-in functions as standalone tokens, as well as any missing variables. Use the POSIX utility functions where needed. Good first issue.
  • Lexer: Add more tests. Good first issue.
  • Parser: Add smoke tests. Good first issue.
  • Parser: Fuzzing.
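
For the span-extension item above, here is a minimal sketch of the general idea, not the parser in this repository. It assumes a hypothetical `Span` type with half-open byte offsets and a toy grammar (unsigned integers and the four arithmetic operators); all names (`Span`, `Expr`, `Tok`, `parse`) are made up for illustration. Each time the Pratt loop builds a binary node, the node's span is the merge of its operands' spans, so an error reported on the node can underline the whole sub-expression instead of a single token.

```rust
/// Half-open byte range into the source text (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

impl Span {
    /// Smallest span covering both `self` and `other`.
    pub fn merge(self, other: Span) -> Span {
        Span { start: self.start.min(other.start), end: self.end.max(other.end) }
    }
}

#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(f64, Span),
    Binary(Box<Expr>, char, Box<Expr>, Span),
}

impl Expr {
    pub fn span(&self) -> Span {
        match self {
            Expr::Num(_, s) | Expr::Binary(_, _, _, s) => *s,
        }
    }
}

#[derive(Debug, Clone, Copy)]
enum Tok {
    Num(f64),
    Op(char),
}

/// Toy lexer: digit runs and single-character operators; other bytes are skipped.
fn lex(src: &str) -> Vec<(Tok, Span)> {
    let bytes = src.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        match bytes[i] {
            b'0'..=b'9' => {
                while i < bytes.len() && bytes[i].is_ascii_digit() {
                    i += 1;
                }
                let n = src[start..i].parse().unwrap();
                out.push((Tok::Num(n), Span { start, end: i }));
            }
            b'+' | b'-' | b'*' | b'/' => {
                i += 1;
                out.push((Tok::Op(bytes[start] as char), Span { start, end: i }));
            }
            _ => i += 1,
        }
    }
    out
}

/// (left, right) binding powers; left-associative operators.
fn binding_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),
        '*' | '/' => Some((3, 4)),
        _ => None,
    }
}

fn parse_expr(toks: &[(Tok, Span)], pos: &mut usize, min_bp: u8) -> Option<Expr> {
    let mut lhs = match toks.get(*pos)? {
        (Tok::Num(n), span) => Expr::Num(*n, *span),
        _ => return None,
    };
    *pos += 1;
    while let Some(&(Tok::Op(op), _)) = toks.get(*pos) {
        let (l_bp, r_bp) = binding_power(op)?;
        if l_bp < min_bp {
            break;
        }
        *pos += 1;
        let rhs = parse_expr(toks, pos, r_bp)?;
        // Extend the span: the new node covers everything from lhs to rhs.
        let span = lhs.span().merge(rhs.span());
        lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs), span);
    }
    Some(lhs)
}

/// Parses a whole expression; returns `None` on any leftover or missing token.
pub fn parse(src: &str) -> Option<Expr> {
    let toks = lex(src);
    let mut pos = 0;
    let expr = parse_expr(&toks, &mut pos, 0)?;
    if pos == toks.len() { Some(expr) } else { None }
}
```

With this sketch, `parse("1 + 2 * 3")` yields a root node spanning bytes 0..9 whose right child (`2 * 3`) spans 4..9. Prefix operators, parentheses and postfix forms would need the same treatment, merging the operator or delimiter span with the operand span.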

Known issues

Low priority

  • Lexer: (arguably) mishandles concatenation with a regex literal, as in print a /x/. Note that gawk is probably the only awk implementation that parses this as a concatenation; mawk and bwk (the one true awk) bail on it, afaik. Adding another state to the lexer context might be enough, or at least go a long way. I can sleep at night with this being a wontfix, though.
  • Parser: It does some expression lowering that trips up --pretty-print, such as identifier qualification and indexing desugaring. The identifier qualification is easy to undo with some pointer comparisons, but we need to record interactions with @namespace.
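
To illustrate why the first low-priority item is awkward, here is a rough sketch of the common "previous token" heuristic for deciding whether `/` is a division operator or the start of a regex. It is not necessarily how the lexer here tracks context; the `Prev` and `Slash` types, the keyword list and `classify_slashes` are all hypothetical and only cover a tiny token set. Under this heuristic, a `/` that follows an operand is always division, which is why `print a /x/` is not read as a concatenation.

```rust
/// What the lexer saw last (hypothetical context state).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Prev {
    Start,      // beginning of input
    Operand,    // identifier, number, or a finished regex literal
    CloseParen, // `)` or `]`
    Operator,   // any other punctuation, including a division `/`
    Keyword,    // e.g. `print`, `return`
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Slash {
    Divide,
    RegexStart,
}

/// A `/` opens a regex unless the previous token could end an operand.
pub fn slash_starts_regex(prev: Prev) -> bool {
    !matches!(prev, Prev::Operand | Prev::CloseParen)
}

/// Returns the byte offset and classification of every significant `/`.
pub fn classify_slashes(src: &str) -> Vec<(usize, Slash)> {
    let bytes = src.as_bytes();
    let mut prev = Prev::Start;
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let c = bytes[i];
        if c == b' ' || c == b'\t' {
            i += 1; // blanks never change the context
        } else if c.is_ascii_alphanumeric() || c == b'_' {
            let start = i;
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            // Illustrative keyword list only; a real lexer knows many more.
            prev = match &src[start..i] {
                "print" | "printf" | "return" => Prev::Keyword,
                _ => Prev::Operand,
            };
        } else if c == b'/' {
            if slash_starts_regex(prev) {
                out.push((i, Slash::RegexStart));
                i += 1;
                // Skip the regex body up to the unescaped closing slash.
                while i < bytes.len() && bytes[i] != b'/' {
                    if bytes[i] == b'\\' {
                        i += 1; // skip the escaped byte as well
                    }
                    i += 1;
                }
                i += 1; // closing slash
                prev = Prev::Operand;
            } else {
                out.push((i, Slash::Divide));
                i += 1;
                prev = Prev::Operator;
            }
        } else {
            prev = if c == b')' || c == b']' { Prev::CloseParen } else { Prev::Operator };
            i += 1;
        }
    }
    out
}
```

For `print /x/` the sketch reports a regex start at offset 6, while for `print a /x/` it reports two divisions (offsets 8 and 10), which a parser would then reject. Accepting the gawk reading would need extra state beyond the previous token, for example whether the lexer is inside a print statement's expression list.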
