Add code format command #622

Zabuzard · 2022-10-09T07:35:55Z

Overview

Implements and closes #621.

This PR does two things:

it adds code commands, the first one being format
it reworks the formatter project

Code commands

UX

Once the bot detects code in a message, it sends a message with buttons:

If a button is clicked, it attaches the result as an embed to the bot message and disables the button:

Additionally, it auto-updates itself on every update of the original message:

It also deletes itself if the original message is deleted.

Extension

The code was also written in a generic way, such that adding new code-actions is very simple. Here is an example with some mock commands:

All that has to be done is implementing a simple interface:

and adding it to a list in CodeMessageHandler.
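As a rough illustration of this extension point, a code action could look like the sketch below. Only the names CodeAction, CodeMessageHandler and labelToCodeAction appear in this PR; the method names getLabel and apply, the LineCountAction mock and the register helper are assumptions made up for this example, and the real interface may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the actual CodeAction interface of the PR may look different.
final class CodeActionSketch {
    /** An action that can be applied to code found in a message. */
    interface CodeAction {
        /** Label shown on the button, also used to look the action up (assumed). */
        String getLabel();

        /** Applies the action to the given code and returns the text to display (assumed). */
        String apply(String code);
    }

    /** Mock action for demonstration only: counts the lines of the code. */
    static final class LineCountAction implements CodeAction {
        @Override
        public String getLabel() {
            return "Line count";
        }

        @Override
        public String apply(String code) {
            return "Lines: " + code.lines().count();
        }
    }

    /** Registers the actions by label, preserving their order. */
    static Map<String, CodeAction> register(List<CodeAction> actions) {
        Map<String, CodeAction> labelToCodeAction = new LinkedHashMap<>();
        for (CodeAction action : actions) {
            labelToCodeAction.put(action.getLabel(), action);
        }
        return labelToCodeAction;
    }
}
```

With such a shape, a new action would only need one class and one extra entry in the list passed to register.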

Code detection

The feature activates on messages that contain code. Code is detected by the presence of code-blocks/fences.

This matching is done without regex for performance reasons, since the check is executed on every single posted message.
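A minimal sketch of regex-free fence detection, assuming that "code" means a pair of triple-backtick fences. The class and method names are hypothetical, and the actual detection in the PR may handle more cases (for example language hints or multiple blocks).

```java
// Hypothetical sketch of regex-free code-fence detection.
final class CodeFenceDetector {
    // Three backticks, built via repeat to keep this example easy to embed
    static final String FENCE = "`".repeat(3);

    /** Returns whether the message contains an opening fence followed by a closing fence. */
    static boolean containsCodeBlock(String message) {
        int open = message.indexOf(FENCE);
        if (open < 0) {
            return false;
        }
        // Search for the closing fence only after the opening one
        int close = message.indexOf(FENCE, open + FENCE.length());
        return close >= 0;
    }
}
```

Two indexOf calls scan the message at most once in total, which keeps the cost low for the common case of messages without any code.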

Formatter rework

No file was left untouched. The core logic and approach are still the same, though. This is a major refactoring that adds documentation, unit tests, extra features and bug fixes.

The basic approach is:

tokenize the code into its tokens (Lexer)
put them back together, nicely formatted (CodeSectionFormatter)

The flow is available to the user via the class Formatter.

Lexer

The core of the lexer is the enum TokenType, which lists all recognized tokens.

Tokens are found by repeatedly matching the next token from the remaining code. For example, int x = 5; results in the list

INT
WHITESPACE
IDENTIFIER
WHITESPACE
ASSIGN
WHITESPACE
NUMBER
SEMICOLON
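The matching loop can be sketched as follows. This is a heavily reduced, hypothetical lexer with only a handful of token types; the real TokenType enum of the PR lists many more, and its matching logic is not shown in this description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced lexer; not the implementation from the PR.
final class MiniLexer {
    enum TokenType { INT, IDENTIFIER, NUMBER, ASSIGN, SEMICOLON, WHITESPACE, UNKNOWN }

    /** Repeatedly matches the next token until the code is fully consumed. */
    static List<TokenType> tokenize(String code) {
        List<TokenType> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < code.length()) {
            int start = pos;
            char c = code.charAt(pos);
            TokenType type;
            if (Character.isWhitespace(c)) {
                while (pos < code.length() && Character.isWhitespace(code.charAt(pos))) pos++;
                type = TokenType.WHITESPACE;
            } else if (Character.isDigit(c)) {
                while (pos < code.length() && Character.isDigit(code.charAt(pos))) pos++;
                type = TokenType.NUMBER;
            } else if (Character.isJavaIdentifierStart(c)) {
                while (pos < code.length() && Character.isJavaIdentifierPart(code.charAt(pos))) pos++;
                // Read the whole word first, so "interval" is not mistaken for the keyword "int"
                boolean isInt = code.substring(start, pos).equals("int");
                type = isInt ? TokenType.INT : TokenType.IDENTIFIER;
            } else {
                pos++;
                if (c == '=') type = TokenType.ASSIGN;
                else if (c == ';') type = TokenType.SEMICOLON;
                else type = TokenType.UNKNOWN; // keep going on unexpected input
            }
            tokens.add(type);
        }
        return tokens;
    }
}
```

Calling MiniLexer.tokenize("int x = 5;") yields the eight token types listed above.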

The previous proof-of-concept version of the lexer had significant performance problems, which is why matching has been made much faster by:

using a rolling-window/string-view instead of doing real substrings (CharBuffer)
avoiding regex for most of the token types
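The rolling-window idea can be sketched as below, assuming that CharBuffer refers to java.nio.CharBuffer. The WindowMatcher class and its methods are hypothetical helpers for illustration, not code from the PR.

```java
import java.nio.CharBuffer;

// Hypothetical sketch of matching on a moving view instead of creating substrings.
final class WindowMatcher {
    /** Returns whether the remaining window starts with the symbol, without copying any text. */
    static boolean startsWith(CharBuffer window, String symbol) {
        if (window.remaining() < symbol.length()) {
            return false;
        }
        for (int i = 0; i < symbol.length(); i++) {
            // CharBuffer#charAt is relative to the current position of the buffer
            if (window.charAt(i) != symbol.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    /** Moves the window forward past the consumed characters. */
    static void consume(CharBuffer window, int length) {
        window.position(window.position() + length);
    }
}
```

A lexer built this way wraps the code once with CharBuffer.wrap(code) and then only moves the position, so consuming a token allocates nothing.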

Unit tests now ensure, in a very elegant way, that no type hides another as a prefix (for example, : is a prefix of :: and hence the latter has to be matched first).
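The property being tested can be expressed roughly like this. The actual unit tests are not part of this description, so the PrefixOrderCheck helper below is only an assumed way to check it: given the symbols in matching order, no earlier symbol may be a prefix of a later one.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical check; the unit tests in the PR may be structured differently.
final class PrefixOrderCheck {
    /**
     * Finds the first pair in which an earlier symbol is a prefix of a later one
     * and would therefore match first. Returns it as "earlier hides later".
     */
    static Optional<String> findHiddenSymbol(List<String> symbolsInMatchOrder) {
        for (int i = 0; i < symbolsInMatchOrder.size(); i++) {
            String earlier = symbolsInMatchOrder.get(i);
            for (int j = i + 1; j < symbolsInMatchOrder.size(); j++) {
                String later = symbolsInMatchOrder.get(j);
                if (later.startsWith(earlier)) {
                    return Optional.of(earlier + " hides " + later);
                }
            }
        }
        return Optional.empty();
    }
}
```

For the order "::", ":" the check finds nothing, while for ":", "::" it reports that : hides ::.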

Formatting

The actual formatting happens in CodeSectionFormatter. For most of the actual logic, it refers to FormatterRules.

Essentially, it iterates through all tokens and builds a string back up. For each token, it has to decide things such as:

put a space before it?
put a space after it?
put a newline after it?
put indent before it?

It does so by maintaining some states, such as:

currentIndentLevel
currentGenericLevel
isStartOfLine
expectedSemicolonsInLine
isInPackageDeclaration
isInImportDeclaration

It is important that this state is kept rather slim; otherwise the formatter would be too fragile for non-compiling or incorrect code.
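A heavily reduced sketch of such a state-driven loop is shown below. It only tracks currentIndentLevel and isStartOfLine and knows three rules; the MiniFormatter class is hypothetical, and the real CodeSectionFormatter and FormatterRules cover far more cases.

```java
import java.util.List;

// Hypothetical, reduced formatter loop; not the implementation from the PR.
final class MiniFormatter {
    private static final String INDENT = "    ";

    /** Puts the tokens back together, deciding on spaces, newlines and indent per token. */
    static String format(List<String> tokens) {
        StringBuilder result = new StringBuilder();
        int currentIndentLevel = 0;
        boolean isStartOfLine = true;

        for (String token : tokens) {
            if (token.equals("}")) {
                // Clamp at zero so that unbalanced braces in broken code cannot corrupt the state
                currentIndentLevel = Math.max(0, currentIndentLevel - 1);
            }

            if (isStartOfLine) {
                result.append(INDENT.repeat(currentIndentLevel)); // indent before it
            } else if (!token.equals(";")) {
                result.append(' '); // space before it, except for semicolons
            }
            result.append(token);

            boolean newlineAfter = token.equals("{") || token.equals("}") || token.equals(";");
            if (newlineAfter) {
                result.append('\n');
            }
            isStartOfLine = newlineAfter;

            if (token.equals("{")) {
                currentIndentLevel++;
            }
        }
        return result.toString();
    }
}
```

Because the only numeric state is a clamped counter, input such as two stray closing braces still produces output instead of failing.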

The actual formatter rules can be a bit nasty, since there are so many edge cases. Keep in mind, though, that the goal is not to create a 100% correct formatter. The formatter has to support incorrect code and must always yield at least okay-ish looking results.

There are lots of unit tests that cover a lot of cases and real code examples.

Checklist

Zabuzard · 2022-10-21T14:42:05Z

Sonar detected a code duplication with ScamBlocker in how UserInteractor is implemented without an adapter... nothing we could really do about that, it's intentional.

Zabuzard · 2022-10-28T12:19:22Z

SonarCloud Quality Gate failed.

0 Bugs 0 Vulnerabilities 0 Security Hotspots 1 Code Smell

0.0% Coverage 1.3% Duplication


Zabuzard · 2022-11-02T08:39:19Z

@Tais993 reminder :)

Tais993 · 2022-11-03T09:48:46Z

application/src/main/java/org/togetherjava/tjbot/commands/code/CodeMessageHandler.java

+     * The feature is secondary though, which is why its kept in RAM and not in the DB.
+     */
+    private final Cache<Long, Long> originalMessageToCodeReply =
+            Caffeine.newBuilder().maximumSize(10_000).build();


I notice a general trend of massive caches, but is this needed? I feel like this is an easy way for trolls to "crash" our bot, at least I'd assume 10k entries take up a lot of RAM.

@Tais993 10k should take almost no RAM at all, id estimate around 100 KB for a full cache. u need 10,000 such caches to reach 1 GB RAM.

the point of the cache is exactly that, to prevent against RAM blowups. with a traditional Map, u have no proper limit and users are able to blow it up.

if u feel better, we can reduce it to maybe 2k? its not like it really matters for this feature anyways

done ✔️

reduced to 2k
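To illustrate the point about size limits, here is a minimal sketch of a size-bounded map using only the standard library. It is a stand-in for the idea, not the Caffeine cache used in the PR, whose eviction policy differs; the BoundedMap class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stdlib stand-in for a size-bounded cache; the PR itself uses Caffeine.
final class BoundedMap<K, V> extends LinkedHashMap<K, V> {
    private final int maximumSize;

    BoundedMap(int maximumSize) {
        super(16, 0.75f, true); // access order, so the least recently used entry is the eldest
        this.maximumSize = maximumSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the eldest entry
        return size() > maximumSize;
    }
}
```

No matter how many messages users post, such a map never holds more than maximumSize entries, whereas a plain Map would grow without limit.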

Tais993 · 2022-11-03T09:51:20Z

application/src/main/java/org/togetherjava/tjbot/commands/code/CodeMessageHandler.java

+            @Nullable CodeAction disabledAction) {
+        return labelToCodeAction.values().stream().map(action -> {
+            Button button = createButtonForAction(action, originalMessageId);
+            return action == disabledAction ? button.asDisabled() : button;


So you can only disable 1 action?

the disabled action is the action that is currently active, yes.

like, if u have 3 buttons (format, run, bytecode) and u click on run, then u want run to be deactivated, cause it is currently active.

im going to rename the param to currentlyActiveAction

done ✔️

Tais993 · 2022-11-03T10:09:01Z

application/src/test/java/org/togetherjava/tjbot/commands/utils/MessageUtilsTest.java

    }
+
+    private static Stream<Arguments> provideExtractCodeTests() {
+        return Stream.of(createExtractCodeArgumentsFor("basic", """


In all honesty, this is unreadable. Using a list and separate List#add calls would be a lot more readable.

i doubt it would be more readable after spotless. but sure, can give it a try

done ✔️

i think its actually more readable now, thanks

Tais993 · 2022-11-03T10:09:52Z

formatter/src/main/java/org/togetherjava/tjbot/formatter/Formatter.java

-     * Indexes tokens to contain information about whether they are code tokens or not
+     * Formats the given string.
+     * <p>
+     * Best results are achieved for Java code.


"best results", that sounds like this works for JS, but not as good.

Instead just say "Only works with Java"

"best results", that sounds like this works for JS, but not as good.

but it does. and its expected to be used for that as well.

right now, the format action is available for all languages and yields okayish results for everything i tested (except languages without semicolons)

Tais993 · 2022-11-03T10:14:02Z

formatter/src/main/java/org/togetherjava/tjbot/formatter/formatting/TokenQueue.java

+import java.util.stream.Stream;
+
+/**
+ * Queue that holds tokens to be consumed.


The term "token" is very often used, but I haven't noticed much of a description.

I'd heavily appreciate it if you link to the Token class

thats fair. i guess it was clear due to the context of lexing (= tokenization).

done ✔️

added some extra paragraph and links

Tais993 · 2022-11-03T10:46:48Z

formatter/src/main/java/org/togetherjava/tjbot/formatter/tokenizer/TokenType.java

+    DOT("."),
    SEMICOLON(";"),
    METHOD_REFERENCE("::"),
-    COLON(":", false, true), // technically not a "real" operator but used in an enhanced for loop


Not needed anymore?

it is needed, so that the lexer recognizes it as an individual token.

its still part of the list somewhere, i reordered a lot of stuff

found it :)

sonarqubecloud · 2022-11-05T08:53:19Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

0.0% Coverage
1.2% Duplication

Zabuzard · 2022-11-08T07:47:51Z

Going to merge this now. It's quite old already and I do not want to hold back the feature much longer just because people don't have time to review. There don't seem to be any red flags, so you can continue with the CR and I'll do the proposed changes in a follow-up PR instead.

Zabuzard added new command Add a new command or group of commands to the bot priority: major labels Oct 9, 2022

Zabuzard self-assigned this Oct 9, 2022

Zabuzard force-pushed the feature/add_format_command branch 3 times, most recently from 6f2019a to 1065d15 Compare October 20, 2022 10:05

Zabuzard force-pushed the feature/add_format_command branch 2 times, most recently from 8052a5e to 968bcc1 Compare October 27, 2022 07:36

Zabuzard marked this pull request as ready for review October 28, 2022 12:08

Zabuzard requested review from a team as code owners October 28, 2022 12:08

Tais993 self-requested a review October 28, 2022 12:42

Zabuzard force-pushed the feature/add_format_command branch from b491bb9 to 7889183 Compare November 3, 2022 07:14

Tais993 requested changes Nov 3, 2022


Zabuzard requested a review from Tais993 November 5, 2022 08:45

Zabuzard added 12 commits November 5, 2022 09:49

First draft of format command

4db4d2d

adding onMessageDeleted to message receiver

6fa8d5a

some stuff

09eddf5

moved code formatting responsibility over to the message handler

6834a02

Draft of final UX

d2f0e3a

Generify, support multiple commands

fbe65fc

signature after rebase, ignore bots

2cb3014

some logging

aaaf3f4

Caffeine cache

84a4dbd

cant do much about that duplication

afd176c

Improved code extraction, polished design, javadoc

ba3b420

fixed bug where newlines are not matched

3a8dc08

Zabuzard added 19 commits November 5, 2022 09:50

Removed code-section feature (not needed)

1297aa4

Improved tokens

2bfbd41

Got rid of line-wise lexing, improved patterns

d669ea4

Improved matching to not solely rely on regex

d0e2d51

Improved lexer to use rolling window on string view

8b1ca42

Improved formatter interface

9e0e5ee

Improved actual formatter engine, rules and queue

a524da6

patch multi line comments

bef9dbb

Adjusted tokenqueue to actual needs

bd60354

javadoc

1b57ea8

extract code tests, got rid of regex

3a9ca14

Expanded list of tokens

72545b9

unit tests for matching

b29b44a

Lexer tests

b342edd

unit tests for tokenqueue

472ee55

formatting tests and bugfixes

a4756fe

got rid of example duplication

e1eb5b4

Added java as default language to always have syntax highlighting

4387654

CR Tais

9fd8e1a

Zabuzard force-pushed the feature/add_format_command branch from 888e183 to 9fd8e1a Compare November 5, 2022 08:50

Zabuzard merged commit cefe923 into develop Nov 8, 2022

Zabuzard deleted the feature/add_format_command branch November 8, 2022 07:50

Zabuzard mentioned this pull request Nov 8, 2022

Release v3.9 #680

Merged
