
Conversation

@ankitsmt211 (Member) commented Dec 19, 2023

resolves #920

  • sets context for responses
  • attempts to reduce the character limit, but it's hard to be consistent with or verify this
  • removes the tag builder that was added to the question builder

Note: There's no reliable way to generate shorter responses. I could go really low by using BRIEF as a keyword, but that gets very, very short. IMO the char limit shouldn't be the priority; we can always paginate the response in embeds.

If we cross the 2k char limit at the moment, the AIResponseParser class will automatically split the response into multiple shorter messages, as mentioned in #928.

Reducing MAX_TOKENS would just lead to lost responses at times.

Bottom line: when implementing "embeds" for the rare responses that go over the 4k limit, we can either paginate or run GPT again on the generated response to drop more filler; otherwise, most responses should fall well under the 4k limit.
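For reference, a minimal sketch of that kind of length-based splitting (illustration only, not the actual AIResponseParser implementation):

import java.util.ArrayList;
import java.util.List;

// Illustration only: splits a long answer into chunks that fit under Discord's 2k message limit,
// preferring to break on blank lines so paragraphs stay together.
final class ResponseSplitter {
    private static final int MESSAGE_LIMIT = 2_000;

    static List<String> split(String response) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : response.split("\n\n")) {
            if (current.length() > 0 && current.length() + paragraph.length() + 2 > MESSAGE_LIMIT) {
                chunks.add(current.toString().strip());
                current.setLength(0);
            }
            String rest = paragraph;
            // a single paragraph longer than the limit still has to be hard-cut
            while (rest.length() > MESSAGE_LIMIT) {
                chunks.add(rest.substring(0, MESSAGE_LIMIT));
                rest = rest.substring(MESSAGE_LIMIT);
            }
            current.append(rest).append("\n\n");
        }
        if (current.length() > 0) {
            chunks.add(current.toString().strip());
        }
        return chunks;
    }
}

Pagination in embeds could then simply page over these chunks.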

* removing logic that prepends all applied tags to question builder
* passing first tag as context to gptservice
* setting context before sending the question (rough sketch of the shape below)
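For illustration, roughly the shape of the context change. The class, method, and parameter names here are hypothetical, not the actual code; it assumes JDA's ThreadChannel#getAppliedTags and ForumTag#getName:

import java.util.Optional;

import net.dv8tion.jda.api.entities.channel.concrete.ThreadChannel;
import net.dv8tion.jda.api.entities.channel.forums.ForumTag;

// Hypothetical sketch: use the first applied forum tag (e.g. "spring", "swing") as context,
// falling back to plain "java" when no tag is applied.
class GptContextSketch {
    Optional<String> askWithTagContext(ThreadChannel helpThread, String question,
            ChatGptService chatGptService) {
        String context = helpThread.getAppliedTags()
            .stream()
            .findFirst()
            .map(ForumTag::getName)
            .orElse("java");
        // the service is expected to turn this into an instruction such as
        // "You are answering a question tagged 'spring' on a Java Q&A Discord server."
        return chatGptService.ask(question, context);
    }
}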
@ankitsmt211 added the labels "enhancement (New feature or request)" and "priority: major" on Dec 19, 2023
@ankitsmt211 self-assigned this Dec 19, 2023
@ankitsmt211 requested review from a team as code owners December 19, 2023 11:24
@marko-radosavljevic (Contributor)

Yeah, we don't want to restrict and limit the model from every side, rendering it useless, starving GPT of oxygen until it coughs up a few sentences for us and dies. We want to just gently steer it at its full power.

If it gives a perfect long guide that explains every step, with code examples... that's awesome!

We should obviously optimize it; the simplest questions don't need those bloated responses. But quality should be our priority first, and optimizing for UI/UX second.

These were the tests I used to benchmark and optimize responses.

import java.util.Optional;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

// Manual benchmark tests: responses are logged at WARN level so they show up
// in the test output for inspection of length and quality.
class ChatGptServiceTest {
    private static final Logger logger = LoggerFactory.getLogger(ChatGptServiceTest.class);
    private Config config;
    private ChatGptService chatGptService;

    @BeforeEach
    void setUp() {
        config = mock();
        when(config.getOpenaiApiKey()).thenReturn("your-api-key");
        chatGptService = new ChatGptService(config);
    }

    @Test
    void askToGenerateLongPoem() {
        Optional<String> response = chatGptService.ask("generate a very long poem");
        response.ifPresent(logger::warn);
    }

    @Test
    void askHowToSetupJacksonLibraryWithExamples() {
        Optional<String> response = chatGptService.ask("How to setup Jackson library with examples");
        response.ifPresent(logger::warn);
    }

    @Test
    void askDockerReverseProxyWithNginxGuide() {
        Optional<String> response = chatGptService.ask("Docker reverse proxy with nginx guide");
        response.ifPresent(logger::warn);
    }

    @Test
    void askWhyDoesItTakeYouMoreThan10SecondsToAnswer() {
        Optional<String> response = chatGptService.ask(
                "Working example of Command pattern in java, with all the classes required, explained in detail. Bonus points for UML diagrams.");
        response.ifPresent(logger::warn);
    }
}

Can you run these and post how long they took, along with the results? Just curious how it would all look with the current UI/UX. (Since this is testing the service directly, it's best to just ask the bot these questions.)
Also curious whether the user would have to wait 2 minutes for an answer, and whether that would feel unintuitive/unfriendly for the user, because it's not properly communicated what is happening.
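If it helps, here is a timing variant you could drop into the test class above (illustrative only, just wrapping the existing call):

@Test
void askDockerReverseProxyWithNginxGuideTimed() {
    long start = System.nanoTime();
    Optional<String> response = chatGptService.ask("Docker reverse proxy with nginx guide");
    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    // logs duration and response length for a rough latency/size picture
    logger.warn("took {} ms, {} chars", elapsedMillis,
            response.map(String::length).orElse(0));
}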

@marko-radosavljevic (Contributor) commented Dec 21, 2023

Regarding the added context based on the #questions channel, so GPT knows it's Java: I'm curious whether it would backfire in other categories (for whatever reason), especially in the "other" category.

Because of that 'on a Java Q&A discord server', what happens if someone asks a question and writes 'answer in python'? Or what if the question is obviously Python, because there is Python code attached, and GPT tries to rewrite it as Java or bastardizes it? What if the mentioned libraries and frameworks are clearly from the Python ecosystem: would it answer within that context, or would it try to Javthon it?

Make sure to test some edge cases in different categories, and use some previous real-world failures from #questions in your test suite. Also include some questions GPT answered successfully, to check whether you notice any regressions. Just to be sure that this new prompt is objectively better, and that it won't make some other aspect worse by accident. ☺️
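For example, something along these lines. The questions here are made up, real ones should come from past #questions threads, and it assumes the same setup as the ChatGptServiceTest above:

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

// Sketch of an edge-case suite; verification is manual inspection of the logged answers for now.
class ChatGptContextEdgeCaseTest {
    private ChatGptService chatGptService;

    @BeforeEach
    void setUp() {
        Config config = mock(Config.class);
        when(config.getOpenaiApiKey()).thenReturn("your-api-key");
        chatGptService = new ChatGptService(config);
    }

    @ParameterizedTest
    @ValueSource(strings = {
        "how do I reverse a list? answer in python",
        "why does this pandas read_csv call ignore my custom delimiter?",
        "my flask route returns 404, here is the python code"
    })
    void doesNotForceJavaOntoNonJavaQuestions(String question) {
        // check the answer stays within the user's context instead of being "Javthon"-ed
        chatGptService.ask(question).ifPresent(System.out::println);
    }
}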

@ankitsmt211 (Member, Author)

Length can be improved with a good prompt, but it's not super consistent; will get back to this.

Zabuzard previously approved these changes Jan 3, 2024
@ankitsmt211 (Member, Author)

The responses are not really compact; it needs a bit of playing around with prompts of different lengths, and I don't really feel like doing that atm. I'm going to undo the length-related changes and only keep the context-related changes, because the earlier length handling seems to be better than what I did here.

@ankitsmt211 (Member, Author) commented Jan 9, 2024

I've only run these tests a couple of times each, but the results seem noticeably better than the original prompt.

Character counts, based on the tests given by marko:

With the new prompt (3k token limit):

  1. 1375 chars (poem about Java)
  2. 1782 chars
  3. 1595 chars
  4. 2032 chars

With the new prompt plus the temperature changes from @surajkumar (2k token limit):

  1. 1425 chars
  2. 1891 chars
  3. 1658 chars
  4. 1924 chars

With the new prompt plus the temperature changes from @surajkumar (3k token limit):

  1. 1323 chars
  2. 1622 chars
  3. 1841 chars
  4. 2134 chars

Shorter responses, and the context is pretty solid.

With the earlier prompt (3k token limit):

  1. 1687 chars (random poem)
  2. kept throwing an error (response greater than 2k chars, which it will then try to split)
  3. 1772 chars
  4. kept throwing an error (response greater than 2k chars, which it will then try to split)

Relatively longer responses, and the context depends entirely on the user's question.

@ankitsmt211 requested a review from Zabuzard on January 9, 2024 05:07
@surajkumar (Contributor) commented Jan 9, 2024

Can you add this to your PR please:

    /** The maximum number of tokens allowed for the generated answer */
    private static final int MAX_TOKENS = 2_000;

    /**
     * This parameter reduces the likelihood of the AI repeating itself. A higher frequency penalty
     * makes the model less likely to repeat the same lines verbatim. It helps in generating more
     * diverse and varied responses.
     */
    private static final double FREQUENCY_PENALTY = 0.5;

    /**
     * This parameter controls the randomness of the AI's responses. A higher temperature results in
     * more varied, unpredictable, and creative responses. Conversely, a lower temperature makes the
     * model's responses more deterministic and conservative.
     */
    private static final double TEMPERATURE = 0.8;

    /**
     * n: This parameter specifies the number of responses to generate for each prompt. If n is more
     * than 1, the AI will generate multiple different responses to the same prompt, each one being
     * a separate iteration based on the input.
     */
    private static final int MAX_NUMBER_OF_RESPONSES = 1;

The keen-eyed will notice some changes to the values.
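For context, this is roughly how they'd be wired into the chat completion request. Sketch only: it assumes the builder-style openai client (TheoKanning's openai-java), so treat the exact builder method names and the model string as assumptions rather than the bot's actual code:

import java.util.List;

import com.theokanning.openai.completion.chat.ChatCompletionRequest;
import com.theokanning.openai.completion.chat.ChatMessage;

// Sketch only: shows the constants above feeding the request builder.
class ChatRequestSketch {
    private static final int MAX_TOKENS = 2_000;
    private static final double FREQUENCY_PENALTY = 0.5;
    private static final double TEMPERATURE = 0.8;
    private static final int MAX_NUMBER_OF_RESPONSES = 1;

    static ChatCompletionRequest buildRequest(String instructions, String question) {
        return ChatCompletionRequest.builder()
            .model("gpt-3.5-turbo") // model name is illustrative
            .messages(List.of(
                new ChatMessage("system", instructions), // context/instructions built from the forum tag
                new ChatMessage("user", question)))
            .maxTokens(MAX_TOKENS)
            .frequencyPenalty(FREQUENCY_PENALTY)
            .temperature(TEMPERATURE)
            .n(MAX_NUMBER_OF_RESPONSES)
            .build();
    }
}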

@ankitsmt211 (Member, Author)

Token, freq, and temperature are already set in the code. Do you want me to give them separate variable names?

@surajkumar (Contributor) commented Jan 9, 2024

Token, freq, and temperature are already set in the code. Do you want me to give them separate variable names?

Yeah, only because there are no Javadocs on the openai lib and looking the values up is a bother imo. Also, doing this removes the whole "magic number" aspect, but it's mostly for the docs. I was gonna do it in another PR, but since you're already here...

I also upped the TEMPERATURE; I think that might be interesting.

@marko-radosavljevic (Contributor)

Merging on the basis of one approving review of the changes and more than 7 days of inactivity afterwards. Thanks ❤️

@marko-radosavljevic merged commit 3334ba2 into Together-Java:develop on Feb 23, 2024
Taz03 pushed a commit that referenced this pull request Mar 6, 2024
* refactor question builder for gpt feature
* removing logic that prepends all applied tags to question builder
* passing first tag as context to gptservice
* setting context before sending the question

* refactoring setup message

* improving context

* refactoring context for more appropriate responses

* get matching tag or default for context

* sending instructions along with question, instead of setup

* prompt for shorter responses

* values responsible for tweaking AI responses are now declared as constants; docs are also added for these values
Taz03 pushed a commit that referenced this pull request Mar 13, 2024
@ankitsmt211 mentioned this pull request Mar 19, 2024
Closes issue: ChatGPT Auto-Answer should be more compact and use Java