
Conversation

@ankitsmt211 (Member) commented Dec 19, 2023

resolves #920

  • sets context for responses
  • attempts to reduce the character limit, but it's hard to be consistent with or verify this
  • removes the tag builder that was added to the question builder

Note: There's no reliable way to generate shorter responses. I could go really low by using BRIEF as a keyword, but that gets very, very short. IMO the char limit shouldn't be the priority; we can always paginate the response in embeds.

If we cross the 2k char limit at the moment, the AIResponseParser class will automatically split the response into multiple shorter messages, as mentioned in #928.

Reducing MAX_TOKENS would just lead to lost responses at times.

Bottom line: when implementing "embeds" for the rare responses that go over the 4k limit, we can either paginate or run GPT again on the generated response to drop more filler; otherwise, most responses should fall well under the 4k limit.
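For reference, a minimal sketch of that kind of length-based splitting (illustration only, not the actual AIResponseParser implementation):

import java.util.ArrayList;
import java.util.List;

// Illustration only: splits a long answer into chunks that fit under Discord's 2k message limit,
// preferring to break on blank lines so paragraphs stay together.
final class ResponseSplitter {
    private static final int MESSAGE_LIMIT = 2_000;

    static List<String> split(String response) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : response.split("\n\n")) {
            if (current.length() > 0 && current.length() + paragraph.length() + 2 > MESSAGE_LIMIT) {
                chunks.add(current.toString().strip());
                current.setLength(0);
            }
            String rest = paragraph;
            // a single paragraph longer than the limit still has to be hard-cut
            while (rest.length() > MESSAGE_LIMIT) {
                chunks.add(rest.substring(0, MESSAGE_LIMIT));
                rest = rest.substring(MESSAGE_LIMIT);
            }
            current.append(rest).append("\n\n");
        }
        if (current.length() > 0) {
            chunks.add(current.toString().strip());
        }
        return chunks;
    }
}

Pagination in embeds could then simply page over these chunks.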

* removing logic that prepends all applied tags to question builder
* passing first tag as context to gptservice
* setting context before sending the question (rough sketch of the shape below)
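For illustration, roughly the shape of the context change. The class, method, and parameter names here are hypothetical, not the actual code; it assumes JDA's ThreadChannel#getAppliedTags and ForumTag#getName:

import java.util.Optional;

import net.dv8tion.jda.api.entities.channel.concrete.ThreadChannel;
import net.dv8tion.jda.api.entities.channel.forums.ForumTag;

// Hypothetical sketch: use the first applied forum tag (e.g. "spring", "swing") as context,
// falling back to plain "java" when no tag is applied.
class GptContextSketch {
    Optional<String> askWithTagContext(ThreadChannel helpThread, String question,
            ChatGptService chatGptService) {
        String context = helpThread.getAppliedTags()
            .stream()
            .findFirst()
            .map(ForumTag::getName)
            .orElse("java");
        // the service is expected to turn this into an instruction such as
        // "You are answering a question tagged 'spring' on a Java Q&A Discord server."
        return chatGptService.ask(question, context);
    }
}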
@ankitsmt211 added the labels "enhancement (New feature or request)" and "priority: major" on Dec 19, 2023
@ankitsmt211 self-assigned this Dec 19, 2023
@ankitsmt211 requested review from a team as code owners December 19, 2023 11:24
@marko-radosavljevic (Contributor)

Yeah, we don't want to restrict and limit the model from every side, rendering it useless, starving GPT of oxygen until it coughs up a few sentences for us and dies. We want to just gently steer it at its full power.

If it gives a perfect long guide that explains every step, with code examples... that's awesome!

We should obviously optimize it; the simplest questions don't need those bloated responses. But quality should be our priority first, and optimizing for UI/UX second.

These were the tests I used to benchmark and optimize responses.

import java.util.Optional;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

// Manual benchmark tests: responses are logged at WARN level so they show up
// in the test output for inspection of length and quality.
class ChatGptServiceTest {
    private static final Logger logger = LoggerFactory.getLogger(ChatGptServiceTest.class);
    private Config config;
    private ChatGptService chatGptService;

    @BeforeEach
    void setUp() {
        config = mock();
        when(config.getOpenaiApiKey()).thenReturn("your-api-key");
        chatGptService = new ChatGptService(config);
    }

    @Test
    void askToGenerateLongPoem() {
        Optional<String> response = chatGptService.ask("generate a very long poem");
        response.ifPresent(logger::warn);
    }

    @Test
    void askHowToSetupJacksonLibraryWithExamples() {
        Optional<String> response = chatGptService.ask("How to setup Jackson library with examples");
        response.ifPresent(logger::warn);
    }

    @Test
    void askDockerReverseProxyWithNginxGuide() {
        Optional<String> response = chatGptService.ask("Docker reverse proxy with nginx guide");
        response.ifPresent(logger::warn);
    }

    @Test
    void askWhyDoesItTakeYouMoreThan10SecondsToAnswer() {
        Optional<String> response = chatGptService.ask(
                "Working example of Command pattern in java, with all the classes required, explained in detail. Bonus points for UML diagrams.");
        response.ifPresent(logger::warn);
    }
}

Can you run these and post how long they took, along with the results? Just curious how it would all look with the current UI/UX. (Since this is testing the service directly, it's best to just ask the bot these questions.)
Also curious whether the user would have to wait 2 minutes for an answer, and whether that would feel unintuitive/unfriendly for the user, because it's not properly communicated what is happening.
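If it helps, here is a timing variant you could drop into the test class above (illustrative only, just wrapping the existing call):

@Test
void askDockerReverseProxyWithNginxGuideTimed() {
    long start = System.nanoTime();
    Optional<String> response = chatGptService.ask("Docker reverse proxy with nginx guide");
    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    // logs duration and response length for a rough latency/size picture
    logger.warn("took {} ms, {} chars", elapsedMillis,
            response.map(String::length).orElse(0));
}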

@marko-radosavljevic (Contributor) commented Dec 21, 2023

Regarding the added context based on the #questions channel, so GPT knows it's Java: I'm curious whether it would backfire in other categories (for whatever reason), especially in the "other" category.

Because of that 'on a Java Q&A discord server', what happens if someone asks a question and writes 'answer in python'? Or what if the question is obviously Python, because there is Python code attached, and GPT tries to rewrite it as Java or bastardizes it? What if the mentioned libraries and frameworks are clearly from the Python ecosystem: would it answer within that context, or would it try to Javthon it?

Make sure to test some edge cases in different categories, and use some previous real-world failures from #questions in your test suite. Also include some questions GPT answered successfully, to check whether you notice any regressions. Just to be sure that this new prompt is objectively better, and that it won't make some other aspect worse by accident. ☺️
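For example, something along these lines. The questions here are made up, real ones should come from past #questions threads, and it assumes the same setup as the ChatGptServiceTest above:

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

// Sketch of an edge-case suite; verification is manual inspection of the logged answers for now.
class ChatGptContextEdgeCaseTest {
    private ChatGptService chatGptService;

    @BeforeEach
    void setUp() {
        Config config = mock(Config.class);
        when(config.getOpenaiApiKey()).thenReturn("your-api-key");
        chatGptService = new ChatGptService(config);
    }

    @ParameterizedTest
    @ValueSource(strings = {
        "how do I reverse a list? answer in python",
        "why does this pandas read_csv call ignore my custom delimiter?",
        "my flask route returns 404, here is the python code"
    })
    void doesNotForceJavaOntoNonJavaQuestions(String question) {
        // check the answer stays within the user's context instead of being "Javthon"-ed
        chatGptService.ask(question).ifPresent(System.out::println);
    }
}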

@ankitsmt211 (Member, Author)

Length can be improved with a good prompt, but it's not super consistent; will get back to this.

Zabuzard previously approved these changes Jan 3, 2024
@ankitsmt211 (Member, Author)

The responses are not really compact; it needs a bit of playing around with prompts of different lengths, and I don't really feel like doing that atm. I'm going to undo the length-related changes and only keep the context-related changes, because the earlier length handling seems to be better than what I did here.

@ankitsmt211 (Member, Author) commented Jan 9, 2024

I've only run these tests a couple of times each, but the results seem noticeably better than the original prompt.

Character counts, based on the tests given by marko:

With the new prompt (3k token limit):

  1. 1375 chars (poem about Java)
  2. 1782 chars
  3. 1595 chars
  4. 2032 chars

With the new prompt plus the temperature changes from @surajkumar (2k token limit):

  1. 1425 chars
  2. 1891 chars
  3. 1658 chars
  4. 1924 chars

With the new prompt plus the temperature changes from @surajkumar (3k token limit):

  1. 1323 chars
  2. 1622 chars
  3. 1841 chars
  4. 2134 chars

Shorter responses, and the context is pretty solid.

With the earlier prompt (3k token limit):

  1. 1687 chars (random poem)
  2. kept throwing an error (response greater than 2k chars, which it will then try to split)
  3. 1772 chars
  4. kept throwing an error (response greater than 2k chars, which it will then try to split)

Relatively longer responses, and the context depends entirely on the user's question.

@ankitsmt211 requested a review from Zabuzard on January 9, 2024 05:07
@surajkumar (Contributor) commented Jan 9, 2024

Can you add this to your PR please:

    /** The maximum number of tokens allowed for the generated answer */
    private static final int MAX_TOKENS = 2_000;

    /**
     * This parameter reduces the likelihood of the AI repeating itself. A higher frequency penalty
     * makes the model less likely to repeat the same lines verbatim. It helps in generating more
     * diverse and varied responses.
     */
    private static final double FREQUENCY_PENALTY = 0.5;

    /**
     * This parameter controls the randomness of the AI's responses. A higher temperature results in
     * more varied, unpredictable, and creative responses. Conversely, a lower temperature makes the
     * model's responses more deterministic and conservative.
     */
    private static final double TEMPERATURE = 0.8;

    /**
     * n: This parameter specifies the number of responses to generate for each prompt. If n is more
     * than 1, the AI will generate multiple different responses to the same prompt, each one being
     * a separate iteration based on the input.
     */
    private static final int MAX_NUMBER_OF_RESPONSES = 1;

The keen-eyed will notice some changes to the values.
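For context, this is roughly how they'd be wired into the chat completion request. Sketch only: it assumes the builder-style openai client (TheoKanning's openai-java), so treat the exact builder method names and the model string as assumptions rather than the bot's actual code:

import java.util.List;

import com.theokanning.openai.completion.chat.ChatCompletionRequest;
import com.theokanning.openai.completion.chat.ChatMessage;

// Sketch only: shows the constants above feeding the request builder.
class ChatRequestSketch {
    private static final int MAX_TOKENS = 2_000;
    private static final double FREQUENCY_PENALTY = 0.5;
    private static final double TEMPERATURE = 0.8;
    private static final int MAX_NUMBER_OF_RESPONSES = 1;

    static ChatCompletionRequest buildRequest(String instructions, String question) {
        return ChatCompletionRequest.builder()
            .model("gpt-3.5-turbo") // model name is illustrative
            .messages(List.of(
                new ChatMessage("system", instructions), // context/instructions built from the forum tag
                new ChatMessage("user", question)))
            .maxTokens(MAX_TOKENS)
            .frequencyPenalty(FREQUENCY_PENALTY)
            .temperature(TEMPERATURE)
            .n(MAX_NUMBER_OF_RESPONSES)
            .build();
    }
}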

@ankitsmt211 (Member, Author)

Token, freq, and temperature are already set in the code. Do you want me to give them separate variable names?

@surajkumar (Contributor) commented Jan 9, 2024

Token, freq, and temperature are already set in the code. Do you want me to give them separate variable names?

Yeah, only because there are no Javadocs on the openai lib and looking the values up is a bother imo. Also, doing this removes the whole "magic number" aspect, but it's mostly for the docs. I was gonna do it in another PR, but since you're already here...

I also upped the TEMPERATURE; I think that might be interesting.

@marko-radosavljevic (Contributor)

Merging on the basis of one approving review of the changes and more than 7 days of inactivity afterwards. Thanks ❤️

@marko-radosavljevic merged commit 3334ba2 into Together-Java:develop on Feb 23, 2024
Taz03 pushed a commit that referenced this pull request Mar 6, 2024
* refactor question builder for gpt feature
* removing logic that prepends all applied tags to question builder
* passing first tag as context to gptservice
* setting context before sending the question

* refactoring setup message

* improving context

* refactoring context for more appropriate responses

* get matching tag or default for context

* sending instructions along with question, instead of setup

* prompt for shorter responses

* values responsible for tweaking AI responses are now declared as constants; docs are also added for these values
Taz03 pushed a commit that referenced this pull request Mar 13, 2024
@ankitsmt211 mentioned this pull request Mar 19, 2024
Closes issue: ChatGPT Auto-Answer should be more compact and use Java