My first open source contribution

DDD

Sep 19, 2024 10:17 AM

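Filing an issue

For my first contribution, I filed an issue on another project proposing a new feature: a flag option that displays the number of tokens used for the prompt and for the generated completion.

Feature: Chat completion token info flag option #8

클레오벤트라 posted on Sep 16, 2024

Description

A flag option that provides the user with the number of tokens sent and received. I think this is an important feature that helps guide users to stay within their token budget when making chat completion requests!

Implementation

To do this, another option flag needs to be added, which could be -t and --token-usage. When the user includes this flag in their command, it should clearly show in detail the number of tokens used for the generated completion and the number of tokens used for the prompt.

View on GitHub

I decided to contribute to chat-minal, an open source project by fadingNA. chat-minal is a CLI tool written in Python that uses OpenAI to perform various tasks such as generating code reviews, converting files, generating markdown from text, and summarizing text.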
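As a rough illustration of what the issue asks for, a boolean flag with a short and a long form could be declared with Python's standard argparse module. This is only a sketch under assumptions: the parser name, the positional input_text argument, and the help text are hypothetical, and chat-minal's real argument parsing may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for illustration; not chat-minal's actual CLI definition.
    parser = argparse.ArgumentParser(prog="chat-minal-sketch")
    parser.add_argument("input_text", help="Text to send as the prompt")
    parser.add_argument(
        "-t",
        "--token-usage",
        action="store_true",  # stays False unless the flag is passed
        help="Show the number of prompt and completion tokens used",
    )
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["Hello", "--token-usage"])

# argparse exposes --token-usage as the attribute token_usage.
print(f"token_usage enabled: {args.token_usage}")
```

Using action="store_true" means the flag takes no value, so existing commands that omit it keep behaving as before.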

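My pull request

I have written code in Python before, but it is not my strongest language. So working on this project was challenging for me, but it was also a good learning experience.
The challenge was that I had to read and understand someone else's code and provide a proper solution without breaking the design of the code. Understanding the flow is important for adding a feature efficiently, without large changes, while keeping the code consistent.

FEAT: Token usage flag #9

클레오벤트라 posted on Sep 16, 2024

Feature

Added a feature that includes a --token_usage flag option for the user. This option gives the user information about the number of tokens used for the prompt and the generated completion.

Implementation

Based on the design of the code, the solution I came up with is to check whether the token_usage flag is present. Because I did not want the code to evaluate an unnecessary if statement when the token_usage flag is not used, I created two separate loops with identical logic, the only difference being that one of them checks whether usage_metadata exists inside the chunk.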

if token_usage:
    for chunk in runnable.stream({"input_text": input_text}):
        print(chunk.content, end="", flush=True)
        answer.append(chunk.content)

        if chunk.usage_metadata:
            completion_tokens = chunk.usage_metadata.get('output_tokens')
            prompt_tokens = chunk.usage_metadata.get('input_tokens')
else:
    for chunk in runnable.stream({"input_text": input_text}):
        print(chunk.content, end="", flush=True)
        answer.append(chunk.content)

Display

At the end of the execution of get_completions() method, a check for the flag token_usage is added, which then displays the token usage details to stderr if the flag was used.

if token_usage:
    logger.error(f"Tokens used for completion: {completion_tokens}")
    logger.error(f"Tokens used for prompt: {prompt_tokens}")
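To show why reporting through a logger keeps the token counts apart from the generated answer, here is a minimal sketch using only the standard logging module. The logger name, the formatter, and the report_usage helper are assumptions made for illustration; the project's own logger may be configured differently.

```python
import logging

# Hypothetical logger setup; the project's real logger configuration may differ.
logger = logging.getLogger("token_usage_sketch")
handler = logging.StreamHandler()  # StreamHandler writes to sys.stderr by default
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output through the root logger


def report_usage(prompt_tokens: int, completion_tokens: int) -> None:
    """Write the token counts to stderr, leaving stdout for the answer."""
    logger.error(f"Tokens used for completion: {completion_tokens}")
    logger.error(f"Tokens used for prompt: {prompt_tokens}")


# The answer goes to stdout; the usage report goes to stderr.
print("The model's answer goes to stdout")
report_usage(prompt_tokens=12, completion_tokens=34)
```

Because the two streams are separate, redirecting stdout to a file would capture only the answer while the token usage still appears in the terminal.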

View on GitHub

My solution

Retrieving the token usage

if token_usage:
    for chunk in runnable.stream({"input_text": input_text}):
        print(chunk.content, end="", flush=True)
        answer.append(chunk.content)

        if chunk.usage_metadata:
            completion_tokens = chunk.usage_metadata.get('output_tokens')
            prompt_tokens = chunk.usage_metadata.get('input_tokens')
else:
    for chunk in runnable.stream({"input_text": input_text}):
        print(chunk.content, end="", flush=True)
        answer.append(chunk.content)

Originally, the code only had one for loop which retrieves the content from a stream and appends it to an array which forms the response of the completion.

Why did I write it this way?

My reasoning behind duplicating the for loop while adding the distinct if block is to prevent the code from evaluating the if block on every chunk even when the user is not using the newly added --token_usage flag. So instead, I check for the flag first, and then decide which for loop to execute.

Realization

Even though my pull request was accepted by the project owner, I realized too late that this approach makes the code harder to maintain. For example, if the for loop that processes the stream ever needs to change, the code has to be modified twice, since there are two identical for loops.

One improvement I could make is to move the shared logic into a function, so that any required change is made in one place only, which keeps the code maintainable. This just proves that even when I write code with optimization in mind, I can still miss other things that are crucial to a project, which in this case is maintainability.
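One way the refactoring described above might look is sketched here. The shared per-chunk work moves into a helper, so both branches call the same code while the flag is still checked only once. The names emit_chunk and collect_answer are hypothetical, and the chunks are plain stand-in objects with a dict for usage_metadata rather than real streamed model output.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional, Tuple


def emit_chunk(chunk: Any, answer: List[str]) -> None:
    """Shared per-chunk work: print the content and remember it."""
    print(chunk.content, end="", flush=True)
    answer.append(chunk.content)


def collect_answer(
    chunks: Iterable[Any], token_usage: bool = False
) -> Tuple[str, Optional[int], Optional[int]]:
    """Consume a stream of chunks; capture token counts only when asked."""
    answer: List[str] = []
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None

    if token_usage:
        for chunk in chunks:
            emit_chunk(chunk, answer)
            if chunk.usage_metadata:
                completion_tokens = chunk.usage_metadata.get("output_tokens")
                prompt_tokens = chunk.usage_metadata.get("input_tokens")
    else:
        # No usage check at all on this path; only the shared helper runs.
        for chunk in chunks:
            emit_chunk(chunk, answer)

    return "".join(answer), prompt_tokens, completion_tokens


# Stand-in chunks for illustration; real chunks would come from the stream.
fake_stream = [
    SimpleNamespace(content="Hello", usage_metadata=None),
    SimpleNamespace(
        content=" world",
        usage_metadata={"input_tokens": 5, "output_tokens": 2},
    ),
]
text, prompt_tokens, completion_tokens = collect_answer(fake_stream, token_usage=True)
```

A change to how each chunk is printed or stored now happens in emit_chunk alone. A simpler alternative would be a single loop that checks the flag on every chunk, trading a cheap boolean test for less code.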

Receiving a pull request

My tool, genereadme, also received a contribution. Mounayer opened a PR that adds the same feature to my project.

feat: added a new flag that displays the number of tokens sent in prompt and received in completion #13

Mounayer posted on Sep 15, 2024

Description

Closes #12.

Added a new flag --token-usage which, when given, prints the number of tokens that were sent in the prompt and the number of tokens that were returned in the completion to stderr.

This simply required the addition of another flag check, --token-usage:

   .option("--token-usage", "Show prompt and completion token usage")

I've also made sure to keep your naming conventions and formatting style consistent. In the for loop that does the chat completion for each file processed, I have accumulated the total tokens sent and received:

    promptTokens += response.usage.prompt_tokens;
    completionTokens += response.usage.completion_tokens;

which I then display at the end of program run-time if the --token-usage flag is provided as such:

    if (program.opts().tokenUsage) {
      console.error(`Prompt tokens: ${promptTokens}`);
      console.error(`Completion tokens: ${completionTokens}`);
    }

Updated README.md to explain the new flag.

Testing

Test 1

genereadme examples/sum.js --token-usage

This should print the prompt and completion token counts to stderr.

Test 2

You can try it out with multiple files too, i.e.:

genereadme examples/sum.js examples/createUser.js --token-usage

View on GitHub

This time, instead of having to read someone else's code, someone had to read mine and contribute to it. It is nice knowing that someone is able to contribute to my project. To me, it means that they understood how my code works, so they were able to add the feature without breaking anything or adding any complexity to the code base.
That said, reading code is a skill that should not be underestimated. My code is nowhere near perfect and I know there are still places I can improve, so credit is also due to the contributor for being able to read and understand it.

This specific pull request did not really require any back-and-forth changes, as the code written by Mounayer is what I would have written myself.
