沒工作也要「卷」。
有人P 了一張圖,表示Karpathy 為大家「烹調了一頓大餐」。
from minbpe import BasicTokenizertokenizer = BasicTokenizer()text = "aaabdaaabac"tokenizer.train(text, 256 + 3) # 256 are the byte tokens, then do 3 mergesprint(tokenizer.encode(text))# [258, 100, 258, 97, 99]print(tokenizer.decode([258, 100, 258, 97, 99]))# aaabdaaabactokenizer.save("toy")# writes two files: toy.model (for loading) and toy.vocab (for viewing)
text = "hello123!!!? (안녕하세요!) ?"# tiktokenimport tiktokenenc = tiktoken.get_encoding("cl100k_base")print(enc.encode(text))# [15339, 4513, 12340, 30, 320, 31495, 230, 75265, 243, 92245, 16715, 57037]# oursfrom minbpe import GPT4Tokenizertokenizer = GPT4Tokenizer()print(tokenizer.encode(text))# [15339, 4513, 12340, 30, 320, 31495, 230, 75265, 243, 92245, 16715, 57037]
以上是離開OpenAI待業的Karpathy做了個大模型新項目,Star量一日破千的詳細內容。更多資訊請關注PHP中文網其他相關文章!