A Tutorial on Generating Pseudo-Random Text with Markov Chains in Python

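First, the definition from Wolfram:

    A Markov chain is a collection of random variables {X_t} (where the index t runs through 0, 1, ...) having the property that, given the present, the future is conditionally independent of the past.

Wikipedia's definition is a little clearer:

    ...a Markov chain is a stochastic process with the Markov property... [This means that] state changes are probabilistic, and the future state depends only on the current state.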
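
To make the definition concrete before moving on to text, here is a minimal sketch of a two-state chain. The weather states, the probabilities, and the names `TRANSITIONS`, `next_state`, and `simulate` are all made up for illustration and are not part of the original tutorial. The point is that `next_state` looks only at the current state, never at the earlier history.

```python
import random

# Hypothetical transition table: current state -> {next state: probability}.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def next_state(current, rng):
    """Draw the next state using only the current state (the Markov property)."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()))[0]


def simulate(start, steps, seed=None):
    """Return the list of states visited, starting from `start`."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(steps):
        states.append(next_state(states[-1], rng))
    return states


print(simulate("sunny", 10, seed=0))
```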

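Markov chains have many uses. Here we look at how to use one to produce gibberish that reads as if it were real text.

The algorithm is as follows:

  1.     Find a text to serve as the corpus. The corpus is used to choose the next transition.
  2.     Start with two consecutive words from the text. The last two words make up the current state.
  3.     Generating the next word is the Markov transition. To generate it, look in the corpus for the words that follow these two words, and pick one of them at random.
  4.     Repeat step 3, using the last two words as the new state, until the generated text reaches the required size.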


The code is as follows:


import random


class Markov(object):

    def __init__(self, open_file):
        self.cache = {}  # maps (w1, w2) -> list of words seen after that pair
        self.open_file = open_file
        self.words = self.file_to_words()
        self.word_size = len(self.words)
        self.database()

    def file_to_words(self):
        """ Reads the whole file and splits it on whitespace. """
        self.open_file.seek(0)
        data = self.open_file.read()
        words = data.split()
        return words

    def triples(self):
        """ Generates triples from the given data string. So if our string were
            "What a lovely day", we'd generate (What, a, lovely) and then
            (a, lovely, day).
        """

        if len(self.words) < 3:
            return

        for i in range(len(self.words) - 2):
            yield (self.words[i], self.words[i+1], self.words[i+2])

    def database(self):
        """ Fills self.cache. Repeated followers are kept, so a word that
            follows a pair more often is also picked more often.
        """
        for w1, w2, w3 in self.triples():
            key = (w1, w2)
            if key in self.cache:
                self.cache[key].append(w3)
            else:
                self.cache[key] = [w3]

    def generate_markov_text(self, size=25):
        # Any pair starting at index <= word_size-3 has at least one follower.
        seed = random.randint(0, self.word_size-3)
        seed_word, next_word = self.words[seed], self.words[seed+1]
        w1, w2 = seed_word, next_word
        gen_words = []
        for i in range(size):  # range instead of xrange, so this runs on Python 3
            gen_words.append(w1)
            if (w1, w2) not in self.cache:
                # Dead end: the last pair of the corpus may have no follower.
                break
            w1, w2 = w2, random.choice(self.cache[(w1, w2)])
        gen_words.append(w2)
        return ' '.join(gen_words)

To see a sample result, we took P. G. Wodehouse's "My Man Jeeves" from Project Gutenberg as the text. A sample run looks like this.
 

In [1]: file_ = open('/home/shabda/jeeves.txt')
 
In [2]: import markovgen
 
In [3]: markov = markovgen.Markov(file_)
 
In [4]: markov.generate_markov_text()
Out[4]: 'Can you put a few years of your twin-brother Alfred,
who was apt to rally round a bit. I should strongly advocate
the blue with milk'

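To run this example, download jeeves.txt and markovgen.py.

How is this a Markov algorithm?

  •     The last two words are the current state.
  •     The next word depends only on the last two words, that is, on the current state.
  •     The next word is chosen at random from a statistical model of the corpus.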
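
The last point can be made concrete with a small sketch. The follower list below is hypothetical. It shows that keeping duplicates in the list is what turns a uniform `random.choice` into a draw weighted by how often each word followed the state in the corpus.

```python
import random
from collections import Counter

# Hypothetical follower list for one state, with duplicates kept on purpose.
followers = ["c", "d", "c"]

# Empirical probability of each next word given this state.
counts = Counter(followers)
probabilities = {word: n / len(followers) for word, n in counts.items()}
print(probabilities)

# Drawing many times with random.choice approximates those probabilities.
rng = random.Random(0)
draws = Counter(rng.choice(followers) for _ in range(10000))
print(draws["c"] / 10000, draws["d"] / 10000)
```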

Here is a sample text.

"The quick brown fox jumps over the brown fox who is slow jumps over the brown fox who is dead."

The lookup table built from this text looks like this:
 

{('The', 'quick'): ['brown'],
 ('brown', 'fox'): ['jumps', 'who', 'who'],
 ('fox', 'jumps'): ['over'],
 ('fox', 'who'): ['is', 'is'],
 ('is', 'slow'): ['jumps'],
 ('jumps', 'over'): ['the', 'the'],
 ('over', 'the'): ['brown', 'brown'],
 ('quick', 'brown'): ['fox'],
 ('slow', 'jumps'): ['over'],
 ('the', 'brown'): ['fox', 'fox'],
 ('who', 'is'): ['slow', 'dead.']}

Now, if we start from "brown fox", the next word can be "jumps" or "who". If we pick "jumps", the current state becomes "fox jumps", the word after that is "over", and so on.
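
The table and the walk described above can be reproduced with a few lines. This is a standalone sketch that uses `collections.defaultdict` rather than the `Markov` class, and the variable names are chosen here for illustration only.

```python
from collections import defaultdict

text = ("The quick brown fox jumps over the brown fox who is slow "
        "jumps over the brown fox who is dead.")
words = text.split()

# Map each pair of consecutive words to the words that follow it.
table = defaultdict(list)
for w1, w2, w3 in zip(words, words[1:], words[2:]):
    table[(w1, w2)].append(w3)

state = ("brown", "fox")
print(table[state])          # ['jumps', 'who', 'who']

# Follow the path from the text: pick "jumps", then look up the new state.
state = (state[1], "jumps")
print(state, table[state])   # ('fox', 'jumps') ['over']
```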

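Tips

  •     The larger the text we choose, the more choices there are at each transition and the better the generated text looks.
  •     The state can be set to depend on one word, two words, or any number of words. As the number of words per state grows, the generated text becomes less random.
  •     Do not strip out punctuation and the like. It makes the corpus more representative and the random text better looking.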
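
The second tip can be tried out with a sketch that makes the number of words per state a parameter. The names `build_table`, `generate`, `branching_states`, and the `order` argument are assumptions introduced here, not part of the original markovgen.py. Counting the states that have more than one distinct follower gives a rough measure of how much freedom the generator has. On a text as small as the sample sentence the effect is slight, but it should be more visible on a full book.

```python
import random
from collections import defaultdict


def build_table(words, order=2):
    """Map each run of `order` consecutive words to the words that follow it."""
    table = defaultdict(list)
    for i in range(len(words) - order):
        table[tuple(words[i:i + order])].append(words[i + order])
    return table


def generate(words, order=2, size=25, seed=None):
    """Generate at most `size` words (assuming size >= order).

    Stops early if the current state has no follower in the table.
    """
    rng = random.Random(seed)
    table = build_table(words, order)
    start = rng.randrange(len(words) - order)
    out = list(words[start:start + order])
    while len(out) < size:
        followers = table.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)


def branching_states(table):
    """Count the states that have more than one distinct follower."""
    return sum(1 for followers in table.values() if len(set(followers)) > 1)


text = ("The quick brown fox jumps over the brown fox who is slow "
        "jumps over the brown fox who is dead.")
words = text.split()
for order in (1, 2, 3):
    print(order, branching_states(build_table(words, order)))
print(generate(words, order=1, size=12, seed=0))
```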
