Parse the "pip install" command to get the text ranges of the installed packages

Question

I'm working on a project in which I need to extract the names and locations of the Python packages installed with the pip install command. A web page contains a code element with several lines of text and Bash commands. I want to write JS code that can parse this text and find the packages and their positions in the text. For example, if the text is: $ pip install numpy pip install --global-option build_ext -t ../ pandas>=1.0.0,<2 sud

P粉773659687 · Answer

You can find my explanation of the code in this answer.

Here is another, similar solution that relies more on regular expressions:

const pipOptionsWithArg = [
  '-c',
  '--constraint',
  '-e',
  '--editable',
  '-t',
  '--target',
  '--platform',
  '--python-version',
  '--implementation',
  '--abi',
  '--root',
  '--prefix',
  '-b',
  '--build',
  '--src',
  '--upgrade-strategy',
  '--install-option',
  '--global-option',
  '--no-binary',
  '--only-binary',
  '--progress-bar',
  '-i',
  '--index-url',
  '--extra-index-url',
  '-f',
  '--find-links',
  '--log',
  '--proxy',
  '--retries',
  '--timeout',
  '--exists-action',
  '--trusted-host',
  '--cert',
  '--client-cert',
  '--cache-dir',
];
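// "\\S" inside the template literal, so the regex receives \S (a bare "\S" would become "S").
const optionWithArgRegex = `( (${pipOptionsWithArg.join('|')})(=| )\\S+)*`;
const options = /( -[-\w=]+)*/;
// Named groups: package_part = name plus optional version specifier, package_name = bare name.
const packageArea = /["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/g;
const repeatedPackages = `(?<packages>( ${packageArea.source})+)`;
const whiteSpace = / +/;
// Every literal space in the assembled pattern is replaced by " +" (one or more spaces).
const PIP_COMMAND_REGEX = new RegExp(
  `(?<command>pip install${optionWithArgRegex}${options.source})${repeatedPackages}`.replaceAll(' ', whiteSpace.source),
  'g'
);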
export const parseCommand = (command) => {
  const matches = Array.from(command.matchAll(PIP_COMMAND_REGEX));

  const results = matches.flatMap((match) => {
    const packagesStr = match?.groups.packages;
    if (!packagesStr) return [];

    const packagesIndex = command.indexOf(packagesStr, match.index + match.groups.command.length);

    return Array.from(packagesStr.matchAll(packageArea))
      .map((packageMatch) => {
        const packagePart = packageMatch.groups.package_part;
        const name = packageMatch.groups.package_name;

        const startIndex = packagesIndex + packagesStr.indexOf(packagePart, packageMatch.index);
        const endIndex = startIndex + packagePart.length;

        return {
          type: 'pypi',
          name,
          version: undefined,
          startIndex,
          endIndex,
        };
      })
      .filter((result) => result.name !== 'requirements.txt');
  });

  return results;
};
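This answer assembles its pattern from string fragments, which raises two JavaScript details worth checking in isolation. In an ordinary template literal an unrecognised escape such as \S silently loses its backslash, so a fragment meant to contain \S must be written as \\S or built with String.raw; and replaceAll(' ', ' +') turns every literal space in the fragments into "one or more spaces". The sketch below uses a shortened, hypothetical two-option list and a simplified single-package tail, so it demonstrates the technique rather than reproducing the full PIP_COMMAND_REGEX:

```javascript
// In a plain template literal "\S" is not a recognised escape: the backslash is
// dropped and the fragment silently becomes the letter "S".
const broken = `\S+`;
// String.raw keeps backslashes exactly as written.
const fixed = String.raw`\S+`;
console.log(broken, fixed); // S+ \S+

// Hypothetical, shortened option list (the answer uses the full pipOptionsWithArg).
const optionsWithArg = ['-t', '--target'];

// Fragments are written with single literal spaces...
const optionPart = String.raw`( (${optionsWithArg.join('|')})(=| )\S+)*`;
const packagePart = String.raw`(?<name>\w[\w.-]*)`;
// ...and every one of those spaces becomes " +" (one or more spaces).
const source = `(?<command>pip install${optionPart}) ${packagePart}`.replaceAll(' ', ' +');
const re = new RegExp(source, 'g');

const text = 'pip  install -t ../ numpy\npip install --target=dir pandas';
for (const m of text.matchAll(re)) {
  // m.index is the match start in `text`; named groups are read from m.groups.
  console.log(m.groups.name, m.index);
}
// numpy 0
// pandas 26
```

Because m.index is relative to the whole input, positions found this way are already absolute; the answer adds the offset of each package inside the packages group on top of that.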

P粉194541072 · Answer

Here is an alternative solution that uses loops instead of regular expressions:

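The idea is to find the lines that contain the text pip install, since those are the lines we're interested in. Then we split the command into words and loop over them until we reach the package part of the command.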
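A minimal sketch of that first step, assuming the text uses \n line endings: split the text into lines, keep a running offset so positions stay absolute, and record where the arguments begin on each pip install line. The function name findPipInstallLines is hypothetical and is not part of the answer's code.

```javascript
// Hypothetical helper: report each line that contains "pip install", together with
// the absolute offset (in the whole text) where its arguments start.
function findPipInstallLines(text) {
  const found = [];
  let lineStart = 0; // absolute offset of the current line's first character
  for (const line of text.split('\n')) {
    const m = line.match(/pip +install/);
    if (m) {
      found.push({ line, argsStart: lineStart + m.index + m[0].length });
    }
    lineStart += line.length + 1; // +1 for the "\n" that split() removed
  }
  return found;
}

const text = 'cd project\nsudo pip install numpy\necho done';
const hits = findPipInstallLines(text);
console.log(hits.length, hits[0].argsStart); // 1 27
```

Using m.index also covers prefixes such as sudo or a shell prompt in front of pip.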

First, we define a regular expression for a package. Keep in mind that a package can be written with a quoted version specifier, as in pip install 'stevedore>=1.3.0,<1.4.0' "MySQL_python==1.2.2":

const packageArea = /(?<=\s|^)["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/;

Note the named groups: package_part identifies the "package with version" string, and package_name extracts the package name.
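To see what the two named groups capture, the regex can be run on the quoted specifiers from the example above plus a bare name. This is only a quick check of the answer's pattern, written with the group names package_part and package_name spelled out:

```javascript
// The answer's package regex with its named groups written out:
// package_part = name plus optional version specifier (surrounding quotes excluded)
// package_name = the bare package name
const packageArea =
  /(?<=\s|^)["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/;

const words = [`'stevedore>=1.3.0,<1.4.0'`, `"MySQL_python==1.2.2"`, 'numpy'];
const parsed = words.map((word) => {
  const { package_name, package_part } = word.match(packageArea).groups;
  return [package_name, package_part];
});

for (const [name, part] of parsed) {
  console.log(name, part);
}
// stevedore stevedore>=1.3.0,<1.4.0
// MySQL_python MySQL_python==1.2.2
// numpy numpy
```

Because the quotes sit outside package_part, its length is the length of the specifier only, which is what the parser later uses for endIndex.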

About arguments

There are two types of command-line arguments: options and flags.

The problem with options is that we must recognize that the next word is not a package name but the option's value.

So I first listed all the pip install options that take a value:

const pipOptionsWithArg = [
  '-c',
  '--constraint',
  '-e',
  '--editable',
  '-t',
  '--target',
  '--platform',
  '--python-version',
  '--implementation',
  '--abi',
  '--root',
  '--prefix',
  '-b',
  '--build',
  '--src',
  '--upgrade-strategy',
  '--install-option',
  '--global-option',
  '--no-binary',
  '--only-binary',
  '--progress-bar',
  '-i',
  '--index-url',
  '--extra-index-url',
  '-f',
  '--find-links',
  '--log',
  '--proxy',
  '--retries',
  '--timeout',
  '--exists-action',
  '--trusted-host',
  '--cert',
  '--client-cert',
  '--cache-dir',
];
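With that list, deciding what a word beginning with "-" means is a lookup on the part before any "=". A minimal sketch, using a hypothetical classifyArgument helper and only a few entries of the list (the special -r/--requirement case is left out):

```javascript
// A few entries from pipOptionsWithArg; real code would use the full list.
const optionsWithArg = new Set(['-t', '--target', '--global-option', '-i', '--index-url']);

// Hypothetical helper: how many words does this argument occupy, and why?
function classifyArgument(word) {
  const eq = word.indexOf('=');
  const name = eq === -1 ? word : word.slice(0, eq);
  if (!optionsWithArg.has(name)) return 'flag';  // e.g. -U, --no-deps: one word
  if (eq !== -1) return 'inline-value';          // --target=dir: one word
  return 'next-word-value';                      // --target dir: two words
}

for (const w of ['-U', '--no-deps', '--target=dir', '-t', '--global-option']) {
  console.log(w, classifyArgument(w));
}
```

The answer's handleArgument compares the whole word against the list instead; that gives the same outcome, because a word containing "=" never equals one of the list entries.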

Then I wrote a function, used later, that decides what to do when an argument is encountered:

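const handleArgument = (argument, restCommandWords) => {
  // Characters consumed: the argument plus the space that split(' ') removed.
  let index = argument.length + 1;

  // -r/--requirement: consume every remaining word on this line,
  // so nothing after it is reported as a package.
  if (argument === '-r' || argument === '--requirement') {
    while (restCommandWords.length > 0) {
      index += restCommandWords.shift().length + 1;
    }
    return index;
  }

  // Flags, unknown options and the "--option=value" form occupy a single word.
  if (argument.includes('=') || !pipOptionsWithArg.includes(argument)) {
    return index;
  }

  // "--option value": also consume the value. Repeated spaces show up as empty
  // strings (one character each), so skip them until the real value is reached.
  // If the option is the last word on the line, there is nothing more to consume.
  while (restCommandWords.length > 0) {
    const next = restCommandWords.shift();
    index += next.length + 1;
    if (next) break;
  }
  return index;
};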

This function receives the recognized argument and the rest of the command, split into words.

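(This is where the "index counter" first appears. Since we also need the position of everything we find, we have to track the current position in the original text; handleArgument therefore returns the number of characters it consumed.)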

The function handles both the --option=something and the --option something forms.
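The counter stays in sync because of a property of split(' '): every word, including the empty strings produced by repeated spaces, was followed by exactly one removed space, except the last. Adding word.length + 1 per word, as handleArgument and the parser do (also when an option's value is consumed), therefore lands exactly on each word's start. A small self-contained check of that invariant, on an argument string modelled on the question's example:

```javascript
// Arguments as they appear after "pip install" (note the double space).
const args = ' --global-option  build_ext -t ../ pandas>=1.0.0,<2';
const words = args.split(' '); // ['', '--global-option', '', 'build_ext', '-t', '../', 'pandas>=1.0.0,<2']

let offset = 0;
const starts = [];
for (const word of words) {
  starts.push(offset);       // where this word starts inside `args`
  offset += word.length + 1; // the word plus the one space split() removed
}

// Every word, empty or not, sits exactly where the counter says.
const aligned = words.every((word, i) => args.slice(starts[i], starts[i] + word.length) === word);
console.log(aligned); // true

// The counter ends one past the end: in the parser, that extra 1 covers the newline.
console.log(offset - args.length); // 1
```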

The parser

The main parser now splits the original text into lines, and each line into words.

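Every step must update the global index (counterIndex) so we know where we are in the text. This index lets us search the text with indexOf(str, counterIndex) without landing on a wrong substring earlier in the text: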

export const parseCommand = (multilineCommand) => {
  const packages = [];
  let counterIndex = 0;

  const lines = multilineCommand.split('\n');
  while (lines.length > 0) {
    const line = lines.shift();

    const pipInstallMatch = line.match(/pip +install/);
    if (!pipInstallMatch) {
      counterIndex += line.length + 1; // +1 for the newline
      continue;
    }

    const pipInstallLength = pipInstallMatch.index + pipInstallMatch[0].length;
    const argsAndPackagesWords = line.slice(pipInstallLength).split(' ');
    counterIndex += pipInstallLength;

    while (argsAndPackagesWords.length > 0) {
      const word = argsAndPackagesWords.shift();

      if (!word) {
        counterIndex++;
        continue;
      }

      if (word.startsWith('-')) {
        counterIndex += handleArgument(word, argsAndPackagesWords);
        continue;
      }

      const packageMatch = word.match(packageArea);
      if (!packageMatch) {
        counterIndex += word.length + 1;
        continue;
      }

      const startIndex = multilineCommand.indexOf(packageMatch.groups.package_part, counterIndex);
      packages.push({
        type: 'pypi',
        name: packageMatch.groups.package_name,
        version: undefined,
        startIndex,
        endIndex: startIndex + packageMatch.groups.package_part.length,
      });

      counterIndex += word.length + 1;
    }
  }

  return packages;
};
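Why indexOf needs the counter as its second argument: a package name can also appear earlier on the line, for example inside an option value. A minimal illustration, where the counter value is computed by hand the same way the parser would compute it:

```javascript
const text = 'pip install -t numpy_libs numpy';

// Searching from the start finds "numpy" inside the option value numpy_libs.
const naive = text.indexOf('numpy');

// Counter after "pip install", the option "-t" and its value, each plus its space.
const counterIndex = 'pip install'.length + 1 + '-t'.length + 1 + 'numpy_libs'.length + 1;
const positioned = text.indexOf('numpy', counterIndex);

console.log(naive, positioned);      // 15 26
console.log(text.slice(positioned)); // numpy
```

The same applies when the same package is installed on two different lines: without the counter, both results would point at the first line.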
