Parsing 'pip install' commands to get the ranges of installed packages in text

Question

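I'm working on a project that requires me to extract the names and positions of Python packages installed with the pip install command. A web page contains a code element that holds multiple lines of text and bash commands. I want to write JS code that parses this text and finds the packages and their positions in it. For example, if the text is: $ pip install numpy pip install --global-option build_ext -t ../ pandas>=1.0.0,<2 sud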

P粉773659687 · Answer

You can see the code I explain in this answer.

Here is another, similar solution that leans more heavily on regular expressions:

const pipOptionsWithArg = [
  '-c',
  '--constraint',
  '-e',
  '--editable',
  '-t',
  '--target',
  '--platform',
  '--python-version',
  '--implementation',
  '--abi',
  '--root',
  '--prefix',
  '-b',
  '--build',
  '--src',
  '--upgrade-strategy',
  '--install-option',
  '--global-option',
  '--no-binary',
  '--only-binary',
  '--progress-bar',
  '-i',
  '--index-url',
  '--extra-index-url',
  '-f',
  '--find-links',
  '--log',
  '--proxy',
  '--retries',
  '--timeout',
  '--exists-action',
  '--trusted-host',
  '--cert',
  '--client-cert',
  '--cache-dir',
];
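// Inside a template literal the backslash must be doubled so the RegExp receives \S.
const optionWithArgRegex = `( (${pipOptionsWithArg.join('|')})(=| )\\S+)*`;
const options = /( -[-\w=]+)*/;
// package_part is the package with its version specifier; package_name is the bare name.
const packageArea = /["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/g;
const repeatedPackages = `(?<packages>( ${packageArea.source})+)`;
const whiteSpace = / +/;
// Every literal space in the assembled pattern is widened to " +" so runs of spaces also match.
const PIP_COMMAND_REGEX = new RegExp(
  `(?<command>pip install${optionWithArgRegex}${options.source})${repeatedPackages}`.replaceAll(' ', whiteSpace.source),
  'g'
);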
export const parseCommand = (command) => {
  const matches = Array.from(command.matchAll(PIP_COMMAND_REGEX));

  const results = matches.flatMap((match) => {
    const packagesStr = match?.groups.packages;
    if (!packagesStr) return [];

    const packagesIndex = command.indexOf(packagesStr, match.index + match.groups.command.length);

    return Array.from(packagesStr.matchAll(packageArea))
      .map((packageMatch) => {
        const packagePart = packageMatch.groups.package_part;
        const name = packageMatch.groups.package_name;

        const startIndex = packagesIndex + packagesStr.indexOf(packagePart, packageMatch.index);
        const endIndex = startIndex + packagePart.length;

        return {
          type: 'pypi',
          name,
          version: undefined,
          startIndex,
          endIndex,
        };
      })
      .filter((result) => result.name !== 'requirements.txt');
  });

  return results;
};
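
The pattern above is assembled from smaller pieces and then has each space widened to " +". The snippet below is a minimal sketch of that composition trick in isolation. The names readable, pattern, verb, target and flag are made up for this illustration and are not part of the answer's code.

```javascript
const spaces = / +/;
const name = /\w+/;

// Written with single spaces for readability. Note the doubled backslash:
// inside a template literal, "\\S" is what reaches the RegExp as \S.
const readable = `(?<verb>install) (?<target>${name.source})( (?<flag>--\\S+))?`;

// Widen every literal space to " +" so runs of spaces match too.
const pattern = new RegExp(readable.replaceAll(' ', spaces.source));

const match = 'pip install    numpy   --upgrade'.match(pattern);
console.log(match.groups.verb, match.groups.target, match.groups.flag);
```

Keeping the readable form free of any other literal spaces is what makes the blanket replaceAll safe. A pattern that needs a real single space would have to escape it some other way.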

P粉194541072 · Answer

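Here is an alternative solution that tries to use loops instead of regular expressions:

The idea is to find the lines that contain the text pip install, since those are the lines we care about. Then we split the command into words and loop over them until we reach the package part of the command.

First, we define a regular expression for a package. Remember that a package can look like pip install 'stevedore>=1.3.0,<1.4.0' "MySQL_python==1.2.2":

const packageArea = /(?<=\s|^)["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/;

Note the named groups: package_part identifies the "package with version" string, while package_name extracts the package name.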
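
To see what the two named groups capture, here is a small sketch that applies the pattern to single words. It assumes the groups are named package_part and package_name as described. The helper describeWord is hypothetical and exists only for this illustration.

```javascript
const packagePattern =
  /(?<=\s|^)["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/;

// Returns the two named groups for a word, or null when the word is not a package.
const describeWord = (word) => {
  const match = word.match(packagePattern);
  if (!match) return null;
  return {
    part: match.groups.package_part, // name plus version specifier, without quotes
    name: match.groups.package_name, // bare package name
  };
};

console.log(describeWord("'stevedore>=1.3.0,<1.4.0'"));
console.log(describeWord('"MySQL_python==1.2.2"'));
console.log(describeWord('numpy'));
console.log(describeWord('../')); // a path, so no match
```

The surrounding quotes sit outside package_part, which is why the reported range later covers only the package text and not the quote characters.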

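About arguments

There are two kinds of command-line arguments: options, which take a value, and flags, which do not.

The difficulty with options is that we have to recognize that the next word is the option's value, not a package name.

So I first listed all the pip install options that take a value: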

const pipOptionsWithArg = [
  '-c',
  '--constraint',
  '-e',
  '--editable',
  '-t',
  '--target',
  '--platform',
  '--python-version',
  '--implementation',
  '--abi',
  '--root',
  '--prefix',
  '-b',
  '--build',
  '--src',
  '--upgrade-strategy',
  '--install-option',
  '--global-option',
  '--no-binary',
  '--only-binary',
  '--progress-bar',
  '-i',
  '--index-url',
  '--extra-index-url',
  '-f',
  '--find-links',
  '--log',
  '--proxy',
  '--retries',
  '--timeout',
  '--exists-action',
  '--trusted-host',
  '--cert',
  '--client-cert',
  '--cache-dir',
];

Then I wrote a function, used later, that decides what to do when an argument is encountered:

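const handleArgument = (argument, restCommandWords) => {
  // Characters consumed so far: the argument plus the space removed by split(' ').
  let index = argument.length + 1;

  // -r / --requirement: consume every remaining word on the line.
  if (argument === '-r' || argument === '--requirement') {
    while (restCommandWords.length > 0) {
      index += restCommandWords.shift().length + 1;
    }
    return index;
  }

  // A flag, or an option written as --option=value: nothing more to consume.
  if (argument.includes('=') || !pipOptionsWithArg.includes(argument)) {
    return index;
  }

  // An option written as "--option value": the next word is its value, not a package.
  if (restCommandWords.length > 0) {
    index += restCommandWords.shift().length + 1;
  }
  return index;
};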

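This function receives the argument that was recognized and the rest of the command, split into words.

(This is where the "index counter" first appears. Since we also need the position of every package we find, we have to keep track of the current position in the original text.)

In the last lines of the function you can see that I handle both the --option=something and the --option something forms.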
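
The difference between the two forms, and a plain flag, can be seen in how many characters each one accounts for. This is a simplified sketch, not the answer's function: optionsWithValue is a shortened, hypothetical option list and consumedLength is a made-up name.

```javascript
// Hypothetical, shortened option list used only for this illustration.
const optionsWithValue = new Set(['-t', '--target', '-i', '--index-url']);

// Returns how many characters of the line an argument accounts for,
// removing its value from `rest` when the value is a separate word.
const consumedLength = (argument, rest) => {
  let consumed = argument.length + 1; // the argument plus one separating space
  if (argument.includes('=') || !optionsWithValue.has(argument)) {
    return consumed; // a flag, or "--option=value" in a single word
  }
  if (rest.length > 0) {
    consumed += rest.shift().length + 1; // "--option value": skip the value too
  }
  return consumed;
};

const rest1 = ['../', 'pandas'];
console.log(consumedLength('--target', rest1), rest1); // the value '../' is removed

const rest2 = ['pandas'];
console.log(consumedLength('--target=../', rest2), rest2); // nothing is removed

const rest3 = ['pandas'];
console.log(consumedLength('--upgrade', rest3), rest3); // a flag, nothing is removed
```

Both spellings of --target account for 13 characters here, so the running position ends up in the same place whichever form the command uses.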

The parser

Now the main parser splits the original text into lines, and then each line into words.
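
Splitting on a single space has one property worth seeing up front: consecutive spaces produce empty words, and each word (empty or not) stands for its own length plus one removed space. The sketch below, with a made-up input string, shows how summing those lengths recovers each word's offset.

```javascript
const rest = ' -U  numpy'; // what follows "pip install" on a hypothetical line
const words = rest.split(' '); // ['', '-U', '', 'numpy']

let offset = 0;
const positions = [];
for (const word of words) {
  if (word) positions.push({ word, offset });
  offset += word.length + 1; // the word plus the space that split() removed
}

console.log(words);
console.log(positions);
```

An empty word contributes exactly 1, which is why an empty-word branch only needs to increment the counter.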

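Every step must update the global index so that we always know where we are in the text. Passing that index to indexOf(str, counterIndex) lets us search the text without landing on an earlier, wrong occurrence of the same substring: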

export const parseCommand = (multilineCommand) => {
  const packages = [];
  let counterIndex = 0;

  const lines = multilineCommand.split('\n');
  while (lines.length > 0) {
    const line = lines.shift();

    const pipInstallMatch = line.match(/pip +install/);
    if (!pipInstallMatch) {
      counterIndex += line.length + 1; // +1 for the newline character
      continue;
    }

    const pipInstallLength = pipInstallMatch.index + pipInstallMatch[0].length;
    const argsAndPackagesWords = line.slice(pipInstallLength).split(' ');
    counterIndex += pipInstallLength;

    while (argsAndPackagesWords.length > 0) {
      const word = argsAndPackagesWords.shift();

      if (!word) {
        counterIndex++;
        continue;
      }

      if (word.startsWith('-')) {
        counterIndex += handleArgument(word, argsAndPackagesWords);
        continue;
      }

      const packageMatch = word.match(packageArea);
      if (!packageMatch) {
        counterIndex += word.length + 1;
        continue;
      }

      const startIndex = multilineCommand.indexOf(packageMatch.groups.package_part, counterIndex);
      packages.push({
        type: 'pypi',
        name: packageMatch.groups.package_name,
        version: undefined,
        startIndex,
        endIndex: startIndex + packageMatch.groups.package_part.length,
      });

      counterIndex += word.length + 1;
    }
  }

  return packages;
};
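
The reason the parser passes its running position to indexOf can be seen in isolation. The following sketch uses a made-up two-line input in which the same package name appears twice.

```javascript
const text = 'pip install numpy\npip install numpy';

// Without a start position, indexOf always reports the first occurrence.
const firstHit = text.indexOf('numpy');

// Starting the search at the running position (here the start of line 2)
// finds the occurrence that belongs to the line being parsed.
const lineTwoStart = text.indexOf('\n') + 1;
const secondHit = text.indexOf('numpy', lineTwoStart);

console.log(firstHit, secondHit);
console.log(text.slice(secondHit, secondHit + 'numpy'.length));
```

If the counter drifts past the real position (for example by miscounting a separator), the search can skip the intended occurrence or return -1, so the bookkeeping in every branch matters.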
