search

Home  >  Q&A  >  body text

Parse 'pip install' command to get ranges in installed package text

I am working on a project that requires me to extract the name and location of a Python package installed using the pip install command.

A web page contains a code element that contains multiple lines of text and bash commands. I want to write a JS code that can parse this text and find the packages and their location in the text.

For example, if the text is:

$ pip install numpy
pip install --global-option build_ext -t ../ pandas>=1.0.0,<2
sudo apt update
pip uninstall numpy
pip install "requests==12.2.2"

I want to get results similar to this:

[
    {
        "name": "numpy",
        "position": 14
    },
    {
        "name": "pandas",
        "position": 65
    },
    {
        "name": "requests",
        "position": 131
    }
]

How do I implement this functionality in JavaScript?

P粉431220279P粉431220279442 days ago567

reply all(2)I'll reply

  • P粉773659687

    P粉7736596872023-09-08 15:53:55

    You can see the code I explained in this answer.

    Here is another similar solution, more based on regular expressions:

    const pipOptionsWithArg = [
      '-c',
      '--constraint',
      '-e',
      '--editable',
      '-t',
      '--target',
      '--platform',
      '--python-version',
      '--implementation',
      '--abi',
      '--root',
      '--prefix',
      '-b',
      '--build',
      '--src',
      '--upgrade-strategy',
      '--install-option',
      '--global-option',
      '--no-binary',
      '--only-binary',
      '--progress-bar',
      '-i',
      '--index-url',
      '--extra-index-url',
      '-f',
      '--find-links',
      '--log',
      '--proxy',
      '--retires',
      '--timeout',
      '--exists-action',
      '--trusted-host',
      '--cert',
      '--client-cert',
      '--cache-dir',
    ];
    const optionWithArgRegex = `( (${pipOptionsWithArg.join('|')})(=| )\S+)*`;
    const options = /( -[-\w=]+)*/;
    const packageArea = /["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/g;
    const repeatedPackages = `(?<packages>( ${packageArea.source})+)`;
    const whiteSpace = / +/;
    const PIP_COMMAND_REGEX = new RegExp(
      `(?<command>pip install${optionWithArgRegex}${options.source})${repeatedPackages}`.replaceAll(' ', whiteSpace.source),
      'g'
    );
    export const parseCommand = (command) => {
      const matches = Array.from(command.matchAll(PIP_COMMAND_REGEX));
    
      const results = matches.flatMap((match) => {
        const packagesStr = match?.groups.packages;
        if (!packagesStr) return [];
    
        const packagesIndex = command.indexOf(packagesStr, match.index + match.groups.command.length);
    
        return Array.from(packagesStr.matchAll(packageArea))
          .map((packageMatch) => {
            const packagePart = packageMatch.groups.package_part;
            const name = packageMatch.groups.package_name;
    
            const startIndex = packagesIndex + packagesStr.indexOf(packagePart, packageMatch.index);
            const endIndex = startIndex + packagePart.length;
    
            return {
              type: 'pypi',
              name,
              version: undefined,
              startIndex,
              endIndex,
            };
          })
          .filter((result) => result.name !== 'requirements.txt');
      });
    
      return results;
    };
    

    reply
    0
  • P粉194541072

    P粉1945410722023-09-08 10:50:01

    Here is an optional solution, try using a loop instead of a regular expression:

    The idea is to find the lines containing the text pip install. These lines are the lines we are interested in. Then, break the command into words and loop over them until you reach the package part of the command.

    First, we will define a regular expression for the package. Remember, a package can be something like pip install 'stevedore>=1.3.0,<1.4.0' "MySQL_python==1.2.2":

    const packageArea = /(?<=\s|^)["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/;
    

    NOTENamed grouping, package_part is used to identify the "package with version" string, and package_name is used to extract Package names.


    About parameters

    We have two types of command line arguments: options and flags.

    The problem with

    options is that we need to understand that the next word is not the package name, but the options value.

    So, I first listed all the options in the pip install command:

    const pipOptionsWithArg = [
      '-c',
      '--constraint',
      '-e',
      '--editable',
      '-t',
      '--target',
      '--platform',
      '--python-version',
      '--implementation',
      '--abi',
      '--root',
      '--prefix',
      '-b',
      '--build',
      '--src',
      '--upgrade-strategy',
      '--install-option',
      '--global-option',
      '--no-binary',
      '--only-binary',
      '--progress-bar',
      '-i',
      '--index-url',
      '--extra-index-url',
      '-f',
      '--find-links',
      '--log',
      '--proxy',
      '--retires',
      '--timeout',
      '--exists-action',
      '--trusted-host',
      '--cert',
      '--client-cert',
      '--cache-dir',
    ];
    

    I then wrote a function that I will use later to decide what to do when it sees an argument:

    const handleArgument = (argument, restCommandWords) => {
      let index = 0;
      index += argument.length + 1; // +1 是为了去掉 split 时的空格
    
      if (argument === '-r' || argument === '--requirement') {
        while (restCommandWords.length > 0) {
          index += restCommandWords.shift().length + 1;
        }
        return index;
      }
    
      if (!pipOptionsWithArg.includes(argument)) {
        return index;
      }
    
      if (argument.includes('=')) return index;
    
      index += restCommandWords.shift().length + 1;
      return index;
    };
    

    This function receives the recognized parameters and the rest of the command, split into words.

    (Here you start to see the "index counter". Since we also need to find the position of each discovery, we need to keep track of the current position in the original text).

    In the last few lines of the function, you can see that I handle both cases --option=something and --option something.


    Parser

    Now the main parser splits the raw text into lines and then into words.

    Every operation must update the global index to keep track of where we are in the text, and this index helps us search and find within the text without getting stuck in the wrong substring, Use indexOf(str, counterIndex):

    export const parseCommand = (multilineCommand) => {
      const packages = [];
      let counterIndex = 0;
    
      const lines = multilineCommand.split('\n');
      while (lines.length > 0) {
        const line = lines.shift();
    
        const pipInstallMatch = line.match(/pip +install/);
        if (!pipInstallMatch) {
          counterIndex += line.length + 1; // +1 是为了换行符
          continue;
        }
    
        const pipInstallLength = pipInstallMatch.index + pipInstallMatch[0].length;
        const argsAndPackagesWords = line.slice(pipInstallLength).split(' ');
        counterIndex += pipInstallLength;
    
        while (argsAndPackagesWords.length > 0) {
          const word = argsAndPackagesWords.shift();
    
          if (!word) {
            counterIndex++;
            continue;
          }
    
          if (word.startsWith('-')) {
            counterIndex += handleArgument(word, argsAndPackagesWords);
            continue;
          }
    
          const packageMatch = word.match(packageArea);
          if (!packageMatch) {
            counterIndex += word.length + 1;
            continue;
          }
    
          const startIndex = multilineCommand.indexOf(packageMatch.groups.package_part, counterIndex);
          packages.push({
            type: 'pypi',
            name: packageMatch.groups.package_name,
            version: undefined,
            startIndex,
            endIndex: startIndex + packageMatch.groups.package_part.length,
          });
    
          counterIndex += word.length + 1;
        }
      }
    
      return packages;
    };
    

    reply
    0
  • Cancelreply