Apply Threat Detection Technology: Key to Network Security, Risks Also Considered
Incident triage and software vulnerability discovery are two areas where large language models have shown success, although false positives are common.
ChatGPT is a groundbreaking chatbot powered by the neural network-based language model text-davinci-003 and trained on large text datasets from the Internet. It can generate human-like text in a variety of styles and formats, and it can be fine-tuned for specific tasks, such as answering questions, summarizing text, or solving cybersecurity problems like drafting incident reports or interpreting decompiled code. Security researchers and AI hackers have taken an interest in ChatGPT, probing the LLM for weaknesses, while other researchers, as well as cybercriminals, have tried to lure it to the dark side, using it as a force multiplier for writing better phishing emails or generating malware. There have already been cases of bad actors attempting to exploit ChatGPT to generate malicious artifacts, such as phishing emails or even polymorphic malware.
Numerous experiments by security analysts show that the popular large language model (LLM) ChatGPT may help cybersecurity defenders triage potential security incidents and discover security vulnerabilities in code, even though the artificial intelligence (AI) model was not specifically trained for this type of activity.
In an analysis of ChatGPT's utility as an incident response tool, security analysts found that it can identify malicious processes running on compromised systems. They infected a system with Meterpreter and PowerShell Empire agents, took common steps in the adversary's role, and then ran a ChatGPT-powered scanner against the system. The LLM identified two malicious processes running on the system and correctly ignored 137 benign ones, substantially reducing the analysts' workload.
Security researchers are also studying how general-purpose language models perform on specific defense-related tasks. In December, digital forensics firm Cado Security used ChatGPT to analyze JSON data from a real security incident and build a timeline of the compromise, producing a good but not entirely accurate report. Security consulting firm NCC Group experimented with ChatGPT as a way to find vulnerabilities in code; it did find some, but the identifications were not always accurate.
From a practical standpoint, security analysts, developers, and reverse engineers need to be careful when using LLMs, especially for tasks beyond their capabilities. "I definitely think professional developers and others who work with code should explore ChatGPT and similar models, but more for inspiration than for absolutely correct factual results," said Chris Anley, chief scientist at security consulting firm NCC Group, adding that "security code review isn't something we should be using ChatGPT for, so it's unfair to expect it to be perfect the first time."
Using AI to analyze IoCs
Security and threat researchers often publicly disclose their findings (adversary indicators, tactics, techniques, and procedures) in the form of reports, presentations, blog posts, tweets, and other types of content.
We therefore initially decided to test ChatGPT for threat research: could it help identify simple, well-known adversary tools such as Mimikatz and Fast Reverse Proxy, and spot common renaming strategies? The output looked promising!
But what about classic indicators of compromise, such as well-known malicious hashes and domain names? Can ChatGPT answer correctly? Unfortunately, in our quick experiments it failed to produce satisfactory results: it could not identify the well-known hash of WannaCry (hash: 5bef35496fcbdbe841c82f4d1ab8b7c2).
For domain names used by multiple APT campaigns, ChatGPT generated essentially the same list of domains each time and provided a description of the APT actor; it may simply know nothing about some of these domains.
As for the domains used by FIN7, ChatGPT correctly classified them as malicious, although the reason it gave was that "the domain is likely an attempt to trick users into believing it is a legitimate domain," rather than that these are well-known indicators of compromise.
While the last experiment, on domains that imitate well-known websites, gave an interesting result, more research is needed: it is hard to say why ChatGPT produces better results for host-based artifacts than for simple indicators such as domain names and hashes. Certain filters may have been applied to the training dataset, or the questions themselves may have been framed differently (a well-defined problem is half the problem solved!).
In any case, since the responses for host-based artifacts looked more promising, we instructed ChatGPT to write some code to extract various metadata from a test Windows system and then asked whether that metadata was an indicator of compromise:
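The following is a minimal sketch of that idea, reconstructed for illustration rather than taken from ChatGPT's actual output; the model name, endpoint, prompt wording, and use of an OPENAI_API_KEY environment variable are assumptions.

# Collect command lines of running processes: one small slice of the metadata described above
$processes = Get-CimInstance Win32_Process | Select-Object Name, CommandLine

foreach ($p in $processes) {
    # Ask the model whether this process metadata is an indicator of compromise
    $prompt = "Is the following Windows process an indicator of compromise? Name: $($p.Name); Command line: $($p.CommandLine). Answer yes or no and explain why."
    $body = @{ model = "text-davinci-003"; prompt = $prompt; max_tokens = 256 } | ConvertTo-Json
    $response = Invoke-RestMethod -Uri "https://api.openai.com/v1/completions" -Method Post -Headers @{ Authorization = "Bearer $env:OPENAI_API_KEY" } -ContentType "application/json" -Body $body
    # Print the process name next to the model's verdict
    "{0}: {1}" -f $p.Name, $response.choices[0].text.Trim()
}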
Some of the code snippets were more usable than others, so we decided to keep developing this proof of concept manually: we filtered ChatGPT's answers for statements containing "yes" regarding the presence of an indicator of compromise, added an exception handler and CSV reporting, fixed minor bugs, and converted the code snippets into a standalone cmdlet, producing a simple IoC security scanner, HuntWithChatGPT.psm1, capable of scanning remote systems via WinRM:
Get-ChatGPTIoCScanResults
    -apiKey <OpenAI API key> (https://beta.openai.com/docs/api-reference/authentication)
    -SkipWarning []
    -Path <Path>
    -IoCOnly [] (export only indicators of compromise)
    -ComputerName <remote computer name>
    -Credential <remote computer credentials>
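A hypothetical invocation of the cmdlet against a remote host might look like this; the host name, credential variable, and output path are placeholders, and piping the results to Export-Csv mirrors the CSV report mentioned above but is not guaranteed to match the module's actual output format.

Import-Module .\HuntWithChatGPT.psm1
$creds = Get-Credential
Get-ChatGPTIoCScanResults -apiKey $env:OPENAI_API_KEY -ComputerName "TARGET-HOST" -Credential $creds -IoCOnly | Export-Csv -Path .\ioc_report.csv -NoTypeInformation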
We infected a target system with Meterpreter and PowerShell Empire agents and simulated several typical adversary procedures. When the scanner is executed against the target system, it generates a scan report containing ChatGPT's conclusions:
It correctly identified the two malicious running processes among 137 benign processes, without any false positives.
Note that ChatGPT provides a reason for concluding that a piece of metadata is an indicator of compromise, such as "the command line is attempting to download a file from an external server" or "it is using the '-ep bypass' flag, which tells PowerShell to bypass security checks that normally exist."
For the service installation events, we slightly modified the question to guide ChatGPT to "think step by step," so that it would slow down and avoid cognitive biases, as suggested by multiple researchers on Twitter:
Is the Windows service name "$ServiceName" below and the launch string "$Servicecmd" below an indicator of compromise? Please think step by step.
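As an illustration (not the module's actual code), the question above could be assembled from the System event log, where Windows records new service installations as Event ID 7045; the prompt wording follows the question above, everything else is an assumption.

# Pull service installation events (Event ID 7045, Service Control Manager) from the System log
$events = Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 7045 } -ErrorAction SilentlyContinue
foreach ($e in $events) {
    $xml = [xml]$e.ToXml()
    $ServiceName = ($xml.Event.EventData.Data | Where-Object { $_.Name -eq 'ServiceName' }).'#text'
    $ServiceCmd  = ($xml.Event.EventData.Data | Where-Object { $_.Name -eq 'ImagePath' }).'#text'
    # Build the "think step by step" question for this service and submit it as in the earlier sketch
    $prompt = "Is the Windows service name `"$ServiceName`" below and the launch string `"$ServiceCmd`" below an indicator of compromise? Please think step by step."
}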
ChatGPT successfully identified suspicious service installations, with no false positives. For one service it produced a valid hypothesis that "the code is used to disable logging or other security measures on a Windows system." For the second service, it explained why the service should be classified as an indicator of compromise: "these two pieces of information indicate that the Windows service and the string used to start it may be associated with some form of malware or other malicious activity and should therefore be considered an indicator of compromise."
Process creation events in the Sysmon and Security logs were analyzed with the help of the corresponding PowerShell cmdlets, Get-ChatGPTSysmonProcessCreationIoC and Get-ChatGPTProcessCreationIoC. The final report highlighted several incidents as malicious (a simplified sketch of the underlying event collection follows the findings below):
ChatGPT identified a suspicious pattern in ActiveX code: "the command line includes commands to start a new process (svchost.exe) and terminate the current process (rundll32.exe)."
It correctly described the lsass process dump attempt: "a.exe is running with elevated privileges and using lsass (which stands for Local Security Authority Subsystem Service) as its target; finally, dbg.dmp indicates that a memory dump is being created while running a debugger."
It also correctly detected the Sysmon driver uninstallation: "the command line includes instructions to uninstall the system monitoring driver."
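For reference, here is a simplified sketch of the event collection these cmdlets rely on; Sysmon records process creation as Event ID 1 and the Security log as Event ID 4688 (the cmdlets' internals are not reproduced here, and the prompt wording is illustrative).

# Gather process creation events from Sysmon (Event ID 1) and the Security log (Event ID 4688)
$sysmonEvents   = Get-WinEvent -FilterHashtable @{ LogName = 'Microsoft-Windows-Sysmon/Operational'; Id = 1 } -ErrorAction SilentlyContinue
$securityEvents = Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4688 } -ErrorAction SilentlyContinue
$allEvents = @()
if ($sysmonEvents)   { $allEvents += $sysmonEvents }
if ($securityEvents) { $allEvents += $securityEvents }
foreach ($e in $allEvents) {
    $xml = [xml]$e.ToXml()
    $cmdLine = ($xml.Event.EventData.Data | Where-Object { $_.Name -eq 'CommandLine' }).'#text'
    if ($cmdLine) {
        # Each command line becomes one "indicator of compromise?" question for the model
        $prompt = "Is the following process creation command line an indicator of compromise? `"$cmdLine`""
    }
}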
When checking PowerShell script blocks, we modified the question to check not only for indicators but also for obfuscation techniques:
Is the following PowerShell script obfuscated, or does it contain indicators of compromise? "$ScriptBlockText"
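A sketch of how the script blocks for this question could be collected: PowerShell script block logging writes Event ID 4104 to the Microsoft-Windows-PowerShell/Operational log. The prompt wording follows the question above; everything else is illustrative rather than the module's actual code.

# Collect logged PowerShell script blocks (Event ID 4104)
$blocks = Get-WinEvent -FilterHashtable @{ LogName = 'Microsoft-Windows-PowerShell/Operational'; Id = 4104 } -ErrorAction SilentlyContinue
foreach ($b in $blocks) {
    $xml = [xml]$b.ToXml()
    $ScriptBlockText = ($xml.Event.EventData.Data | Where-Object { $_.Name -eq 'ScriptBlockText' }).'#text'
    # Ask about both obfuscation and indicators of compromise, as in the question above
    $prompt = "Is the following PowerShell script obfuscated, or does it contain indicators of compromise? `"$ScriptBlockText`""
}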
ChatGPT was not only able to detect obfuscation, it also enumerated several of the techniques used: XOR encryption, Base64 encoding, and variable substitution.
Of course, this tool is not perfect and can produce both false positives and false negatives.
In the following example, ChatGPT failed to detect malicious activity that dumped system credentials through the SAM registry hive, while in another example it described the lsass.exe process as potentially indicating "malicious activity or a security risk, such as malware running on the system":
One interesting result of this experiment is the data reduction it achieves: after the adversary simulation on the test system, the number of events an analyst needs to verify is significantly reduced:
Note that testing was performed on a new, non-production system. A production system may generate more false positives.
Experiment Conclusion
In the experiments above, the security analysts started by asking ChatGPT about several hacking tools, such as Mimikatz and Fast Reverse Proxy. The AI model described these tools successfully, but when asked to identify well-known hashes and domain names, it failed; for example, the LLM could not identify a known hash of the WannaCry malware. However, its relative success in identifying malicious code on the host led the analysts to ask ChatGPT to create a PowerShell script that collects metadata and indicators of compromise from a system and submits them to the LLM.
Overall, the analysts used ChatGPT to analyze the metadata of more than 3,500 events on the test system and found 74 potential indicators of compromise, 17 of which were false positives. The experiment suggests that ChatGPT can be useful for gathering forensic information at companies that are not running an endpoint detection and response (EDR) system, for detecting code obfuscation, or for reverse engineering code binaries.
While this exact IoC-scanning implementation may not be a very cost-effective solution at roughly $15-25 per host, it shows interesting interim results and points to opportunities for future research and testing. During our research we noticed several areas where ChatGPT can serve as a productivity tool for security analysts:
Inspecting systems for indicators of compromise, especially if you do not yet have an EDR full of detection rules and need to perform digital forensics and incident response (DFIR);
Comparing a current signature-based rule set with ChatGPT output to identify gaps: there are always techniques or procedures that you, as an analyst, are unaware of or have forgotten to write signatures for;
Detecting code obfuscation;
Detecting similarity: feeding malware binaries to ChatGPT and asking whether a new binary resembles others.
Asking the right question is half the battle, and experimenting with the prompts and model parameters may yield more valuable results, even for hashes and domain names. Also beware of the false positives and false negatives this can produce: at the end of the day, this is just another statistical neural network, prone to unexpected results.
Fair use and privacy rules need clarification
Similar experiments also raise key questions about the data submitted to OpenAI's ChatGPT system. Companies have begun to push back against the use of internet-scraped information to build datasets, with companies such as Clearview AI and Stability AI facing lawsuits that seek to curtail the use of their machine learning models.
Privacy is another issue. “Security professionals must determine whether submitting indicators of compromise exposes sensitive data, or whether submitting software code for analysis infringes on a company's intellectual property,” NCC Group's Anley said. “Whether submitting code to ChatGPT is a good idea depends to a large extent on the circumstances,” he added. “A lot of code is proprietary and protected by various laws, so I don't recommend that people submit code to third parties unless they have permission.”
Other security experts have issued similar warnings: using ChatGPT to detect intrusions sends sensitive data to the system, which may violate company policy and pose business risks. By using these scripts, you send data (including potentially sensitive data) to OpenAI, so be careful and check with the system owner beforehand.
This article is translated from: https://securelist.com/ioc-detection-experiments-with-chatgpt/108756/