Repository: https://github.com/ray-d-song/guesslang-js
Demo: https://ray-d-song.github.io/guesslang-js/
Recently I've been working on a project called EchoRSS. One feature I really want is to intercept external links in subscriptions (read full text, quotes, etc.) and display their content directly on the current page.
The problem is that code blocks in the returned HTML often lose their language annotation (or the pre and code tags in the original code block never carried one), so they cannot be highlighted with tools like shiki or prism.js.
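Before guessing anything, it is worth checking whether an annotation survived at all. The sketch below assumes the common convention of a `language-xxx` or `lang-xxx` class on the code or pre tag; the function names are hypothetical helpers for illustration, not part of any library.

```typescript
// Hypothetical helper: pull a language hint out of a class attribute such as
// "hljs language-ts" or "lang-python". Returns null when no hint is present.
function languageFromClass(classAttr: string | null | undefined): string | null {
  if (!classAttr) return null;
  for (const token of classAttr.split(/\s+/)) {
    // Assumed convention: a "language-" or "lang-" prefix followed by the name.
    const match = /^(?:language|lang)-([\w#+.-]+)$/i.exec(token);
    if (match) return match[1].toLowerCase();
  }
  return null;
}

// Prefer the class on the <code> tag, then fall back to the enclosing <pre> tag.
function annotatedLanguage(
  preClass: string | null,
  codeClass: string | null
): string | null {
  return languageFromClass(codeClass) || languageFromClass(preClass);
}
```

Only when `annotatedLanguage` returns null does the block need automatic detection.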
I found three ways to detect the language of a code snippet:
1. linguist
linguist is a Ruby project that runs on the server; GitHub uses it to detect the language composition of repositories. If you need very high accuracy and can run the detection on the server, this is the best solution.
2. hljs
highlight.js is a very well-known code highlighting library for the web and, as far as I know, the only highlighting library that provides automatic language detection.
The principle is simple: it enumerates each language's keywords, matches them against the text one by one, and picks the language with the highest match score.
hljs has four problems:
- It needs fairly long input: most languages require at least 300 characters to reach reasonably good accuracy.
- The detection logic is not a separate module. It is tightly coupled with the parser and renderer, and the code is very imperative, which makes it hard to extract the useful parts.
- If you don't extract the detection module, the original formatting of the code (line breaks and indentation) is lost when hljs highlights it.
- It relies on a lot of regular-expression matching, so performance is poor, and because of the second problem it cannot run in a web worker.
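The keyword-matching idea described above can be sketched in a few lines. This is a toy illustration of the principle, not hljs's actual implementation: the keyword tables, the tokenizer, and the function names are all made up for the example.

```typescript
// Illustrative keyword tables; real grammars are far larger.
const KEYWORDS: Record<string, string[]> = {
  javascript: ["function", "const", "let", "var", "return", "=>", "console"],
  python: ["def", "import", "self", "elif", "None", "print", "lambda"],
  go: ["func", "package", "import", "defer", "chan", ":=", "fmt"],
};

// Split code into identifier-like words plus two operator tokens.
function tokenize(code: string): string[] {
  return code.match(/[A-Za-z_]\w*|=>|:=/g) || [];
}

// Count keyword hits per language and sort by score, highest first.
function guessByKeywords(code: string): { language: string; score: number }[] {
  const tokens = tokenize(code);
  return Object.keys(KEYWORDS)
    .map((language) => {
      const words = KEYWORDS[language];
      const score = tokens.filter((t) => words.indexOf(t) !== -1).length;
      return { language, score };
    })
    .sort((a, b) => b.score - a.score);
}
```

With a snippet like `x = 1`, every language scores 0, which shows why this kind of matching needs a reasonable amount of input before the scores separate.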
3. guesslang
guesslang is a machine learning project built on TensorFlow (the original is written in Python).
In 2021 Microsoft ported it to Node.js using tensorflow.js and used it to add automatic language detection to VS Code.
Three years ago a Vietnamese developer, hieplpvip, also ported it to the browser as guesslang-js, but that port has three problems:
- Memory leaks, and more memory leaks...
- It can only be loaded as a UMD build through a script tag; it does not support ESM and cannot be bundled.
- Likewise, because of the second problem, it does not work in a web worker.
The author no longer maintains the project, and a feature request for ESM support opened in March has gone unanswered.
So I extracted the detection module from hljs and also forked guesslang-js to fix the problems above. In the end guesslang won, and this is the result:
https://github.com/ray-d-song/guesslang-js
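A model-based detector typically returns a list of candidate languages with confidence scores rather than a single answer, so the caller still has to decide what to do with them. Here is a minimal sketch of that last step. The `Guess` shape, the `pickLanguage` name, the 0.2 threshold, and the "plaintext" fallback are assumptions for illustration; check the repository's README for the actual API.

```typescript
// Assumed result shape: one entry per candidate language.
interface Guess {
  languageId: string;
  confidence: number;
}

// Return the most confident language, or a fallback when the list is empty
// or the best candidate is below the (arbitrary, illustrative) threshold.
function pickLanguage(
  guesses: Guess[],
  minConfidence: number = 0.2,
  fallback: string = "plaintext"
): string {
  // Copy before sorting so the caller's array is left untouched.
  const best = guesses.slice().sort((a, b) => b.confidence - a.confidence)[0];
  return best !== undefined && best.confidence >= minConfidence
    ? best.languageId
    : fallback;
}
```

Falling back to plain text for low-confidence results avoids highlighting a short snippet with the wrong grammar.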
That was a lot of talk, but someone may need this in the future, so I'm posting it.
If anyone knows tensorflow.js, I'd appreciate recommendations for learning materials. I want to move the computation to WebGPU to improve efficiency.