This article is a translation of InfoWorld's 2021 "Best Open Source Software" list. InfoWorld is an information technology media company founded in 1978 and currently part of IDG. Every year, InfoWorld selects the year's "Best Open Source Software" (BOSSIE) winners based on each project's contribution to the open source community and its influence in the industry, and the awards have been running for more than ten years. The 29 winning projects span software development, cloud-native computing, machine learning, and other areas. Let's take a look and see if there are any familiar faces!
1. Svelte and SvelteKit
Among the many innovative open source front-end JavaScript frameworks, Svelte and its full-stack counterpart SvelteKit may be the most ambitious and visionary. Svelte disrupted the status quo by embracing a compile-time strategy from the start, and it has kept moving forward with excellent performance, steady development, and a great developer experience. Now in public beta, SvelteKit continues the Svelte tradition of leaping ahead by adopting the latest tooling and making deployment to serverless environments a built-in feature. Address: https://github.com/sveltejs/svelte
2. Minikube
Minikube is a tool that makes it easy to run Kubernetes locally by creating a single-node Kubernetes cluster inside a virtual machine on your laptop. It is an easy way to try out Kubernetes or to use Kubernetes for day-to-day development. Address: https://github.com/kubernetes/minikube
3. Pixie
Pixie is an observability tool for Kubernetes applications. It shows a high-level view of cluster state, such as service maps, cluster resources, and application traffic, and it can also drill down into more detailed views such as pod state, flame graphs, and individual full-body application requests. Pixie uses eBPF to collect telemetry data automatically, and it collects, stores, and queries all of that data locally within the cluster, using less than 5% of the cluster's CPU. Use cases for Pixie include in-cluster network monitoring, infrastructure health, service performance, and database query profiling. Address: https://github.com/pixie-io/pixie
4. FastAPI
FastAPI is a high-performance Python web framework for building APIs. Key features, as described by the project:
- Fast: Very high performance, on par with NodeJS and Go.
- Fast to code: Increases feature development speed by roughly 200% to 300%.
- Fewer bugs: Reduces human (developer) errors by roughly 40%.
- Intuitive: Great editor support, with auto-completion everywhere and less time spent debugging.
- Easy: Designed to be easy to use and learn, so you spend less time reading documentation.
- Short: Minimizes code duplication.
- Robust: Produces production-ready code with automatic interactive documentation.
- Standards-based: Based on, and fully compatible with, the open standards for APIs: OpenAPI and JSON Schema.
Address: https://github.com/tiangolo/fastapi
5. Crystal
Crystal has been in development for several years as a project to create a programming language with the speed of C and the expressiveness of Ruby. With the release of Crystal 1.0 earlier this year, the language is now stable enough for general workloads. Crystal uses static typing and an LLVM-based compiler to achieve high speed, and its type checker catches common problems such as null references at compile time instead of at runtime. Crystal can interface with existing C code for additional speed and convenience, and it can use compile-time macros to extend the syntax of the base language.
Address: https://github.com/crystal-lang/crystal
6. Windows Terminal
Windows Terminal is a new, popular, and powerful command-line terminal application. It includes many features the community has long requested, such as multiple tabs, rich text, multi-language support, configurability, themes and styles, emoji support, and GPU-accelerated text rendering. At the same time, it still meets its design goals of staying fast and efficient without consuming large amounts of memory or power. Address: https://github.com/Microsoft/Terminal
7. OBS Studio
OBS Studio is software for live streaming and screen recording, designed for efficient capture, compositing, encoding, recording, and streaming of video content, and it works with all major streaming platforms.
- High-performance real-time video/audio capture and mixing. Create scenes made up of multiple sources, including window captures, images, text, browser windows, webcams, capture cards, and more.
- Set up an unlimited number of scenes and switch between them seamlessly with custom transitions.
- Intuitive audio mixer with per-source filters such as noise gate, noise suppression, and gain. Take full control with VST plugin support.
- Powerful and easy-to-use configuration options. Add new sources, copy existing sources, and adjust their properties easily.
- A streamlined settings panel gives users access to a variety of configuration options to adjust every aspect of broadcasting or recording.
- The modular “Dock” UI allows users to rearrange the layout exactly as needed. Users can even pop each individual Dock into its own window.
Address: https://github.com/obsproject/obs-studio
8. Shotcut
Shotcut is a cross-platform video editor. It lets you make all the standard corrections to audio and video tracks while applying effects and layering. Shotcut has a very active community and offers plenty of how-to videos and tutorials for novice and advanced videographers alike. It runs on Mac, Linux, BSD, and Windows, and despite being cross-platform, its interface is snappy and relatively simple to use compared with similar tools. Address: https://github.com/mltframework/shotcut
9. Weave GitOps Core
Weave GitOps supports an effective GitOps workflow for continuous delivery of applications into Kubernetes clusters. It is built on Flux, the leading GitOps engine and a CNCF project. Address: https://github.com/weaveworks/weave-gitops
10. Apache Solr
Apache Solr is a full-text search server built on Lucene and one of the most popular enterprise search engines. Apache Lucene is the underlying search technology behind the search features of most software you use, including other search engines such as Elasticsearch. Unlike Elasticsearch, which has given up its open source license (although it remains free to use), Solr is still open source. Solr is clusterable, cloud-deployable, and powerful enough to build cloud-scale search services. It even includes learning-to-rank (LTR) support to help automatically tune and weight results. Address: https://github.com/apache/solr
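At its core, Lucene (and therefore Solr) answers full-text queries from an inverted index, which maps each term to the documents that contain it. The sketch below shows only that idea in plain Python. It is not the Solr or Lucene API. The tokenizer is a deliberately naive lowercase split with no stemming or stop words, the query logic is a simple AND, and the documents are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Naive analyzer: lowercase and keep runs of letters/digits.
    # Real Lucene analyzers also handle stemming, stop words, and more.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Inverted index: term -> set of IDs of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    # AND query: intersect the posting sets of every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Solr is built on Lucene",
    2: "Elasticsearch is also built on Lucene",
    3: "Trino is a distributed SQL engine",
}
index = build_index(docs)
print(sorted(search_all(index, "built on lucene")))  # [1, 2]
```

A real engine also scores and ranks the matching documents (this is where features such as LTR come in). The sketch stops at finding the matches.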
11. MLflow
MLflow, created by Databricks and hosted by the Linux Foundation, is an MLOps platform that lets you track, manage, and maintain machine learning models, experiments, and their deployments. It provides tools to record and query experiments (code, data, configurations, results), package data science code into projects, and chain those projects into workflows. Address: https://github.com/mlflow/mlflow
12. Orange
Orange aims to make data mining "productive and fun." Orange lets users build data analysis workflows that perform various machine learning and analysis functions as well as visualizations. Compared with programmatic, text-based tools such as RStudio and Jupyter, Orange is very intuitive: you drag widgets onto a canvas to load files, analyze the data with models, and visualize the results. Address: https://github.com/biolab/orange3
13. Flutter
Flutter is built by Google's engineering team for creating high-performance, cross-platform mobile applications. Flutter is optimized for today's and tomorrow's mobile devices, focusing on low-latency input and high frame rates on Android and iOS. It gives developers a simple, efficient way to build and deploy cross-platform, high-performance mobile apps, and it gives users a beautiful, fast, jank-free app experience. Address: https://github.com/flutter
14. Apache Superset
Apache Superset is an open source data exploration and visualization platform originally open-sourced by Airbnb, the well-known short-term home rental company. It was formerly known as Panoramix and later Caravel. The tool stands out for its visualizations, ease of use, and interactivity, letting users easily perform visual analysis of their data. Apache Superset is also an enterprise-grade business intelligence web application. Address: https://github.com/apache/superset
15. Presto
Presto is an open source distributed SQL engine for online analytical processing that runs on a cluster. Presto can query a wide variety of data sources, from files to databases, and return results to many business intelligence and analytics environments. Moreover, Presto queries data where it lives, including Hive, Cassandra, relational databases, and proprietary data stores, and a single Presto query can combine data from multiple sources. Facebook uses Presto to run interactive queries against several internal data stores, including its 300 PB data warehouse. Address: https://github.com/prestodb/presto
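The idea of one query combining several sources can be approximated on a single machine with SQLite's `ATTACH DATABASE`, which lets one SQL statement read tables from separately attached databases. This is only an analogy. It is not Presto, no cluster or connector is involved, and the `main.orders` and `crm.users` tables are hypothetical. In Presto, the equivalent query would name each table as `catalog.schema.table`.

```python
import sqlite3

# Two separate in-memory databases stand in for two independent data sources
# (in a real Presto deployment these might be different catalogs).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?)",
                 [(1, 20.0), (1, 5.0), (2, 12.5)])

conn.execute("CREATE TABLE crm.users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO crm.users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

# One statement spans both sources by qualifying each table with its database name.
rows = conn.execute("""
    SELECT u.name, SUM(o.amount) AS total
    FROM main.orders AS o
    JOIN crm.users AS u ON u.id = o.user_id
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()
print(rows)  # [('alice', 25.0), ('bob', 12.5)]
```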
16. Apache Arrow
Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead. Arrow libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.
Address: https://github.com/apache/arrow
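Python's standard `array` and `memoryview` types can illustrate the two ideas in this paragraph: a columnar layout and zero-copy reads. This is a conceptual sketch, not the Arrow format or the `pyarrow` API. Real Arrow columns also carry validity bitmaps and offset buffers, and they follow a specification shared across languages. The sample records are made up.

```python
from array import array

# Row-oriented layout: one tuple per record (id, score).
rows = [(1, 9.5), (2, 7.25), (3, 8.0), (4, 6.5)]

# Column-oriented layout: each field stored contiguously in its own typed buffer.
ids = array("q", (r[0] for r in rows))
scores = array("d", (r[1] for r in rows))

# An analytic scan touches only the column it needs.
mean_score = sum(scores) / len(scores)

# Zero-copy read: a memoryview slice references the same buffer instead of
# copying values out of it.
view = memoryview(scores)[1:3]

# Changing the underlying buffer is visible through the view, which shows
# that no copy was made.
scores[1] = 10.0
print(mean_score, view.tolist())
```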
17. InterpretML
InterpretML is an open source explainable AI (XAI) package that brings together several state-of-the-art machine learning interpretability techniques. InterpretML lets you train interpretable glassbox models and explain black-box systems, helping you understand a model's global behavior or the reasoning behind individual predictions. Among its many features, InterpretML includes a glassbox model from Microsoft Research called the Explainable Boosting Machine, and it supports Lime for post-hoc explanations that approximate the behavior of black-box models. Address: https://github.com/interpretml/interpret
18. Lime
Lime (short for local interpretable model-agnostic explanations) is a post-hoc technique that explains the predictions of any machine learning classifier by perturbing the input's features and examining how the predictions change. Lime can explain any black-box classifier with two or more classes, and it works in both the text and image domains. Lime is also included in InterpretML. Address: https://github.com/marcotcr/lime
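A few lines of pure Python can sketch the perturb-and-fit idea behind Lime. This is a simplified illustration, not the `lime` package's API, and the helpers `black_box`, `solve`, and `explain` are hypothetical. The sketch makes four simplifications:

- The "model" is a made-up scoring function over three binary features.
- Every on/off mask is enumerated instead of randomly sampled, which is feasible only because there are three features.
- Each perturbed sample is weighted by an exponential kernel on its distance from the original instance.
- A weighted least-squares linear model serves as the local surrogate, whose coefficients are the explanation.

```python
import itertools
import math


def black_box(mask):
    # Hypothetical model: the score depends on which of three features are present.
    x0, x1, x2 = mask
    return 0.5 + 2.0 * x0 - 1.0 * x1 + 0.0 * x2


def solve(a, b):
    # Solve a @ w = b by Gauss-Jordan elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain(model, n_features, kernel_width=0.75):
    # The instance being explained has every feature switched on.
    original = (1,) * n_features
    # Enumerate every perturbation (Lime samples randomly instead).
    masks = list(itertools.product((0, 1), repeat=n_features))
    xs = [[1.0] + list(mask) for mask in masks]  # leading 1.0 = intercept
    ys = [model(mask) for mask in masks]
    # Proximity weight: exponential kernel on Euclidean distance to the original.
    ws = []
    for mask in masks:
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(mask, original)))
        ws.append(math.exp(-dist ** 2 / kernel_width ** 2))
    # Weighted least squares via the normal equations: (X^T W X) beta = X^T W y.
    k = n_features + 1
    xtwx = [[sum(w * x[i] * x[j] for x, w in zip(xs, ws)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * x[i] * y for x, y, w in zip(xs, ys, ws)) for i in range(k)]
    beta = solve(xtwx, xtwy)
    return beta[1:]  # per-feature local importance (intercept dropped)


print([round(c, 6) for c in explain(black_box, 3)])
```

Because this hypothetical model is itself linear, the surrogate recovers its weights (about 2, -1, and 0). With a real nonlinear classifier, the coefficients are only a local approximation around the explained instance.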
19. Dask
Dask is an open source parallel computing library that can scale Python packages to multiple machines. Dask can distribute data and computation across multiple GPUs, either within a single system or across a multi-node cluster. Dask integrates with Rapids cuDF, XGBoost, and Rapids cuML for GPU-accelerated data analysis and machine learning. It also integrates with NumPy, pandas, and Scikit-learn to parallelize their workflows.
Address: https://github.com/dask/dask
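The pattern Dask applies at scale has three steps: split the data into partitions, run the same task on each partition in parallel, and combine the partial results. The sketch below mimics that pattern on one machine with the standard library's `concurrent.futures`. It is not Dask's API. With Dask itself you would typically build a collection such as a `dask.array` or `dask.dataframe` and call `.compute()`. The chunk size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, chunk_size):
    # Split the input into fixed-size chunks, similar in spirit to Dask partitions.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def partial_stats(chunk):
    # Per-partition task: return (sum, count) so the results can be combined later.
    return sum(chunk), len(chunk)


def parallel_mean(data, chunk_size=1000, workers=4):
    chunks = partition(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_stats, chunks))
    # Reduce step: combine the partial sums and counts into one global mean.
    total = sum(s for s, _ in results)
    count = sum(n for _, n in results)
    return total / count


data = list(range(10_001))
print(parallel_mean(data))  # 5000.0
```

Returning `(sum, count)` rather than a per-chunk mean keeps the reduce step exact even when the last chunk is shorter. Threads keep the sketch portable. CPU-bound pure-Python work would need processes or a real distributed scheduler for an actual speedup.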
20. BlazingSQL
BlazingSQL is a GPU-accelerated SQL engine built on the RAPIDS ecosystem. RAPIDS is based on the Apache Arrow columnar memory format, and cuDF is its GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. BlazingSQL is a SQL interface to cuDF, with various features to support large-scale data science workflows and enterprise datasets. Address: https://github.com/BlazingDB/blazingsql
21. Rapids
Nvidia's Rapids suite of open source software libraries and APIs gives you the ability to run end-to-end data science and analytics pipelines entirely on GPUs. Rapids uses Nvidia CUDA primitives for low-level compute optimization and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. Rapids relies on the Apache Arrow columnar memory format and includes cuDF, a pandas-like DataFrame library; cuML, a collection of machine learning libraries that provide GPU versions of most of the algorithms in Scikit-learn; and cuGraph, a NetworkX-like accelerated graph analytics library.
Address: https://github.com/rapidsai/cudf
22. PostHog
PostHog is an open source product analytics platform built for developers. It automatically collects every event on your website or app without sending data to third parties. It provides user-level, event-based analytics, capturing product usage data so you can see which users performed which actions in your application. Because it automatically captures clicks and pageviews, you can analyze what your users are doing without having to push events manually. Address: https://github.com/PostHog/posthog
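User-level, event-based analytics boils down to grouping captured events by user and by event name. The toy sketch below shows those two basic aggregations over an in-memory list. It is not PostHog's API or data model: the event records, the field names `user` and `event`, and the helper functions are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical captured events; real analytics events also carry timestamps
# and many more properties.
events = [
    {"user": "alice", "event": "pageview"},
    {"user": "alice", "event": "click_signup"},
    {"user": "bob", "event": "pageview"},
    {"user": "bob", "event": "pageview"},
    {"user": "carol", "event": "click_signup"},
]


def actions_by_user(events):
    # Which actions did each user perform? (user -> set of event names)
    per_user = defaultdict(set)
    for e in events:
        per_user[e["user"]].add(e["event"])
    return dict(per_user)


def users_per_event(events):
    # How many distinct users triggered each event? Repeats count once per user.
    unique_pairs = {(e["user"], e["event"]) for e in events}
    return Counter(event for _, event in unique_pairs)


print(actions_by_user(events))
print(users_per_event(events))
```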
23. LakeFS
LakeFS takes a "manage your data lake as code" approach, adding a layer of Git-like version control on top of object storage. Applying Git semantics lets users create their own isolated, zero-copy branches of data on which to work, experiment, and run model analysis without the risk of corrupting shared objects. LakeFS brings useful commit messages, metadata fields, and rollback options to your data. It also provides validation hooks that run format and schema checks to maintain data integrity and quality before a branch is merged back into production. With LakeFS, familiar techniques for managing and securing code bases can be extended to modern object stores such as Amazon S3 and Azure Blob Storage. Address: https://github.com/treeverse/lakeFS
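The reason a branch can be "zero-copy" is that branches and commits only store pointers to objects, never the objects themselves. The toy model below sketches that idea with a content-addressed in-memory store. `ToyRepo` and its methods are hypothetical and are not lakeFS's API. There are no merges or hooks, and reads see only committed state.

```python
import hashlib


class ToyRepo:
    """Hypothetical in-memory model of Git-like branching over an object store.

    Objects are stored once, keyed by content hash; commits hold only
    path -> hash mappings, so creating a branch copies no data.
    """

    def __init__(self):
        self.objects = {}            # content hash -> bytes (the "object store")
        self.commits = {"c0": {}}    # commit id -> {path: content hash}
        self.branches = {"main": "c0"}
        self.staging = {"main": {}}  # uncommitted writes per branch
        self._next = 1

    def create_branch(self, name, source="main"):
        # Zero-copy: the new branch simply points at the source branch's commit.
        self.branches[name] = self.branches[source]
        self.staging[name] = {}

    def write(self, branch, path, data):
        # Store the bytes once under their hash and stage the pointer.
        digest = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(digest, data)
        self.staging[branch][path] = digest

    def commit(self, branch):
        # New snapshot = parent snapshot plus this branch's staged pointers.
        snapshot = dict(self.commits[self.branches[branch]])
        snapshot.update(self.staging[branch])
        commit_id = f"c{self._next}"
        self._next += 1
        self.commits[commit_id] = snapshot
        self.branches[branch] = commit_id
        self.staging[branch] = {}
        return commit_id

    def read(self, branch, path):
        # Reads resolve the branch's latest commit, then the object by hash.
        digest = self.commits[self.branches[branch]][path]
        return self.objects[digest]


repo = ToyRepo()
repo.write("main", "events.csv", b"id,value\n1,10\n")
repo.commit("main")
repo.create_branch("experiment")
repo.write("experiment", "events.csv", b"id,value\n1,999\n")
repo.commit("experiment")
print(repo.read("main", "events.csv"))        # main is untouched
print(repo.read("experiment", "events.csv"))  # the branch sees its own change
```

Rolling back a branch in this model is just pointing it at an earlier commit id, because old snapshots and objects are never overwritten.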
24. Meltano
Meltano, spun out of GitLab this year, is a free and open source DataOps toolchain positioned as a replacement for traditional ELT (extract, load, transform) tooling. Meltano's data warehousing framework makes it easy to model, extract, and transform data for your projects, and it complements its integration and transformation pipelines with built-in analytics tools and dashboards that simplify reporting. With a reliable library of extractors and loaders, plus support for Singer-standard taps (data extractors) and targets (data loaders), Meltano is already a data orchestration powerhouse.
25. Trino
Trino (formerly known as PrestoSQL) is a distributed SQL analytics engine that can run extremely fast queries against large, distributed data sources. Trino lets you query a data lake, a relational store, or multiple disparate sources at once, without copying or moving the data for processing. Trino also works well with the business intelligence and analytics tools your data scientists already use, whether for interactive or ad hoc queries, which minimizes the learning curve. As data engineers work to support complex analytics across a growing number of data sources, Trino offers a way to optimize query execution and speed up results from disparate sources. Address: https://github.com/trinodb/trino
26. StreamNative
StreamNative is a highly scalable messaging and event streaming platform that greatly simplifies building the data pipelines that feed real-time reporting and analytics tools and enterprise applications. StreamNative combines Apache Pulsar's powerful distributed stream processing architecture with enterprise extras such as Kubernetes and hybrid cloud support, a large library of data connectors, easy authentication and authorization, and specialized health and performance monitoring tools. Together, these simplify the development of Pulsar-based real-time applications and the deployment and management of large-scale messaging backplanes. Address: https://github.com/streamnative
27. Hugging Face
Hugging Face is not a deep learning framework itself, but it provides one of the most important open source deep learning resource libraries. Hugging Face aims to extend beyond text to support images, audio, video, object detection, and more. InfoWorld notes that deep learning practitioners should keep a close eye on this repo in the coming years. Address: https://github.com/huggingface/transformers
28. EleutherAI
EleutherAI is a distributed group of machine learning researchers that formed to bring GPT-3-class models to everyone. At the beginning of 2021, EleutherAI released The Pile, an 825 GB diverse text dataset for training, and in June it announced GPT-J, a 6-billion-parameter model roughly equivalent to the Curie variant of OpenAI's GPT-3. With GPT-NeoX, EleutherAI plans to scale all the way up to 175 billion parameters to compete with the largest GPT-3 model available today. Address: https://github.com/EleutherAI/gpt-neo
29. Colab notebooks for generative art
It all starts with OpenAI's CLIP (Contrastive Language-Image Pre-training) model, a multimodal model that produces vector embeddings for text and images. While CLIP is fully open source, OpenAI's generative neural network DALL-E is not. To fill that gap, Ryan Murdock and Katherine Crowson developed Colab notebooks that combine CLIP with other open source models such as BigGAN and VQGAN to produce prompt-based generative art. Released under the MIT license, these notebooks have spread widely across the Internet over the past year, where they have been remixed, modified, translated, and used to generate stunning works of art.
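In CLIP-guided notebooks, the generator's output is repeatedly adjusted to make its CLIP image embedding more similar to the prompt's CLIP text embedding. The scoring step at the heart of that loop is the similarity between two embedding vectors, sketched below with cosine similarity. The 4-dimensional vectors and file names are hypothetical. Real CLIP embeddings come from its text and image encoders and have hundreds of dimensions, and no model is run here.

```python
import math


def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical embeddings standing in for CLIP encoder outputs.
text_embedding = [0.9, 0.1, 0.0, 0.4]
image_embeddings = {
    "sunset.png": [0.8, 0.2, 0.1, 0.5],
    "cat.png": [0.0, 0.9, 0.4, 0.1],
    "city.png": [0.3, 0.3, 0.9, 0.0],
}

# Rank candidate images by how well they match the text prompt.
ranked = sorted(image_embeddings,
                key=lambda name: cosine(text_embedding, image_embeddings[name]),
                reverse=True)
print(ranked)
```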