Canonical Launches Data Science Stack for ML Beginners

Jennifer Aniston

Mar 17, 2025, 10:22 AM

Data Science is the study of data. It involves collecting, analyzing, and interpreting large amounts of information. Data scientists use this information to make decisions, solve problems, and predict future trends.

Data scientists use various tools and techniques to analyze and interpret complex data sets. This helps businesses and organizations make better decisions.

If you're a beginner just starting with data science, you will probably face several challenges in setting up a proper data science environment.

Here are some reasons why setting up a data science environment can be challenging for beginners:

Software Installation: Newbies often struggle with installing the necessary software, such as programming languages (like Python or R), libraries, and tools (like Jupyter Notebooks or RStudio).
Understanding Dependencies: Software often requires specific versions of other software to work correctly. This can be confusing and lead to errors if not managed properly.
Learning Curve: Data science involves learning new skills, including programming, statistics, and machine learning. This can be overwhelming for beginners.
Data Handling: Working with data can be complex, especially when dealing with large or messy datasets. Understanding how to clean, store, and process data is crucial but can be difficult to grasp initially.
Version Control: Keeping track of changes in code and data is important but can be tricky to set up and manage, especially for those new to version control systems like Git.
Choosing the Right Tools: There are many tools and frameworks available, and choosing the right ones for a specific project can be daunting for beginners.

By understanding these challenges, beginners can better prepare themselves and seek the right resources and support to overcome them.

The initial hurdles can be challenging for new data scientists, but with persistence and consistent learning, the journey will become smoother.

Canonical's Data Science Stack (DSS) makes setting up a data science environment much easier. In this tutorial, we will discuss what the Data Science Stack is and how to use it to set up a data science environment quickly and easily on Ubuntu.

What is Data Science Stack (DSS)?

The Data Science Stack (DSS) by Canonical is an out-of-the-box solution for data scientists and machine learning engineers.

The Data Science Stack simplifies the setup process by providing a pre-configured environment that includes all the necessary tools and libraries for machine learning and data analysis.

DSS is designed to run on Ubuntu workstations and to make good use of their GPUs. This can improve the performance of machine learning workloads, which is particularly beneficial for computationally intensive tasks.

DSS allows users to focus more on the development and optimization of their models rather than the technicalities of the environment setup.

This can save a significant amount of time that would otherwise be spent on installing and configuring individual components.

What's Included in the Data Science Stack?

The Data Science Stack (DSS) provides a comprehensive and integrated environment for data scientists and machine learning engineers. Here's what it offers:

Pre-installed Tools: DSS includes popular open-source tools like MicroK8s, JupyterLab and MLflow, which are essential for data exploration, model development, and experiment tracking.
Machine Learning Frameworks: By default, it comes with two widely used machine learning frameworks, PyTorch and TensorFlow, which are ready to use for building and training models.
Command Line Interface (CLI): DSS provides an intuitive CLI for deploying these tools and frameworks, making it easier to manage and scale the environment.
User Interfaces: After deployment, users can access the UIs of the tools to start working on their data science projects without the hassle of manual setup.
Packaging Dependencies: DSS handles the packaging dependencies, ensuring that all tools, libraries, and frameworks are compatible with each other and work smoothly together.
Hardware Compatibility: It is designed to be compatible with the machine's hardware, optimizing the performance of the tools and frameworks.
Simplified Configuration: Traditionally, setting up machine learning environments on workstations can be complex and difficult to reverse. DSS addresses this by providing accessible, production-ready, isolated, and reproducible ML environments that efficiently utilize a workstation's GPUs.
GPU Configuration: DSS simplifies GPU configuration by including the GPU operator, which manages the setup and usage of GPUs for machine learning tasks, leveraging their computational power effectively.

Overall, DSS aims to provide a hassle-free and optimized environment for data science and machine learning, allowing users to focus on their core tasks rather than the technical setup and maintenance of their tools.

Install Data Science Stack (DSS) in Ubuntu

To begin using the Data Science Stack (DSS) for machine learning and data science, follow these steps to set up your environment:

Prerequisites

Operating System: Ensure you have Ubuntu 22.04 LTS or Ubuntu 24.04 LTS installed on your system.
Internet Connection: You'll need an active internet connection to download and install the necessary software.
Snap: Make sure Snap is installed on your system, as it is required for installing MicroK8s and DSS.
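
The prerequisites above can be checked with a short script before you install anything. This is a minimal sketch, not part of DSS: the function names (is_supported_release, read_version_id, preflight_report) are made up for illustration, and it assumes your system describes itself in /etc/os-release, as Ubuntu does.

```shell
#!/bin/sh
# Succeed only for the Ubuntu releases listed in the prerequisites.
is_supported_release() {
    case "$1" in
        22.04|24.04) return 0 ;;
        *) return 1 ;;
    esac
}

# Print VERSION_ID from an os-release style file (default: /etc/os-release).
read_version_id() {
    os_release_file="${1:-/etc/os-release}"
    [ -r "$os_release_file" ] || return 1
    # Turns a line such as VERSION_ID="22.04" into 22.04
    sed -n 's/^VERSION_ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$os_release_file"
}

# Print a short report. It only warns, so it is safe to run on any machine.
preflight_report() {
    version="$(read_version_id || true)"
    if is_supported_release "$version"; then
        echo "OK: release $version is supported"
    else
        echo "WARN: release '${version:-unknown}' is not 22.04 or 24.04"
    fi
    if command -v snap >/dev/null 2>&1; then
        echo "OK: snap is installed"
    else
        echo "WARN: snap not found (on Ubuntu: sudo apt install snapd)"
    fi
}

preflight_report
```

The script checks only the release number and the presence of the snap command. It does not test the internet connection.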

Setting Up MicroK8s

DSS uses MicroK8s as its container orchestration system, which allows workloads to access the host's GPUs.

To install MicroK8s on Ubuntu, run:

$ sudo snap install microk8s --channel 1.28/stable --classic

Next, enable the required services:

$ sudo microk8s enable storage dns rbac

Installing the DSS CLI

The Data Science Stack is managed through a Command Line Interface (CLI).

Install DSS CLI with the following command:

$ sudo snap install data-science-stack --channel latest/stable

With these steps completed, you'll have the foundational components of DSS installed and ready to use. You can now proceed to set up your machine learning environments and start running your first notebooks using the DSS CLI.
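
To confirm that both snaps are in place before continuing, you could run something like the following. It is a rough sketch: missing_snaps is a hypothetical helper, and it assumes snap list prints one header row followed by one row per snap, with the snap name in the first column.

```shell
#!/bin/sh
# Read `snap list` output on stdin and print each required snap that is absent.
missing_snaps() {
    awk -v required="$*" '
        NR > 1 { installed[$1] = 1 }    # skip the header row
        END {
            n = split(required, names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in installed)) print names[i]
        }'
}

if command -v snap >/dev/null 2>&1; then
    missing="$(snap list 2>/dev/null | missing_snaps microk8s data-science-stack)"
    if [ -z "$missing" ]; then
        echo "Both snaps are installed"
        # Blocks until MicroK8s reports that it is up and running.
        sudo microk8s status --wait-ready
    else
        echo "Missing snaps:" $missing
    fi
fi
```

Waiting for MicroK8s to report ready before initializing DSS avoids starting the next step against a cluster that is still coming up.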

Getting Started with Data Science Stack

After installing MicroK8s and the DSS CLI, the next step is to initialize DSS on top of MicroK8s and prepare MLflow for use.

Initializing DSS and MLflow

To initialize DSS, you'll need to use the dss initialize command, which sets up the necessary resources within the MicroK8s cluster.

$ dss initialize --kubeconfig="$(sudo microk8s config)"

The --kubeconfig flag passes DSS the Kubernetes configuration generated by MicroK8s, which here is the output of sudo microk8s config.

The dss initialize command may take a few minutes to complete. During this time, the DSS CLI will display messages indicating the progress of the deployment. You will see messages similar to the following:

[INFO] Waiting for deployment mlflow in namespace dss to be ready...

This message indicates that DSS is waiting for the MLflow deployment to be ready. Be patient as the system sets up the environment and ensures all components are correctly configured.

Once the initialization is complete, you will see an output like below:

[INFO] Executing initialize command
[INFO] Storing provided kubeconfig to /home/ostechnix/snap/data-science-stack/16/.dss/config
[INFO] Waiting for deployment mlflow in namespace dss to be ready...
[INFO] Deployment mlflow in namespace dss is ready
[INFO] DSS initialized. To create your first notebook run the command:

dss create

Examples:
  dss create my-notebook --image=pytorch
  dss create my-notebook --image=kubeflownotebookswg/jupyter-scipy:v1.8.0

Now, you will be ready to start using the MLflow tracking server and other components provided by DSS.

You can then proceed to create and run your first machine learning notebook within the DSS environment.

Starting Your First Jupyter Notebook

To start your first Jupyter Notebook using the Data Science Stack (DSS), you'll need to use the dss create command, which allows you to specify the type of notebook you want to create.

Here, we are creating a TensorFlow notebook named my-tensorflow-notebook with CUDA support:

$ dss create my-tensorflow-notebook --image=kubeflownotebookswg/jupyter-tensorflow-cuda:v1.8.0

Upon successful creation of the Notebook, you will see an output like below:

[INFO] Executing create command
[INFO] Waiting for deployment my-tensorflow-notebook in namespace dss to be ready...
[INFO] Waiting for deployment my-tensorflow-notebook in namespace dss to be ready...
[INFO] Waiting for deployment my-tensorflow-notebook in namespace dss to be ready...
[INFO] Deployment my-tensorflow-notebook in namespace dss is ready
[INFO] Success: Notebook my-tensorflow-notebook created successfully.
[INFO] Access the notebook at http://10.152.183.253:80.

Once the notebook is ready, the command shows a URL that you can use to access the JupyterLab UI.

To start working with your notebook, open a web browser and enter the provided URL into the address bar.

In the output above, the newly created notebook is available at http://10.152.183.253:80. Use the URL from your own output instead.

This will take you to the JupyterLab interface where you can create new notebooks, upload data, and begin your machine learning tasks using TensorFlow and CUDA.

Remember that the IP address and port number in the URL may vary depending on your specific setup.
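
Because the address changes from one setup to another, it can be handy to pull the URL out of the log rather than copy it by hand. The sketch below assumes the log line has the same shape as in the sample output above ("Access the notebook at ..."). The extract_notebook_url function is a hypothetical helper, not a DSS command.

```shell
#!/bin/sh
# Read `dss create` log lines on stdin and print the notebook URL, if present.
# The trailing full stop that ends the log sentence is dropped.
extract_notebook_url() {
    sed -n 's/.*Access the notebook at \(http[^ ]*[^ .]\)\.\{0,1\}$/\1/p'
}

# Demonstration with the log line shown in this article.
sample='[INFO] Access the notebook at http://10.152.183.253:80.'
printf '%s\n' "$sample" | extract_notebook_url
# prints: http://10.152.183.253:80
```

With a real deployment you could pipe the command's output into the function, for example dss create my-tensorflow-notebook --image=... 2>&1 | extract_notebook_url. The 2>&1 is there in case the log lines are written to standard error.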

That's it. You can now start interacting with your notebook.

View DSS Status

To quickly check the status of your Data Science Stack (DSS) environment, including the status of MLflow and the availability of GPU acceleration, you can use the dss status command as shown below.

$ dss status

The dss status command provides a summary of the current state of your DSS environment. Here's an example of what the output might look like:

[INFO] MLflow deployment: Ready
[INFO] MLflow URL: http://10.152.183.157:5000
[INFO] GPU acceleration: Disabled

Explanation of Output:

MLflow deployment: Ready indicates that the MLflow tracking server is up and running.
MLflow URL provides the URL where you can access the MLflow UI to track your machine learning experiments.
GPU acceleration: Disabled shows that there is no GPU available or configured for use in the current DSS environment.
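
If you want to use these values in a script, a small filter can pick out a single field. This is only a sketch: it assumes the "[INFO] name: value" layout of the sample output above, and status_field is a hypothetical helper.

```shell
#!/bin/sh
# Read `dss status` output on stdin and print the value of the named field.
status_field() {
    sed -n "s/^\[INFO\] $1: \(.*\)/\1/p"
}

# Demonstration with the sample output shown in this article.
sample='[INFO] MLflow deployment: Ready
[INFO] MLflow URL: http://10.152.183.157:5000
[INFO] GPU acceleration: Disabled'

printf '%s\n' "$sample" | status_field 'MLflow URL'
# prints: http://10.152.183.157:5000
printf '%s\n' "$sample" | status_field 'GPU acceleration'
# prints: Disabled
```

Against a live environment the input would come from dss status 2>&1 instead of the sample text.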

To verify, open the MLflow URL http://10.152.183.157:5000 from your web browser.

This will open the MLflow dashboard in your web browser.

Experiments tab in the MLflow dashboard:

Because this is a fresh installation, there are no experiments yet. To create one, use the mlflow experiments CLI.
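
As a rough illustration, an experiment could be created as follows. The sketch assumes the mlflow command-line client is installed wherever you run it (for example with pip install mlflow) and that the MLflow URL reported by dss status is reachable from there. The create_experiment wrapper and the experiment name my-first-experiment are made up for this example.

```shell
#!/bin/sh
# Create an MLflow experiment on the given tracking server.
# $1: tracking URL, such as the MLflow URL printed by `dss status`
# $2: experiment name (any name you like)
create_experiment() {
    tracking_uri="$1"
    name="$2"
    if ! command -v mlflow >/dev/null 2>&1; then
        echo "mlflow CLI not found; install it with: pip install mlflow" >&2
        return 127
    fi
    # MLFLOW_TRACKING_URI tells the mlflow client which server to talk to.
    MLFLOW_TRACKING_URI="$tracking_uri" \
        mlflow experiments create --experiment-name "$name"
}

# Example call (replace the URL with the one from your own `dss status`):
# create_experiment http://10.152.183.157:5000 my-first-experiment
```

After the command succeeds, the new experiment should appear in the Experiments tab once you reload the dashboard.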

Models tab in MLflow Dashboard:

Listing DSS Commands

To view the list of available commands for the Data Science Stack (DSS), you can use the dss command with the --help option.

Run the following command in your terminal:

$ dss --help

This will display a list of commands along with a brief description of their purpose.

If you need more detailed information about a specific DSS command, you can use the command followed by the --help option.

For example, to get details about the logs command, you would run:

$ dss logs --help

Removing Data Science Stack from MicroK8s

If you don't need DSS anymore, you can use the dss purge command to remove the Data Science Stack from your MicroK8s cluster.

To remove DSS, execute the following command in your terminal:

$ dss purge

This command will completely remove all DSS components, including Jupyter Notebooks, the MLflow server, and any data stored within the DSS environment.

It's important to note that this action is irreversible, and all data within the DSS environment will be permanently lost. Make sure to back up any important data before proceeding with the purge.
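
Since the purge cannot be undone, you may prefer to put a confirmation step in front of it. The wrapper below is a minimal sketch: confirm_then_run is a hypothetical helper, not a DSS feature, and it only runs the given command when you type the word purge.

```shell
#!/bin/sh
# Ask for explicit confirmation on stdin before running an irreversible command.
# Only the exact word "purge" lets the command through.
confirm_then_run() {
    printf 'This permanently deletes all DSS notebooks and data. Type "purge" to continue: ' >&2
    read -r answer || answer=""
    if [ "$answer" = "purge" ]; then
        "$@"
    else
        echo "Aborted; nothing was removed." >&2
        return 1
    fi
}

# Usage:
# confirm_then_run dss purge
```

Any other answer, including an empty one, leaves the environment untouched.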

Remove DSS CLI and MicroK8s

While the dss purge command removes the DSS components from the MicroK8s cluster, it does not remove the DSS CLI or the MicroK8s cluster itself. If you wish to remove these as well, you will need to delete their respective snaps:

To remove the DSS CLI, use the following command:

$ sudo snap remove data-science-stack

To remove MicroK8s, use the following command:

$ sudo snap remove microk8s

By following these steps, you can completely remove the Data Science Stack (DSS) and its associated components from your system.

Frequently Asked Questions (FAQ)

Q: What is Data Science Stack (DSS)?

A: Data Science Stack (DSS) is a comprehensive, ready-to-run environment for machine learning and data science. It is designed to simplify the setup and management of data science tools and frameworks, allowing users to focus on their core tasks rather than the intricacies of environment configuration.

Q: What tools are included in DSS?

A: DSS includes a variety of open-source tools such as Jupyter Notebook, MLflow, and popular machine learning frameworks like TensorFlow and PyTorch. It also provides a container orchestration system, MicroK8s, for managing workloads.

Q: How do I install DSS?

A: To install DSS, you need to have Ubuntu 22.04 LTS or Ubuntu 24.04 LTS, an internet connection, and Snap installed. Then, you can install MicroK8s and the DSS CLI using Snap commands. For detailed instructions, refer to the official documentation or installation guide.

Q: How do I start a Jupyter Notebook with DSS?

A: You can start a Jupyter Notebook with DSS using the dss create command, specifying the desired image for your notebook. For example, to start a TensorFlow notebook, you would use dss create my-tensorflow-notebook --image=kubeflownotebookswg/jupyter-tensorflow-cuda:v1.8.0.

Q: What is the purpose of the dss status command?

A: The dss status command provides a quick overview of the current state of your DSS environment, including the status of MLflow and the availability of GPU acceleration. It helps you verify that all components are functioning correctly.

Q: How do I remove DSS from my environment?

A: To remove DSS, you can use the dss purge command, which will remove all DSS components, including Jupyter Notebooks and the MLflow server. Note that this action is irreversible and will result in the loss of all data within the DSS environment.

Q: Where can I find more information about DSS commands?

A: You can find detailed information about DSS commands by using the dss --help command to list all available commands and dss <command> --help to get detailed usage for a specific command.

Q: Is DSS free to use?

A: Yes, DSS is based on open-source tools and is free to use.

Q: Is DSS suitable for beginners in data science?

A: Yes, DSS is designed to be user-friendly and can be a great tool for beginners as it reduces the complexity of setting up a data science environment. It provides a ready-made and optimized environment that allows users to start working on data science projects quickly.

Conclusion

In summary, the Data Science Stack (DSS) simplifies the setup for data science tasks. It provides a collection of tools that work well together, making it easier to start projects quickly.

Whether you're new to data science or experienced, DSS helps you focus on your work by handling the technical setup. It's a reliable tool that supports efficient data analysis and model building.

Resource:

Data Science Stack (DSS) Documentation

Related Read:

How To Install Anaconda On Linux
How To Install Miniconda In Linux
