
How are Linux threads created?

王林, reposted 2023-05-22 18:38:12, 1977 views

The concept and implementation of threads

A thread is an execution sequence or execution path within a process. A process can contain multiple threads.

From the perspective of resource allocation, the process is the basic unit to which the operating system allocates resources.
From the perspective of scheduling, the thread is the smallest unit of scheduling and the smallest unit of program execution.

An execution sequence is an ordered set of instructions, that is, a function.

A thread is an execution sequence within a process. A process has at least one thread, called the main thread (the execution sequence represented by the main function). Other threads can be created through the thread library by specifying a function for each thread to execute; such a thread is called a function thread.


How threads are implemented

Kernel-level threads (threads are created and managed directly by the kernel; creation overhead is larger, but they can use multiple processors)
User-level threads (threads are created and managed by a thread library; they are implemented in user mode and are invisible to the kernel; creation overhead is small, but they cannot use multiple processors)
Hybrid threads (a combination of the two: many threads are created in user space and mapped onto threads in kernel space, so multiple processors can be used; the mapping is many-to-many, N:M with N >> M)


How Linux implements multithreading

Linux implements threads in a distinctive way.

From the perspective of the kernel, it does not have the concept of threads.

Linux implements all threads as processes. The kernel does not prepare special scheduling algorithms or define special data structures to represent threads.

Instead, a thread is simply considered a process that shares certain resources with other processes.

Each thread has its own task_struct, so to the kernel it looks like an ordinary process that happens to share certain resources, such as the address space, with other processes.
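
The idea that each thread is a separate kernel task inside one process can be observed from user space. The following is a minimal, Linux-only sketch rather than a definitive test: the helper names record_ids and same_pid_different_tid are made up for illustration, and the thread's kernel task ID is read with the raw SYS_gettid system call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // exposes syscall() in <unistd.h> under strict -std modes
#endif
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper type: the IDs observed by one thread.
struct ids {
    pid_t pid; // process ID, shared by every thread of the process
    pid_t tid; // kernel task ID, unique to each thread
};

static void *record_ids(void *arg)
{
    struct ids *out = arg;
    out->pid = getpid();
    out->tid = (pid_t)syscall(SYS_gettid); // raw system call, Linux only
    return NULL;
}

// Returns 1 if a new thread shares the PID of its creator but has its own TID.
int same_pid_different_tid(void)
{
    struct ids creator, created;
    pthread_t id;

    record_ids(&creator); // IDs of the calling thread
    int res = pthread_create(&id, NULL, record_ids, &created);
    assert(res == 0);
    res = pthread_join(id, NULL);
    assert(res == 0);

    return creator.pid == created.pid && creator.tid != created.tid;
}
```

Calling same_pid_different_tid() from main is expected to return 1 on Linux: both threads report the same process ID, while each has its own task ID.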

The difference between threads and processes

The process is the smallest unit of resource allocation, and the thread is the smallest unit of program execution.
Switching between threads is more efficient than switching between processes.
A process has its own independent address space. Each time a process is started, the system allocates an address space for it and sets up data tables to maintain its code segment, stack segment, and data segment. Threads do not have independent address spaces; threads in the same process share one address space and therefore share data.
Creating a thread is less expensive than creating a process.
Threads occupy far fewer resources than processes.
Communication between threads is more convenient. Threads in the same process share global variables, static variables, and other data, whereas processes must communicate through inter-process communication (IPC). However, synchronization and mutual exclusion are hard to get right in multi-threaded programs.
Multi-process programs are more robust: because each process has an independent address space, the death of one process does not affect the others. Multi-threaded programs are harder to maintain: because the address space is shared, if one thread dies, the entire process dies.
Processes offer strong resource protection at the cost of high overhead and relatively low efficiency; threads offer weaker resource protection but have low overhead, high efficiency, and can be switched frequently.
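
The address-space difference can be illustrated with a short sketch. It assumes a global named shared_value and two hypothetical helpers; one changes the variable in a child process created with fork(), the other changes it in a thread.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 10; // global variable in the data segment

static void *set_to_20(void *arg)
{
    (void)arg;
    shared_value = 20; // same address space as the creator
    return NULL;
}

// Modify the variable in a child process; return what the parent sees afterwards.
int value_after_child_process(void)
{
    shared_value = 10;
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        shared_value = 20; // changes only the child's private copy
        _exit(0);
    }
    pid_t waited = waitpid(pid, NULL, 0);
    assert(waited == pid);
    return shared_value;
}

// Modify the variable in a thread; return what the creator sees afterwards.
int value_after_thread(void)
{
    shared_value = 10;
    pthread_t id;
    int res = pthread_create(&id, NULL, set_to_20, NULL);
    assert(res == 0);
    res = pthread_join(id, NULL);
    assert(res == 0);
    return shared_value;
}
```

The parent still sees 10 after the child process exits, because the child wrote to its own copy of the address space, whereas the creator sees 20 after the thread has been joined.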

Three basic concepts of multi-thread development

Thread [create, exit, wait]
Mutex [create, destroy, lock, unlock]
Condition variable [create, destroy, signal, broadcast, wait]
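
The three concepts usually appear together. The sketch below is one possible arrangement, not the only one: a worker thread publishes a value under a mutex and signals a condition variable, and the creator waits on that condition. The names produce, wait_for_result, ready, and result are illustrative assumptions.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;  // predicate protected by lock
static int result = 0; // data protected by lock

static void *produce(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    result = 42;
    ready = 1;
    pthread_cond_signal(&ready_cond); // wake the waiting thread
    pthread_mutex_unlock(&lock);
    return NULL;
}

// Creates a worker, waits on the condition variable, and returns the published value.
int wait_for_result(void)
{
    pthread_t id;
    ready = 0;
    int res = pthread_create(&id, NULL, produce, NULL);
    assert(res == 0);

    pthread_mutex_lock(&lock);
    while (!ready) // loop guards against spurious wakeups
        pthread_cond_wait(&ready_cond, &lock);
    int value = result;
    pthread_mutex_unlock(&lock);

    res = pthread_join(id, NULL);
    assert(res == 0);
    return value;
}
```

The wait is written as a loop over the predicate because pthread_cond_wait may return without a matching signal.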

Usage of thread library

1. Create thread

#include<pthread.h>

int pthread_create(pthread_t *id, const pthread_attr_t *attr, void *(*fun)(void *), void *arg);

id: The address of a pthread_t variable. After successful creation, it receives the ID of the newly created thread.
attr: The attributes of the thread; pass NULL to use the defaults.
fun: The address of the thread function.
arg: The argument passed to the thread function.
Return value: 0 on success, an error number on failure.

Multi-threaded code example

#include<stdio.h>
#include<stdlib.h>
#include<assert.h>
#include<string.h>
#include<unistd.h>

#include<pthread.h>

//Declare the thread function
void *fun(void *);

int main()
{
	printf("main start\n");

	pthread_t id;
	//Create the function thread and specify the function it will execute
	int res = pthread_create(&id,NULL,fun,NULL);
	assert(res == 0);

	//From here on, the two threads run concurrently
	int i = 0;	
	for(; i < 5; i++)
	{
		printf("main running\n");
		sleep(1);
	}

	printf("main over\n");
	exit(0);
}

//Define the thread function
void* fun(void *arg)
{
	printf("fun start\n");

	int i = 0;
	for(; i < 3;i++)
	{
		printf("fun running\n");
		sleep(1);
	}

	printf("fun over\n");
	return NULL;
}

Compiling this code with gcc reports an "undefined reference to xxxxx" error when the thread library is not linked. This happens because the program calls functions without linking the library that contains them.

Once the library is linked, the program compiles and runs successfully. The manual page states this as well:

Compile and link with -pthread

Comparing the output of two runs shows that the first three lines printed are the same.

Conclusions

Creating a thread to execute a thread function is a completely different concept from calling a function.
The main thread and the function thread execute concurrently.
If the function thread ends before the main thread, the main thread is not affected.
If the main thread ends before the function thread, the whole process ends, and the other threads end with it.
After a function thread is created, which thread runs first is determined by the operating system's scheduling algorithm and the machine environment.

The function thread exits together with the main thread because the main thread ends by calling exit, which terminates the process. If the code is changed to call pthread_exit(NULL) instead, the main thread ends while the function thread keeps running until it completes. Even so, ending the main thread manually is not recommended; it is better to have the main thread wait.

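Passing arguments to the thread function

① Passing by value

The value of the variable is cast directly to void* and passed.

The thread function receives a void* pointer. On a 32-bit system every pointer is 4 bytes (8 bytes on a 64-bit system), so passing by value only works for values no larger than a pointer.

Code example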

#include<stdio.h>
#include<stdlib.h>
#include<assert.h>
#include<string.h>
#include<unistd.h>
#include<stdint.h>

#include<pthread.h>

void *fun(void *);

int main()
{
	printf("main start\n");

	int a = 10;
	
	pthread_t id;
	//Cast through intptr_t so the conversion is safe when pointers are wider than int
	int res = pthread_create(&id,NULL,fun,(void*)(intptr_t)a);
	assert(res == 0);

	int i = 0;	
	for(; i < 5; i++)
	{
		printf("main running\n");
		sleep(1);
	}

	printf("main over\n");
	exit(0);
}


void* fun(void *arg)
{
	//Recover the integer value that was packed into the pointer
	int b = (int)(intptr_t)arg;
	printf("b == %d\n",b);
	return NULL;
}


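② Passing by address

The address of a variable (of any type) is cast to void* and passed, much like passing the address of a variable in an ordinary function call.

Through this address, the main thread and the function thread share the memory it points to.

All threads in a process share that process's address space.

Figure: the 4 GB virtual address space of a multi-threaded process

All threads in a process share global data, static data, and heap memory.

Passing data between threads is therefore simple, but the resulting problem is that thread safety is not guaranteed when threads run concurrently.

Code example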

#include<stdio.h>
#include<stdlib.h>
#include<assert.h>
#include<string.h>
#include<unistd.h>

#include<pthread.h>

int gdata = 10; //.data

void *fun(void *);

int main()
{
	int *ptr = (int *)malloc(sizeof(int));//.heap
	assert(ptr != NULL);
	*ptr = 10;
	
	pthread_t id;
	int res = pthread_create(&id,NULL,fun,(void*)ptr);
	assert(res == 0);

	sleep(2);//Wait two seconds so the function thread has time to modify the data

	printf("main : gdata == %d\n",gdata);
    printf("main : *ptr = %d\n",*ptr);

	exit(0);
}


void *fun(void *arg)
{
	int *p = (int*)arg;

    gdata = 20000;
    *p = 20;

	printf("fun over\n");
	return NULL;
}
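
Because shared data is not automatically thread-safe, concurrent updates usually need a lock. The following sketch assumes a hypothetical counter incremented by two threads and uses a mutex so that no increment is lost; the names add_many and count_with_two_threads are illustrative only.

```c
#include <assert.h>
#include <pthread.h>

#define INCREMENTS 100000

static long counter = 0; // shared by both threads
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_many(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);   // only one thread updates at a time
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

// Runs two incrementing threads and returns the final counter value.
long count_with_two_threads(void)
{
    pthread_t a, b;
    counter = 0;

    int res = pthread_create(&a, NULL, add_many, NULL);
    assert(res == 0);
    res = pthread_create(&b, NULL, add_many, NULL);
    assert(res == 0);

    res = pthread_join(a, NULL);
    assert(res == 0);
    res = pthread_join(b, NULL);
    assert(res == 0);
    return counter;
}
```

Without the lock, the two read-modify-write sequences on counter could interleave and the final value could fall short of 200000.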


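Other functions in the thread library

Three ways for a thread to exit:

The thread returns from its thread function; the return value is the thread's exit code.
The thread is cancelled by another thread in the same process.
The thread calls pthread_exit().

Waiting for a thread to terminate

int pthread_join(pthread_t thread, void **retval);
args:
    pthread_t thread: the ID of the thread to join; it must belong to the current process and must not be detached
    void **retval: if not NULL, points to a location that is set to the exit status of the terminated thread when the function returns
    return:
    the status of the join: 0 on success, nonzero on failure

When thread A calls pthread_join() on thread B, thread A blocks until thread B ends and only then continues. Thread B is only truly finished once pthread_join() returns, at which point its memory is also released (provided thread B is not detached).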

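Three points deserve attention:

1. The system releases only system-level resources; memory allocated by the program, for example with malloc(), must be freed manually.
2. A thread can be joined by only one thread.
3. The joined thread must not be detached; otherwise the join fails.

pthread_join() therefore serves two purposes:

1. Waiting for another thread to end: the calling thread blocks until the target thread ends, then resumes.
2. Reclaiming the thread's resources: if a thread is non-detached (the default for newly created threads) and pthread_join() is never called on it, its memory is not released when it ends, leaving a "zombie thread".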
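
The first point, that memory allocated by the program must be freed by the program, can be sketched as follows. This is an illustrative example under the assumption that the thread returns a heap-allocated int as its exit status; compute and join_and_collect are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static void *compute(void *arg)
{
    (void)arg;
    int *result = malloc(sizeof *result); // allocated by the program, not the system
    if (result != NULL)
        *result = 42;
    return result; // becomes the thread's exit status
}

// Joins the worker, reads its heap-allocated result, and frees it.
int join_and_collect(void)
{
    pthread_t id;
    int res = pthread_create(&id, NULL, compute, NULL);
    assert(res == 0);

    void *status = NULL;
    res = pthread_join(id, &status); // reclaims the thread's own resources
    assert(res == 0);
    assert(status != NULL);

    int value = *(int *)status;
    free(status); // pthread_join does not free memory obtained with malloc()
    return value;
}
```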

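Waiting for a specific child thread to end

Waits for the thread specified by id to exit; the call blocks while that thread has not exited.
result receives the exit information specified by the thread when it exits.

int pthread_join(pthread_t id, void **result); //the calling thread blocks until the awaited thread ends

Code demonstration: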

#include<stdio.h>
#include<stdlib.h>
#include<assert.h>
#include<string.h>
#include<unistd.h>

#include<pthread.h>

//Declare the thread function before it is used in main
void *fun(void *);

int main()
{
	printf("main start\n");

	pthread_t id;
	int res = pthread_create(&id,NULL,fun,NULL);
	assert(res == 0);

	//From here on, the two threads run concurrently
	int i = 0;	
	for(; i < 5; i++)
	{
		printf("main running\n");
		sleep(1);
	}
	
	char *s = NULL;
	//Block until the function thread ends and receive its exit value
	pthread_join(id,(void **)&s);
	printf("join : s = %s\n",s);
	
	exit(0);
}

//Define the thread function
void* fun(void *arg)
{
	printf("fun start\n");

	int i = 0;
	for(; i < 10;i++)
	{
		printf("fun running\n");
		sleep(1);
	}

	printf("fun over\n");

	pthread_exit("fun over");//Return this string constant to the main thread
}

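Here the main thread prints its output five times and then blocks waiting for the child thread. Once the child thread ends, the main thread prints join : s = fun over

Some details about exit and join:

When a thread finishes on its own or ends by calling pthread_exit, it releases the resources private to it.
If the thread is non-detached, its thread ID is retained until another thread joins it to confirm that it has terminated. The joining thread obtains the exit status of the terminated thread, which then disappears.
If the thread is detached, calling pthread_exit() is not required; when the thread finishes running, it releases all of its resources itself (including the thread ID).
Every child thread must eventually either be joined with pthread_join() or be set as detached; otherwise its resources are not fully released (cancelling the thread does not fully release them either).
If the main thread calls pthread_exit(), the main thread ends but the child threads do not.
When the main thread ends by returning from main, the whole program ends, so the main thread should use pthread_join to wait for child threads. One thread can call pthread_join repeatedly to wait for several threads.
A thread that calls pthread_join blocks until the joined thread ends and the function returns, but the call has no effect on the execution of the thread being waited for.
If a child thread calls exit(), the entire process ends.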
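
Waiting for several threads from one thread can be sketched as a loop of pthread_join calls. The example below is an assumption-laden illustration: WORKERS, square, and sum_of_squares are hypothetical names, and each worker returns a small integer packed into its void* exit status.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define WORKERS 4

static void *square(void *arg)
{
    intptr_t n = (intptr_t)arg; // small integer passed by value
    return (void *)(n * n);     // exit status carries the result
}

// Starts WORKERS threads, joins each one in turn, and sums their results.
int sum_of_squares(void)
{
    pthread_t ids[WORKERS];

    for (intptr_t i = 0; i < WORKERS; i++) {
        int res = pthread_create(&ids[i], NULL, square, (void *)i);
        assert(res == 0);
    }

    int sum = 0;
    for (int i = 0; i < WORKERS; i++) {
        void *status = NULL;
        int res = pthread_join(ids[i], &status); // blocks until worker i ends
        assert(res == 0);
        sum += (int)(intptr_t)status;
    }
    return sum;
}
```

Joining in creation order is a simplification; the workers may finish in any order, and each join simply returns at once if its thread has already ended.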

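Thread attributes

The attributes of a thread can be specified when the thread is created.

The second parameter of pthread_create() (pthread_attr_t *attr) describes the thread's attributes. The earlier examples set it to NULL, which selects the default attributes. Many attributes can be changed, including the binding (scope) attribute, the detach state, the stack address, the stack size, and the priority.

By default a thread is unbound and non-detached, has a default stack of 1 MB (the actual default depends on the platform; on Linux with glibc it usually follows the stack resource limit), and has the same priority as the thread that created it.

The attribute structure is conceptually as follows (the real pthread_attr_t is opaque and should only be accessed through the pthread_attr_* functions):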

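typedef struct
{
    int             detachstate;     //detach state of the thread
    int             schedpolicy;    //scheduling policy
    struct sched_param  schedparam; //scheduling parameters
    int             inheritsched;   //scheduling inheritance
    int             scope;      //contention scope
    size_t          guardsize;  //size of the guard buffer at the end of the stack
    int             stackaddr_set; //whether a stack address has been set
    void*           stackaddr;  //location of the thread stack
    size_t          stacksize;  //size of the thread stack
} pthread_attr_t;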

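Each attribute has corresponding functions for reading and modifying it, described below.

Initializing thread attributes

Initialization and destruction correspond to the following two functions:

#include <pthread.h>

①int pthread_attr_init(pthread_attr_t *attr);
②int pthread_attr_destroy(pthread_attr_t *attr);

① Purpose:

Initializes a thread attribute object. Note: initialize the attributes first, then create the thread with pthread_create.

Parameters:

attr: the thread attribute structure

Return value:

Success: 0
Failure: a nonzero error number

② Purpose:

Destroys a thread attribute object and releases the resources it occupies.

Parameters:

attr: the thread attribute structure

Return value:

Success: 0
Failure: a nonzero error number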
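
The initialize, configure, create, destroy sequence can be sketched as follows. The 256 KiB stack size and the names REQUESTED_STACK, noop, and run_with_custom_stack are assumptions chosen for illustration; the attribute calls themselves are the standard pthread_attr_* functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define REQUESTED_STACK ((size_t)256 * 1024) // illustrative value, well above the minimum

static void *noop(void *arg)
{
    return arg; // the thread does no real work in this sketch
}

// Creates a thread with a custom stack size and returns the size stored in the attributes.
size_t run_with_custom_stack(void)
{
    pthread_attr_t attr;
    int res = pthread_attr_init(&attr); // must come before any other use of attr
    assert(res == 0);

    res = pthread_attr_setstacksize(&attr, REQUESTED_STACK);
    assert(res == 0);

    size_t configured = 0;
    res = pthread_attr_getstacksize(&attr, &configured);
    assert(res == 0);

    pthread_t id;
    res = pthread_create(&id, &attr, noop, NULL);
    assert(res == 0);

    // The attribute object is no longer needed once the thread exists.
    res = pthread_attr_destroy(&attr);
    assert(res == 0);

    res = pthread_join(id, NULL);
    assert(res == 0);
    return configured;
}
```

Destroying the attribute object after pthread_create does not affect the thread that was created with it.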

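Thread detachment

The detach state of a thread determines how the thread terminates, as discussed above.

By default a thread is non-detached (joinable), which means the creating thread is expected to wait for it. Only after pthread_join() returns are the system resources of the created thread released, and only then is the thread considered terminated.
A detached thread is not waited for by any other thread; when it finishes running it terminates and its system resources are released immediately. Choose the detach state that suits your needs.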
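
A thread can be created in the detached state through its attributes. The sketch below is one possible way to do this, with hypothetical names (background_work, run_detached, done); because a detached thread cannot be joined, completion is reported through a mutex and condition variable instead.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done = 0; // set by the detached thread when its work is finished

static void *background_work(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&done_lock);
    done = 1;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL; // resources are reclaimed automatically; nobody joins this thread
}

// Starts a detached thread, waits for its signal, and returns 1 if the
// attribute object reported the detached state.
int run_detached(void)
{
    pthread_attr_t attr;
    int res = pthread_attr_init(&attr);
    assert(res == 0);
    res = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    assert(res == 0);

    int state = 0;
    res = pthread_attr_getdetachstate(&attr, &state);
    assert(res == 0);

    pthread_t id;
    done = 0;
    res = pthread_create(&id, &attr, background_work, NULL);
    assert(res == 0);
    pthread_attr_destroy(&attr);

    // No pthread_join here: wait for the explicit completion signal instead.
    pthread_mutex_lock(&done_lock);
    while (!done)
        pthread_cond_wait(&done_cond, &done_lock);
    pthread_mutex_unlock(&done_lock);

    return state == PTHREAD_CREATE_DETACHED;
}
```

An already running joinable thread can also be detached with pthread_detach(); the attribute-based approach shown here avoids any window in which the thread is still joinable.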
