What are the lesser-known techniques and uses of Java threads?
To each their own, and I happen to like Java. One reason is that there is always more to learn: the tools you use every day usually have a side you never got to know, such as a particular method or an interesting use case. Take threads, for example. That's right, threads, or rather the Thread class. When we build highly scalable systems we face all kinds of concurrent programming problems, but what we are going to talk about here is a little different.
In this article you will see some of the less commonly used methods and techniques that threads offer. Whether you are a beginner, an advanced user, or a Java expert, take a look at which of these you already know and which are new to you. If you think there is something else about threads worth sharing, please leave a comment below. Let's get started.
Beginner
1. Thread name
Each thread in the program has a name. When a thread is created, it will be assigned a simple Java string as the thread name. The default names are "Thread-0", "Thread-1", "Thread-2", etc. Now here comes the interesting thing - Thread provides two ways to set the thread name:
Thread constructor, the following is the simplest implementation:
class SuchThread extends Thread {
    SuchThread(String name) {
        super(name); // pass the name on to the Thread constructor
    }

    @Override
    public void run() {
        System.out.println("Hi Mom! " + getName());
    }
}

SuchThread wow = new SuchThread("much-name");
Thread name setter method:
wow.setName("Just another thread name");
Yes, thread names are mutable, so we can change a name at runtime instead of fixing it at initialization. The name field is just a plain String, which means it can be up to 2³¹-1 characters long (Integer.MAX_VALUE). That should be plenty. Note that the name is not a unique identifier, so different threads can share the same name. Also, don't pass null as the thread name or a NullPointerException will be thrown (the string "null" is fine, of course).
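The points above can be checked with a small sketch. The class and helper names (ThreadNameDemo, rename, nullNameRejected) are made up for illustration; only the Thread constructor, getName() and setName() come from the JDK.

```java
// Hypothetical demo class; only the Thread methods themselves come from the JDK.
class ThreadNameDemo {

    /** Renames the thread at runtime and returns the name it now reports. */
    static String rename(Thread t, String newName) {
        t.setName(newName);
        return t.getName();
    }

    /** Returns true if a null name is rejected with a NullPointerException. */
    static boolean nullNameRejected(Thread t) {
        try {
            t.setName(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Thread unnamed = new Thread(() -> { });
        System.out.println(unnamed.getName());   // a generated default of the form "Thread-N"

        Thread named = new Thread(() -> { }, "much-name");
        System.out.println(named.getName());     // much-name
        System.out.println(rename(named, "Just another thread name"));

        // Names are not unique identifiers: two threads may share one.
        Thread twin = new Thread(() -> { }, named.getName());
        System.out.println(twin.getName().equals(named.getName()));  // true

        System.out.println(nullNameRejected(named));                 // true
    }
}
```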
Use thread names to debug problems
Since you can set the thread name, following a naming convention makes troubleshooting much easier when something goes wrong. A name like "Thread-6" is impersonal; there must be something better. When processing user requests, for instance, you can append the transaction ID to the thread name, which can cut your troubleshooting time significantly.
"pool-1-thread-1" #17 prio=5 os_prio=31 tid=0x00007f9d620c9800 nid=0x6d03 in Object.wait() [0x000000013ebcc000]
"pool-1-thread-1" is far too generic. Let's find out what this thread is doing and give it a better name:
Thread.currentThread().setName(context + TID + params + currentTime, ...);
Now let’s run jstack again, and the situation will suddenly become clear:
"Queue Processing Thread, MessageID: AB5CAD, type: AnalyzeGraph, queue: ACTIVE_PROD, Transaction_ID: 5678956, Start Time: 30/12/2014 17:37" #17 prio=5 os_prio=31 tid=0x00007f9d620c9800 nid=0x6d03 in Object.wait() [0x000000013ebcc000]
If we know what a thread is doing, then when something goes wrong we at least have the transaction ID to start from. You can trace the problem back, reproduce it, isolate it, and fix it. If you want to know what other powerful uses jstack has, you can read this article.
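One possible way to apply this to pooled threads is to rename the thread only for the duration of a task and put the old name back afterwards, so that the context of one request does not leak into the next. The helper below (NamedTask.callWithName) is a hypothetical sketch, not something from the article or the JDK, and the name format is only an example.

```java
import java.util.function.Supplier;

// Hypothetical helper; the name format is illustrative only.
class NamedTask {

    /**
     * Runs the task under a descriptive thread name and restores the previous
     * name afterwards, which matters when the thread is reused by a pool.
     */
    static <T> T callWithName(String descriptiveName, Supplier<T> task) {
        Thread current = Thread.currentThread();
        String previousName = current.getName();
        current.setName(descriptiveName);
        try {
            return task.get();
        } finally {
            current.setName(previousName);
        }
    }

    public static void main(String[] args) {
        String name = "Queue Processing Thread, MessageID: AB5CAD, Transaction_ID: 5678956";
        // While the task runs, jstack would show the descriptive name.
        String seenInside = callWithName(name, () -> Thread.currentThread().getName());
        System.out.println(seenInside);
        System.out.println(Thread.currentThread().getName()); // the original name is back
    }
}
```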
2. Thread priority
Another interesting attribute of a thread is its priority. Thread priority ranges from 1 (MIN_PRIORITY) to 10 (MAX_PRIORITY), and the main thread defaults to 5 (NORM_PRIORITY). Each new thread inherits the priority of its parent by default, so unless you change it, all threads have priority 5. This property is usually overlooked. We can read and change it through the getPriority() and setPriority() methods; none of the Thread constructors accepts a priority.
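A minimal sketch of reading, inheriting and changing a priority follows. PriorityDemo and inheritedPriority are hypothetical names; the constants and the getPriority()/setPriority() calls are the standard Thread API.

```java
// Hypothetical demo class built on the standard Thread priority API.
class PriorityDemo {

    /** Returns the priority a thread created by the calling thread starts with. */
    static int inheritedPriority() {
        return new Thread(() -> { }).getPriority();
    }

    public static void main(String[] args) {
        System.out.println(Thread.MIN_PRIORITY + " " + Thread.NORM_PRIORITY + " " + Thread.MAX_PRIORITY); // 1 5 10

        Thread worker = new Thread(() -> { }, "background-report");
        // A new thread starts with the priority of the thread that created it.
        System.out.println(worker.getPriority() == Thread.currentThread().getPriority()); // true

        // No constructor takes a priority, so the setter is the only way to change it.
        worker.setPriority(Thread.MIN_PRIORITY);
        System.out.println(worker.getPriority()); // 1
        worker.start();
    }
}
```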
Where is priority used?
Of course, not all threads are equal. Some need immediate attention from the CPU, while others are just background tasks. Priority is how you signal this to the operating system's thread scheduler. In Takipi, an error tracking and troubleshooting tool we developed, the thread that handles user exceptions runs at MAX_PRIORITY, while threads that merely report new deployments have a lower priority. You might expect threads with a higher priority to get more time from the JVM's thread scheduler, but this is not always the case.
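At the operating-system level, every new Java thread maps to a native thread, and the Java priority you set is translated into a native priority, which differs from platform to platform. On Linux, you also have to pass the "-XX:+UseThreadPriorities" option to enable this. As noted above, a thread priority is only a hint that you provide. Java priorities do not even cover the full range of native Linux priorities (real-time priorities run from 1 to 99, whereas thread priorities take effect through nice values between -20 and 19). The main takeaway is that the priority you set can show up in the CPU time each thread receives, but relying on thread priorities alone is not recommended.
Intermediate
3. Thread-local storage
This one is a little different from the two above. ThreadLocal is implemented outside the Thread class (java.lang.ThreadLocal), but it stores a separate copy of data for each thread. As the name suggests, it gives threads local storage: a variable you create this way is unique to each thread instance. Much like the thread name and priority, you can define custom attributes that behave as if they were stored inside the Thread itself. Cool, right? Don't get too excited yet, though; there are a few caveats to get out of the way first.
There are two recommended ways to create a ThreadLocal: as a static variable, or as a field of a singleton instance, in which case it does not need to be static. Note that its scope is global; it only appears local to the thread accessing it. In the following example, the ThreadLocal holds a data structure so that we can reach it easily: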
public static class CriticalData {
    public int transactionId;
    public String username;
}

public static final ThreadLocal<CriticalData> globalData = new ThreadLocal<>();
Once you have the ThreadLocal object, you can work with it through the globalData.set() and globalData.get() methods.
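To make the "global scope, per-thread value" idea concrete, here is a small sketch. The class, field and method names are hypothetical, and a plain Integer stands in for the CriticalData structure to keep the example short.

```java
// Hypothetical demo: one shared ThreadLocal, a separate value per thread.
class ThreadLocalDemo {

    // Globally reachable, yet every thread sees only the value it set itself.
    static final ThreadLocal<Integer> transactionId = new ThreadLocal<>();

    /** Sets a value inside a new thread and returns what that thread reads back. */
    static Integer valueSeenByWorker(int id) {
        Integer[] seen = new Integer[1];
        Thread worker = new Thread(() -> {
            transactionId.set(id);
            seen[0] = transactionId.get();
        });
        worker.start();
        try {
            worker.join(); // join() also makes the worker's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        transactionId.set(1);
        System.out.println(valueSeenByWorker(2)); // 2
        System.out.println(transactionId.get());  // 1, untouched by the worker
    }
}
```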
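Global variables? That can't be good
Not necessarily. A ThreadLocal can be used to store a transaction ID, which comes in very handy when an uncaught exception surfaces in your code. The best practice is to set an UncaughtExceptionHandler, which the Thread class supports out of the box, although you have to implement the interface yourself. Once execution reaches the UncaughtExceptionHandler, there are hardly any clues left about what actually happened. All you get is the Thread object and the Throwable; none of the variables that led to the exception can be accessed any more, because those stack frames have already been popped. By the time the UncaughtExceptionHandler runs, the thread is on its last breath, and the one straw left to grasp is the ThreadLocal.
Let's try it: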
System.err.println("Transaction ID " + globalData.get().transactionId);
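We can store valuable, error-related context in it. A more creative use of ThreadLocal is to allocate a dedicated chunk of memory that a worker thread can reuse over and over as a cache. Whether that pays off depends on how you trade CPU against memory. And yes, the thing to watch out for with ThreadLocal is wasted memory: the value lives as long as its thread is alive and will not be garbage-collected unless you release it explicitly. So use it with care and keep it simple.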
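Putting the two pieces together might look like the sketch below. It relies on the fact that the handler is invoked on the dying thread itself, so that thread's ThreadLocal values are still readable. HandlerDemo and failAndReport are hypothetical names, and the failing task is contrived.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo: reading a ThreadLocal from an UncaughtExceptionHandler.
class HandlerDemo {

    static final ThreadLocal<Integer> transactionId = new ThreadLocal<>();

    /** Runs a task that fails and returns the report the handler built. */
    static String failAndReport(int id) {
        AtomicReference<String> report = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            transactionId.set(id);
            throw new IllegalStateException("boom");
        });
        // The handler runs on the dying worker thread, so get() returns the worker's value.
        worker.setUncaughtExceptionHandler((thread, error) ->
                report.set("Transaction ID " + transactionId.get() + " failed: " + error.getMessage()));
        worker.start();
        try {
            worker.join(); // returns only after the handler has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return report.get();
    }

    public static void main(String[] args) {
        System.err.println(failAndReport(5678956)); // Transaction ID 5678956 failed: boom
    }
}
```

For long-lived pooled threads, calling transactionId.remove() in a finally block once a task completes is one way to release the value explicitly.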
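4. User threads and daemon threads
Back to the Thread class. Every thread in a program has a status, either user or daemon; in other words, it is either a foreground thread or a background thread. The main thread is a user thread by default, and every new thread inherits its status from the thread that created it. So if you mark a thread as a daemon, every thread it creates will be marked as a daemon too. Once the only threads still running are daemon threads, the process terminates. You can set and check this status with the setDaemon(boolean) and isDaemon() methods.
When should you use daemon threads?
If the process does not need to wait for a thread to finish before it can terminate, that thread can be a daemon. This spares you the hassle of shutting the thread down properly, since it is simply ended on the spot. Conversely, if a thread is performing an operation that must be shut down cleanly or bad things will happen, it should be a user thread. These are typically critical transactions, such as inserting or updating database records, that must not be interrupted.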
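The sketch below illustrates both the inheritance rule and the exit behavior. DaemonDemo, childOfDaemonIsDaemon and the "housekeeping" thread are hypothetical; note that setDaemon() has to be called before start().

```java
// Hypothetical demo of daemon status inheritance and JVM exit behavior.
class DaemonDemo {

    /** Returns whether a thread created by a daemon thread is itself a daemon. */
    static boolean childOfDaemonIsDaemon() {
        boolean[] result = new boolean[1];
        Thread parent = new Thread(() -> result[0] = new Thread(() -> { }).isDaemon());
        parent.setDaemon(true); // must be called before start()
        parent.start();
        try {
            parent.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(childOfDaemonIsDaemon()); // true

        Thread housekeeping = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(1_000); // stand-in for periodic background work
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "housekeeping");
        housekeeping.setDaemon(true);
        housekeeping.start();
        // main returns here; the JVM exits even though housekeeping never finishes.
    }
}
```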
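Expert
5. Processor affinity
This one sits closer to the hardware; it is where software meets hardware. Processor affinity lets you bind a thread or a process to specific CPU cores, which means that whenever that particular thread runs, it runs only on that particular core. Normally, where a thread runs is decided by the operating system's thread scheduler according to its own logic, which may well take into account the thread priorities discussed earlier.
The benefit lies in the CPU cache. If a thread only ever runs on one core, the odds that its data is already in the cache go up considerably. When the data is already in the CPU cache, there is no need to reload it from main memory. The time you save can be put to better use: the code can start executing right away and make better use of the CPU time allotted to it. The operating system may apply optimizations of its own, and the hardware architecture is certainly an important factor, but using processor affinity at least reduces the chance of a thread switching CPUs.
Because so many factors are involved, the effect of processor affinity on throughput is best established by testing. It may not always deliver a significant performance gain, but one benefit is that throughput tends to be more stable. Affinity strategies can be tuned down to a very fine granularity, depending on what exactly you want. High-frequency trading is one of the fields where this strategy really shines.
Testing processor affinity
Java has no native support for processor affinity, but the story doesn't end there. On Linux, we can set a process's affinity with the taskset command. Say we have a Java process that we want to pin to a specific CPU when launching it: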
taskset -c 1 java AboutToBePinned
Or, if the process is already running:
taskset -cp 1 <PID>
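Getting down to the thread level takes a bit more code. Fortunately, there is an open-source library that does exactly this: Java-Thread-Affinity. It was written by Peter Lawrey of OpenHFT, and it is probably the simplest and most direct way to go about it. Here is a quick example of pinning a thread; for more details on the library, see its documentation on GitHub: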
AffinityLock al = AffinityLock.acquireLock();
And that's it. More advanced options for acquiring the lock, such as choosing a CPU according to different strategies, are described in detail on GitHub.