search
HomeDatabaseMysql Tutorial使用IDEA开发Spark应用

使用IDEA开发Spark应用

Jun 07, 2016 pm 04:38 PM
ideainsparkuseapplicationdevelop

IDEA 全称IntelliJ IDEA,是java语言开发的集成环境,IntelliJ在业界被公认为最好的java开发工具之一,尤其在智能代码助手、代码自动提示、重构、J2EE支持、Ant、JUnit、CVS整合、代码审查、 创新的GUI设计等方面的功能都非常棒,而且IDEA是目前Scala支持最

IDEA 全称IntelliJ IDEA,是java语言开发的集成环境,IntelliJ在业界被公认为最好的java开发工具之一,尤其在智能代码助手、代码自动提示、重构、J2EE支持、Ant、JUnit、CVS整合、代码审查、 创新的GUI设计等方面的功能都非常棒,而且IDEA是目前Scala支持最好的IDE。IDEA分ultimate和free edition版,ultimate提供了J2EE等很多非常强力的功能,free edition我觉得已经对于我这样的初学者已经够用了。前面写过一篇配置IntelliJ IDEA 13的SBT和Scala开发环境,本文在这个基础上使用IDEA进行Spark应用的配置和开发。

1. IDEA环境配置

(1). 首先在IntellJ/bin/idea64.exe.vmoptions(对应64位大内存系统),加大IDEA的启动内存:

-Xms512m
-Xmx1024m
-XX:MaxPermSize=512m

(2). 在IDEA中,Project相当于eclipse中的workspace,同一IDEA窗口只能打开一个workspace。而IDEA中的module等同于eclipse中的project,所以通过File – New Module来为当前Project创建一个module。
1
(3). IDEA会生成大量的缓存文件,来于保存配置信息、插件和项目索引文件等。,一般都会有代码的十倍大小左右大小。在Windows下目录为C:\Users\THINKP\.IntelliJIdea13,使用File – Invalidate Caches可以校验索引的有效性并在需要的时候重建。IDEA会经常读写这些缓存文件,所以使用SSD来存储缓存文件会提高不少性能。下面是修改缓存文件路径的方法:
a). 关闭IDEA
b). 将cache目录复制到对应的目录下面。
c). 打开IntelliJ IDEA 13.1.3\bin\idea.properties文件,例如将IDEA转移到目录D:\Program Files\.IntelliJIdea13中,只需要修改
idea.config.path=D:/Program Files/.IntelliJIdea13/config
idea.system.path=D:/Program Files/.IntelliJIdea13/system
(4). 主题和颜色
Settings – IDE Settings – Appearance – Theme:Darcula
然后把下面override font选项勾上,选择Yahei 14号字体。
然后重启IDEA,界面变成了灰黑色风格,瞬间顺眼了很多!
2
编辑器可以设置单独的主题,当前面设置了全局主题时,编辑器的主题也会被修改。接下来,编辑器界面字体有点小,可以在Editor – Colors&Fonts – Fonts另存为一个新的主题,并在这个新主题中修改配置。我的屏幕分辨率有点大,所以设置了15号字体。
3
光标所在行背景颜色
Editor – Colors&Fonts – General – Caret row,选择了蓝色背景,这样就有了较大的色差。
4
(5). 常用快捷键
界面中的Alt+1 project窗口
Alt+7 代码结构图
Alt+2 Favorite
F11打书签,再按一次取消。此时Favorite - Bookmark里就有这一项。
TODO list Alt+6
注释中以TODO开头时,该TODO项就可以在TODO标签页中找到。这样在有一些思路但是来不及做时,可以以TODO的形式写注释
5
同步项目(Detect all externally changed files and reload them from disk)Ctrl+Y
保存(Save all) Ctrl+S
undo Ctrl+Z
redo Ctrl+Shift+Y
剪切 Ctrl+X
复制 Ctrl+C
粘贴 Ctrl+V
查找 Ctrl+F
替换 Ctrl+R
光标的上一个位置(undo navigation) Ctrl+Alt+<br> 光标的下一个位置(redo navigation) <code>Ctrl+Alt+->
make Ctrl+F9
(6). 项目文件设定
行分割模式: File - Separators 选择Windows风格(/r/n), UNIX的风格(/n)或者mac风格(/r)等等。
将文件锁定编辑 - File - Make file read only
文件编码设置 Project Settings - File Encodings
推荐YouMeek IDEA教程,我认为是目前详细的IDEA教程之一。
http://www.youmeek.com/category/software-system/my-intellij-idea/

2. 使用IDEA开发Spark程序并运行

首先编辑build.sbt文件,每个配置项都要有一个空格来分割。

build.sbt
name := "sbtTest"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core_2.10" % "1.0.2"
libraryDependencies += "org.apache.spark" % "spark-bagel_2.10" % "1.0.2"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.0.2"
libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.0.2"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.0.2"

打开SBT,可以观察到SBT正在downloading dependencies。

...
[info] downloading http://repo1.maven.org/maven2/org/apache/spark/spark-bagel_2.10/1.0.2/spark-bagel_2.10-1.0.2.jar ...
[info] 	[SUCCESSFUL ] org.apache.spark#spark-bagel_2.10;1.0.2!spark-bagel_2.10.jar (5672ms)
[info] downloading http://repo1.maven.org/maven2/org/apache/spark/spark-mllib_2.10/1.0.2/spark-mllib_2.10-1.0.2.jar ...
[info] 	[SUCCESSFUL ] org.apache.spark#spark-mllib_2.10;1.0.2!spark-mllib_2.10.jar (7351ms)
[info] downloading http://repo1.maven.org/maven2/org/apache/spark/spark-graphx_2.10/1.0.2/spark-graphx_2.10-1.0.2.jar ...
[info] 	[SUCCESSFUL ] org.apache.spark#spark-graphx_2.10;1.0.2!spark-graphx_2.10.jar (6349ms)
...
...

编写代码,这段代码用于处理web前端日志,其中第二列是session的ID,输出Session访问次数的排名。

/**
 * Created by Debugo on 2014/8/25.
 */
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.SparkContext._
object LogAnalyzer {
  def main(args:Array[String]): Unit ={
    if(args.length!=2) {
      System.err.println("Usage: LogAnalyzer  ")
      System.exit(1)
    }
    val conf = new SparkConf().setAppName("LogAnalyzer")
    val sc = new SparkContext(conf)
    // args(0)=file:///root/access_log/access_log.20080601.decode.filter
    // args(1)=file:///root/access_log/result
    sc.textFile(args(0)).map(_.split("\t| ")).filter(_.length==6).
      map(x=>(x(1),1)).reduceByKey(_+_).map(x=>(x._2,x._1)).
      sortByKey(false).map(x=>(x._2,x._1)).saveAsTextFile(args(1))
    sc.stop()
  }
}

在sbt命令行中中compile&package

> compile
[info] Compiling 1 Scala source to C:\Users\Administrator\IdeaProjects\Spark0\target\scala-2.10\classes...
[success] Total time: 5 s, completed 2014-8-25 16:05:20
>   package
[info] Packaging C:\Users\Administrator\IdeaProjects\Spark0\target\scala-2.10\spark0_2.10-1.0.jar ...
[info] Done packaging.
[success] Total time: 0 s, completed 2014-8-25 16:17:12

将jar上传到配置spark运行库的节点,提交job,spark会创建结果输出的result目录。最终RDD被分割成了5个分区。

spark-submit --master spark://debugo:7077 --class LogAnalyzer --executor-memory=10g /root/spark0_2.10-1.0.jar file:///root/access_log/access_log.20080601.decode.filter file:///root/access_log/result
...
$ ll /root/access_log/result
total 10840
-rw-r--r-- 1 root root 2708325 Aug 25 15:58 part-00000
-rw-r--r-- 1 root root 1114214 Aug 25 15:58 part-00001
-rw-r--r-- 1 root root 2239113 Aug 25 15:58 part-00002
-rw-r--r-- 1 root root       0 Aug 25 15:58 part-00003
-rw-r--r-- 1 root root 5028580 Aug 25 15:58 part-00004
-rw-r--r-- 1 root root       0 Aug 25 15:58 _SUCCESS
$ more part-00000
(11579135515147154,431)
(6383499980790535,385)
(7822241147182134,370)
(900755558064074,335)
(12385969593715146,226)
...

得到了我们想要的按session ID的排名结果。
^^

参考:

Spark Programming Guide
mmicky Spark大数据快速计算平台

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
What Are the Limitations of Using Views in MySQL?What Are the Limitations of Using Views in MySQL?May 14, 2025 am 12:10 AM

MySQLviewshavelimitations:1)Theydon'tsupportallSQLoperations,restrictingdatamanipulationthroughviewswithjoinsorsubqueries.2)Theycanimpactperformance,especiallywithcomplexqueriesorlargedatasets.3)Viewsdon'tstoredata,potentiallyleadingtooutdatedinforma

Securing Your MySQL Database: Adding Users and Granting PrivilegesSecuring Your MySQL Database: Adding Users and Granting PrivilegesMay 14, 2025 am 12:09 AM

ProperusermanagementinMySQLiscrucialforenhancingsecurityandensuringefficientdatabaseoperation.1)UseCREATEUSERtoaddusers,specifyingconnectionsourcewith@'localhost'or@'%'.2)GrantspecificprivilegeswithGRANT,usingleastprivilegeprincipletominimizerisks.3)

What Factors Influence the Number of Triggers I Can Use in MySQL?What Factors Influence the Number of Triggers I Can Use in MySQL?May 14, 2025 am 12:08 AM

MySQLdoesn'timposeahardlimitontriggers,butpracticalfactorsdeterminetheireffectiveuse:1)Serverconfigurationimpactstriggermanagement;2)Complextriggersincreasesystemload;3)Largertablesslowtriggerperformance;4)Highconcurrencycancausetriggercontention;5)M

MySQL: Is it safe to store BLOB?MySQL: Is it safe to store BLOB?May 14, 2025 am 12:07 AM

Yes,it'ssafetostoreBLOBdatainMySQL,butconsiderthesefactors:1)StorageSpace:BLOBscanconsumesignificantspace,potentiallyincreasingcostsandslowingperformance.2)Performance:LargerrowsizesduetoBLOBsmayslowdownqueries.3)BackupandRecovery:Theseprocessescanbe

MySQL: Adding a user through a PHP web interfaceMySQL: Adding a user through a PHP web interfaceMay 14, 2025 am 12:04 AM

Adding MySQL users through the PHP web interface can use MySQLi extensions. The steps are as follows: 1. Connect to the MySQL database and use the MySQLi extension. 2. Create a user, use the CREATEUSER statement, and use the PASSWORD() function to encrypt the password. 3. Prevent SQL injection and use the mysqli_real_escape_string() function to process user input. 4. Assign permissions to new users and use the GRANT statement.

MySQL: BLOB and other no-sql storage, what are the differences?MySQL: BLOB and other no-sql storage, what are the differences?May 13, 2025 am 12:14 AM

MySQL'sBLOBissuitableforstoringbinarydatawithinarelationaldatabase,whileNoSQLoptionslikeMongoDB,Redis,andCassandraofferflexible,scalablesolutionsforunstructureddata.BLOBissimplerbutcanslowdownperformancewithlargedata;NoSQLprovidesbetterscalabilityand

MySQL Add User: Syntax, Options, and Security Best PracticesMySQL Add User: Syntax, Options, and Security Best PracticesMay 13, 2025 am 12:12 AM

ToaddauserinMySQL,use:CREATEUSER'username'@'host'IDENTIFIEDBY'password';Here'showtodoitsecurely:1)Choosethehostcarefullytocontrolaccess.2)SetresourcelimitswithoptionslikeMAX_QUERIES_PER_HOUR.3)Usestrong,uniquepasswords.4)EnforceSSL/TLSconnectionswith

MySQL: How to avoid String Data Types common mistakes?MySQL: How to avoid String Data Types common mistakes?May 13, 2025 am 12:09 AM

ToavoidcommonmistakeswithstringdatatypesinMySQL,understandstringtypenuances,choosetherighttype,andmanageencodingandcollationsettingseffectively.1)UseCHARforfixed-lengthstrings,VARCHARforvariable-length,andTEXT/BLOBforlargerdata.2)Setcorrectcharacters

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

VSCode Windows 64-bit Download

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

SecLists

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SAP NetWeaver Server Adapter for Eclipse

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.