IDEA 全称IntelliJ IDEA,是java语言开发的集成环境,IntelliJ在业界被公认为最好的java开发工具之一,尤其在智能代码助手、代码自动提示、重构、J2EE支持、Ant、JUnit、CVS整合、代码审查、 创新的GUI设计等方面的功能都非常棒,而且IDEA是目前Scala支持最
IDEA 全称IntelliJ IDEA,是java语言开发的集成环境,IntelliJ在业界被公认为最好的java开发工具之一,尤其在智能代码助手、代码自动提示、重构、J2EE支持、Ant、JUnit、CVS整合、代码审查、 创新的GUI设计等方面的功能都非常棒,而且IDEA是目前Scala支持最好的IDE。IDEA分ultimate和free edition版,ultimate提供了J2EE等很多非常强力的功能,free edition我觉得已经对于我这样的初学者已经够用了。前面写过一篇配置IntelliJ IDEA 13的SBT和Scala开发环境,本文在这个基础上使用IDEA进行Spark应用的配置和开发。
1. IDEA环境配置
(1). 首先在IntellJ/bin/idea64.exe.vmoptions(对应64位大内存系统),加大IDEA的启动内存:
-Xms512m -Xmx1024m -XX:MaxPermSize=512m
(2). 在IDEA中,Project相当于eclipse中的workspace,同一IDEA窗口只能打开一个workspace。而IDEA中的module等同于eclipse中的project,所以通过File – New Module来为当前Project创建一个module。
(3). IDEA会生成大量的缓存文件,来于保存配置信息、插件和项目索引文件等。,一般都会有代码的十倍大小左右大小。在Windows下目录为C:\Users\THINKP\.IntelliJIdea13,使用File – Invalidate Caches可以校验索引的有效性并在需要的时候重建。IDEA会经常读写这些缓存文件,所以使用SSD来存储缓存文件会提高不少性能。下面是修改缓存文件路径的方法:
a). 关闭IDEA
b). 将cache目录复制到对应的目录下面。
c). 打开IntelliJ IDEA 13.1.3\bin\idea.properties文件,例如将IDEA转移到目录D:\Program Files\.IntelliJIdea13中,只需要修改
idea.config.path=D:/Program Files/.IntelliJIdea13/config
idea.system.path=D:/Program Files/.IntelliJIdea13/system
(4). 主题和颜色
Settings – IDE Settings – Appearance – Theme:Darcula
然后把下面override font选项勾上,选择Yahei 14号字体。
然后重启IDEA,界面变成了灰黑色风格,瞬间顺眼了很多!
编辑器可以设置单独的主题,当前面设置了全局主题时,编辑器的主题也会被修改。接下来,编辑器界面字体有点小,可以在Editor – Colors&Fonts – Fonts另存为一个新的主题,并在这个新主题中修改配置。我的屏幕分辨率有点大,所以设置了15号字体。
光标所在行背景颜色
Editor – Colors&Fonts – General – Caret row,选择了蓝色背景,这样就有了较大的色差。
(5). 常用快捷键
界面中的Alt+1
project窗口
Alt+7
代码结构图
Alt+2
Favorite
F11
打书签,再按一次取消。此时Favorite - Bookmark
里就有这一项。
TODO list Alt+6
注释中以TODO开头时,该TODO项就可以在TODO标签页中找到。这样在有一些思路但是来不及做时,可以以TODO的形式写注释
同步项目(Detect all externally changed files and reload them from disk)Ctrl+Y
保存(Save all) Ctrl+S
undo Ctrl+Z
redo Ctrl+Shift+Y
剪切 Ctrl+X
复制 Ctrl+C
粘贴 Ctrl+V
查找 Ctrl+F
替换 Ctrl+R
光标的上一个位置(undo navigation) Ctrl+Alt+<br>
光标的下一个位置(redo navigation) <code>Ctrl+Alt+->
make Ctrl+F9
(6). 项目文件设定
行分割模式: File - Separators 选择Windows风格(/r/n), UNIX的风格(/n)或者mac风格(/r)等等。
将文件锁定编辑 - File - Make file read only
文件编码设置 Project Settings - File Encodings
推荐YouMeek IDEA教程,我认为是目前详细的IDEA教程之一。
http://www.youmeek.com/category/software-system/my-intellij-idea/
2. 使用IDEA开发Spark程序并运行
首先编辑build.sbt文件,每个配置项都要有一个空格来分割。
build.sbt name := "sbtTest" version := "1.0" scalaVersion := "2.10.4" libraryDependencies += "org.apache.spark" %% "spark-core_2.10" % "1.0.2" libraryDependencies += "org.apache.spark" % "spark-bagel_2.10" % "1.0.2" libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.0.2" libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.0.2" libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.0.2"
打开SBT,可以观察到SBT正在downloading dependencies。
... [info] downloading http://repo1.maven.org/maven2/org/apache/spark/spark-bagel_2.10/1.0.2/spark-bagel_2.10-1.0.2.jar ... [info] [SUCCESSFUL ] org.apache.spark#spark-bagel_2.10;1.0.2!spark-bagel_2.10.jar (5672ms) [info] downloading http://repo1.maven.org/maven2/org/apache/spark/spark-mllib_2.10/1.0.2/spark-mllib_2.10-1.0.2.jar ... [info] [SUCCESSFUL ] org.apache.spark#spark-mllib_2.10;1.0.2!spark-mllib_2.10.jar (7351ms) [info] downloading http://repo1.maven.org/maven2/org/apache/spark/spark-graphx_2.10/1.0.2/spark-graphx_2.10-1.0.2.jar ... [info] [SUCCESSFUL ] org.apache.spark#spark-graphx_2.10;1.0.2!spark-graphx_2.10.jar (6349ms) ... ...
编写代码,这段代码用于处理web前端日志,其中第二列是session的ID,输出Session访问次数的排名。
/** * Created by Debugo on 2014/8/25. */ import org.apache.spark.{SparkContext, SparkConf} import org.apache.spark.SparkContext._ object LogAnalyzer { def main(args:Array[String]): Unit ={ if(args.length!=2) { System.err.println("Usage: LogAnalyzer ") System.exit(1) } val conf = new SparkConf().setAppName("LogAnalyzer") val sc = new SparkContext(conf) // args(0)=file:///root/access_log/access_log.20080601.decode.filter // args(1)=file:///root/access_log/result sc.textFile(args(0)).map(_.split("\t| ")).filter(_.length==6). map(x=>(x(1),1)).reduceByKey(_+_).map(x=>(x._2,x._1)). sortByKey(false).map(x=>(x._2,x._1)).saveAsTextFile(args(1)) sc.stop() } }
在sbt命令行中中compile&package
> compile [info] Compiling 1 Scala source to C:\Users\Administrator\IdeaProjects\Spark0\target\scala-2.10\classes... [success] Total time: 5 s, completed 2014-8-25 16:05:20 > package [info] Packaging C:\Users\Administrator\IdeaProjects\Spark0\target\scala-2.10\spark0_2.10-1.0.jar ... [info] Done packaging. [success] Total time: 0 s, completed 2014-8-25 16:17:12
将jar上传到配置spark运行库的节点,提交job,spark会创建结果输出的result目录。最终RDD被分割成了5个分区。
spark-submit --master spark://debugo:7077 --class LogAnalyzer --executor-memory=10g /root/spark0_2.10-1.0.jar file:///root/access_log/access_log.20080601.decode.filter file:///root/access_log/result ... $ ll /root/access_log/result total 10840 -rw-r--r-- 1 root root 2708325 Aug 25 15:58 part-00000 -rw-r--r-- 1 root root 1114214 Aug 25 15:58 part-00001 -rw-r--r-- 1 root root 2239113 Aug 25 15:58 part-00002 -rw-r--r-- 1 root root 0 Aug 25 15:58 part-00003 -rw-r--r-- 1 root root 5028580 Aug 25 15:58 part-00004 -rw-r--r-- 1 root root 0 Aug 25 15:58 _SUCCESS $ more part-00000 (11579135515147154,431) (6383499980790535,385) (7822241147182134,370) (900755558064074,335) (12385969593715146,226) ...
得到了我们想要的按session ID的排名结果。
^^
参考:
Spark Programming Guide
mmicky Spark大数据快速计算平台
原文地址:使用IDEA开发Spark应用, 感谢原作者分享。

InnoDBBufferPool reduces disk I/O by caching data and indexing pages, improving database performance. Its working principle includes: 1. Data reading: Read data from BufferPool; 2. Data writing: After modifying the data, write to BufferPool and refresh it to disk regularly; 3. Cache management: Use the LRU algorithm to manage cache pages; 4. Reading mechanism: Load adjacent data pages in advance. By sizing the BufferPool and using multiple instances, database performance can be optimized.

Compared with other programming languages, MySQL is mainly used to store and manage data, while other languages such as Python, Java, and C are used for logical processing and application development. MySQL is known for its high performance, scalability and cross-platform support, suitable for data management needs, while other languages have advantages in their respective fields such as data analytics, enterprise applications, and system programming.

MySQL is worth learning because it is a powerful open source database management system suitable for data storage, management and analysis. 1) MySQL is a relational database that uses SQL to operate data and is suitable for structured data management. 2) The SQL language is the key to interacting with MySQL and supports CRUD operations. 3) The working principle of MySQL includes client/server architecture, storage engine and query optimizer. 4) Basic usage includes creating databases and tables, and advanced usage involves joining tables using JOIN. 5) Common errors include syntax errors and permission issues, and debugging skills include checking syntax and using EXPLAIN commands. 6) Performance optimization involves the use of indexes, optimization of SQL statements and regular maintenance of databases.

MySQL is suitable for beginners to learn database skills. 1. Install MySQL server and client tools. 2. Understand basic SQL queries, such as SELECT. 3. Master data operations: create tables, insert, update, and delete data. 4. Learn advanced skills: subquery and window functions. 5. Debugging and optimization: Check syntax, use indexes, avoid SELECT*, and use LIMIT.

MySQL efficiently manages structured data through table structure and SQL query, and implements inter-table relationships through foreign keys. 1. Define the data format and type when creating a table. 2. Use foreign keys to establish relationships between tables. 3. Improve performance through indexing and query optimization. 4. Regularly backup and monitor databases to ensure data security and performance optimization.

MySQL is an open source relational database management system that is widely used in Web development. Its key features include: 1. Supports multiple storage engines, such as InnoDB and MyISAM, suitable for different scenarios; 2. Provides master-slave replication functions to facilitate load balancing and data backup; 3. Improve query efficiency through query optimization and index use.

SQL is used to interact with MySQL database to realize data addition, deletion, modification, inspection and database design. 1) SQL performs data operations through SELECT, INSERT, UPDATE, DELETE statements; 2) Use CREATE, ALTER, DROP statements for database design and management; 3) Complex queries and data analysis are implemented through SQL to improve business decision-making efficiency.

The basic operations of MySQL include creating databases, tables, and using SQL to perform CRUD operations on data. 1. Create a database: CREATEDATABASEmy_first_db; 2. Create a table: CREATETABLEbooks(idINTAUTO_INCREMENTPRIMARYKEY, titleVARCHAR(100)NOTNULL, authorVARCHAR(100)NOTNULL, published_yearINT); 3. Insert data: INSERTINTObooks(title, author, published_year)VA


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

WebStorm Mac version
Useful JavaScript development tools

Atom editor mac version download
The most popular open source editor

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function

DVWA
Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software