Ruby XML, XSLT, and XPath Tutorial

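What is XML?

XML stands for eXtensible Markup Language.

XML is a subset of the Standard Generalized Markup Language (SGML), a markup language used to give electronic documents structure.

It can be used to mark up data and to define data types. It is a meta-language that lets users define their own markup languages. It is well suited to transmission over the World Wide Web, and it provides a uniform way to describe and exchange structured data independently of applications or vendors.

For more information, see the XML tutorial.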

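XML Parser Architectures and APIs

There are two main types of XML parser: DOM and SAX.

A SAX parser is event-driven. As it scans the document, each time it recognizes a syntactic construct it calls the event handler registered for that construct, delivering an event to the application.
A DOM (Document Object Model) parser builds the document's hierarchical structure as a tree in memory. Once parsing is complete, the entire DOM tree is held in memory and the application can navigate its nodes.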

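Parsing and Creating XML in Ruby

Ruby can parse XML documents with the REXML library.

REXML is an XML toolkit for Ruby. It is written in pure Ruby and conforms to the XML 1.0 specification.

REXML has been part of the Ruby standard library since Ruby 1.8. From Ruby 3.0 onward it ships as a bundled gem rather than a default library.

The library is loaded by requiring rexml/document.

All of its methods and classes are encapsulated in the REXML module.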
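
As a minimal sketch of the two points above (the require path and the REXML namespace), the following parses an inline string and then builds a small document from scratch. The sample markup and all variable names are illustrative assumptions, not part of the original tutorial.

```ruby
require 'rexml/document'

# Parse an inline XML string using the fully qualified class name.
source = '<collection shelf="New Arrivals"><movie title="Enemy Behind"/></collection>'
parsed = REXML::Document.new(source)
shelf = parsed.root.attributes["shelf"]

# Build a small document from scratch.
built = REXML::Document.new
collection = built.add_element("collection", "shelf" => "New Arrivals")
movie = collection.add_element("movie", "title" => "Enemy Behind")
movie.add_element("format").text = "DVD"

# Serialize to a string; Document#write appends to any object that supports <<.
xml_text = String.new
built.write(xml_text)

puts shelf
puts xml_text
```

Using the REXML:: prefix avoids pulling the module's names into the top-level namespace, which may be preferable in larger programs.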

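REXML has the following advantages over other parsers:

It is written 100% in Ruby.
It can be used for both SAX and DOM parsing.
It is lightweight, at fewer than 2000 lines of code.
Its methods and classes are easy to understand.
It has a SAX2-based API and full XPath support.
It ships with Ruby, so no separate installation is normally required.

Below is the XML for the example, saved as movies.xml.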

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>

DOM Parser

Let's first parse the XML data. We begin by requiring the rexml/document library. It is common to include REXML so that its names are available in the top-level namespace:

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# Print the title of every movie
xmldoc.elements.each("collection/movie") { |e|
   puts "Movie Title : " + e.attributes["title"]
}

# Print the type of every movie
xmldoc.elements.each("collection/movie/type") { |e|
   puts "Movie Type : " + e.text
}

# Print the description of every movie
xmldoc.elements.each("collection/movie/description") { |e|
   puts "Movie Description : " + e.text
}

The example above produces the following output:

Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
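
A DOM tree can also be walked to build ordinary Ruby data. The sketch below assumes an inline document shaped like two entries of movies.xml and turns each movie element into a Hash, treating a missing child (here, year) as nil. The Hash keys and variable names are illustrative assumptions.

```ruby
require 'rexml/document'

# Inline sample with the same shape as two entries of movies.xml.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind">
      <format>DVD</format>
      <year>2003</year>
    </movie>
    <movie title="Ishtar">
      <format>VHS</format>
    </movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Turn each <movie> element into a plain Hash.
movies = doc.elements.to_a("collection/movie").map do |movie|
  year = movie.elements["year"]          # nil when the child is absent
  {
    title:  movie.attributes["title"],
    format: movie.elements["format"].text,
    year:   year ? year.text.to_i : nil
  }
end

movies.each { |m| puts m.inspect }
```

Checking for nil before calling text matters because not every movie in the sample data carries the same child elements.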

SAX Parser

The following example processes the same data file, movies.xml, as a stream. SAX-style parsing is not recommended for small files; this is only a simple demonstration.

#!/usr/bin/ruby -w

require 'rexml/document'
require 'rexml/streamlistener'
include REXML


class MyListener
  include REXML::StreamListener
  def tag_start(*args)
    puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
  end

  def text(data)
    return if data =~ /^\w*$/     # skips whitespace runs and word-only text such as "DVD"
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts "  text   :   #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("movies.xml")
Document.parse_stream(xmlfile, list)

The output is as follows:

tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
  text   :   "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
  text   :   "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
  text   :   "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
  text   :   "Viewable boredom"
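
Because a stream listener never sees the whole document at once, any state it needs has to be accumulated in the listener itself. The sketch below illustrates that with a hypothetical CountingListener that counts start tags per element name and keeps every text node that is not purely whitespace. The class name, its accessors, and the inline sample are assumptions made for illustration.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: counts elements per tag name and collects
# every text node that is not purely whitespace.
class CountingListener
  include REXML::StreamListener

  attr_reader :counts, :texts

  def initialize
    @counts = Hash.new(0)
    @texts = []
  end

  # Called once for every start tag.
  def tag_start(name, _attrs)
    @counts[name] += 1
  end

  # Called for character data; blank runs between tags are dropped.
  def text(data)
    stripped = data.strip
    @texts << stripped unless stripped.empty?
  end
end

xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format></movie>
    <movie title="Ishtar"><format>VHS</format></movie>
  </collection>
XML

listener = CountingListener.new
REXML::Document.parse_stream(xml, listener)

puts listener.counts.inspect
puts listener.texts.inspect
```

Here strip followed by empty? is used as the whitespace test, so short values such as "DVD" are kept.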

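XPath and Ruby

XPath can be used to find information in an XML document (see the XPath tutorial).

XPath, the XML Path Language, is a language for addressing parts of an XML document (XML being a subset of the Standard Generalized Markup Language). It is based on XML's tree structure and provides the ability to locate nodes in that tree.

Ruby supports XPath through REXML's XPath class, which works on the tree-based (Document Object Model) representation.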

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Information about the first movie
movie = XPath.first(xmldoc, "//movie")
p movie

# Print every movie type
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Collect every movie format into an array
names = XPath.match(xmldoc, "//format").map {|x| x.text }
p names

The example above produces the following output:

<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
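
XPath expressions can also carry predicates. The sketch below assumes a trimmed inline copy of the sample data, selects one movie by its title attribute, and then filters on the stars value in Ruby rather than inside the XPath expression, which keeps the expression itself simple. Variable names are illustrative assumptions.

```ruby
require 'rexml/document'

xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><type>War, Thriller</type><stars>10</stars></movie>
    <movie title="Transformers"><type>Anime, Science Fiction</type><stars>8</stars></movie>
    <movie title="Trigun"><type>Anime, Action</type><stars>10</stars></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Attribute predicate: select one movie by its title attribute.
trigun_type = REXML::XPath.first(doc, "//movie[@title='Trigun']/type").text

# Collect an attribute value from every matched element.
titles = REXML::XPath.match(doc, "//movie").map { |m| m.attributes["title"] }

# Numeric filtering done in Ruby on the matched elements.
top_rated = REXML::XPath.match(doc, "//movie")
                        .select { |m| m.elements["stars"].text.to_i >= 10 }
                        .map { |m| m.attributes["title"] }

puts trigun_type
puts titles.inspect
puts top_rated.inspect
```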

XSLT and Ruby

Two XSLT parsers are available for Ruby. A brief description of each follows.

Ruby-Sablotron

This parser is written and maintained by Masayoshi Takahashi. It was written primarily for Linux and requires the following libraries:

Sablot
Iconv
Expat

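These libraries can be found at Ruby-Sablotron.

XSLT4R

XSLT4R was written by Michael Neumann. It can be used through a simple command-line interface, or from within a third-party application to transform XML documents.

XSLT4R needs XMLScan to operate. XMLScan is included in the XSLT4R archive and is also a 100% Ruby module. These modules can be installed with the standard Ruby installation method (i.e., ruby install.rb).

XSLT4R has the following syntax: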

ruby xslt.rb stylesheet.xsl document.xml [arguments]

To use XSLT4R from within an application, require xslt and pass in the parameters you need. For example:

require "xslt"

# File.read returns the whole file as one String
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/....' }

sheet = XSLT::Stylesheet.new( stylesheet, arguments )

# output to StdOut
sheet.apply( xml_doc )

# output to 'str'
str = ""
sheet.output = [ str ]
sheet.apply( xml_doc )

Further Reading

For complete details on the REXML parser, see the REXML parser documentation.
XSLT4R can be downloaded from the RAA repository.