ZMarkupParser HTML String to NSAttributedString Tool

Convert an HTML string to an NSAttributedString, with style settings that map to the corresponding NSAttributedString.Key values

[ZhgChgLi](https://github.com/ZhgChgLi) / [ZMarkupParser](https://github.com/ZhgChgLi/ZMarkupParser)

Features

  • Written purely in Swift with no dependency on any parser library. It parses HTML tags with regex and tokenization, corrects tag errors (unclosed and misaligned tags), converts the result to an abstract syntax tree, and finally uses the Visitor pattern to map HTML tags to abstract styles, producing the final NSAttributedString.
  • Supports HTML Render (to NSAttributedString) / Stripper (removes HTML tags) / Selector functionality
  • Automatically corrects tag errors (unclosed and misaligned tags):
      • <br> -> <br/>
      • <b>Bold<i>Bold+Italic</b>Italic</i> -> <b>Bold<i>Bold+Italic</i></b><i>Italic</i>
      • <Congratulation!> -> <Congratulation!> (treated as a String)
  • Supports custom style specifications, e.g. <b></b> -> weight: .semibold & underline: 1
  • Supports custom HTML tag parsing, e.g. parse <zhgchgli></zhgchgli> into desired styles
  • Designed for easy HTML tag extension: it currently supports basic styles as well as ul/ol/li lists and hr separators, and support for other HTML tags can be added quickly.
  • Supports style parsing from the HTML style attribute: text styles specified in a tag's style attribute are applied as well, e.g. <b style="font-size: 20px"></b> -> bold + font size 20 px
  • Supports iOS/macOS
  • Supports HTML Color Name to UIColor/NSColor
  • Test Coverage: 80%+
  • Supports parsing of <img> images, <ul> lists, <table> tables, etc.
  • Higher performance than NSAttributedString.DocumentType.html
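
The tag-correction behavior listed above can be illustrated with a minimal, Foundation-only sketch. It is not ZMarkupParser's implementation: the `Token` enum, `voidTags`, `tokenize`, and `normalize` are hypothetical names invented for this example, and tags that carry attributes are not handled.

```swift
import Foundation

// Hypothetical token model for this sketch; ZMarkupParser's internal types differ.
enum Token: Equatable {
    case open(String)        // <b>
    case close(String)       // </b>
    case selfClosing(String) // <br/>
    case text(String)        // anything that is not a recognizable tag
}

// Tags that never wrap content (a small assumed subset).
let voidTags: Set<String> = ["br", "hr", "img"]

func tokenize(_ html: String) -> [Token] {
    // Matches <name>, </name>, and <name/>. Anything that does not match
    // (e.g. <Congratulation!>) stays plain text.
    let regex = try! NSRegularExpression(pattern: "<(/?)([A-Za-z][A-Za-z0-9]*)\\s*(/?)>")
    let source = html as NSString
    var tokens: [Token] = []
    var cursor = 0
    for match in regex.matches(in: html, range: NSRange(location: 0, length: source.length)) {
        if match.range.location > cursor {
            let gap = NSRange(location: cursor, length: match.range.location - cursor)
            tokens.append(.text(source.substring(with: gap)))
        }
        let isClosing = match.range(at: 1).length > 0
        let name = source.substring(with: match.range(at: 2)).lowercased()
        if voidTags.contains(name) {
            // <br> and <br/> become one self-closing token; a stray </br> is dropped.
            if !isClosing { tokens.append(.selfClosing(name)) }
        } else if match.range(at: 3).length > 0 {
            tokens.append(.selfClosing(name))
        } else {
            tokens.append(isClosing ? .close(name) : .open(name))
        }
        cursor = match.range.location + match.range.length
    }
    if cursor < source.length { tokens.append(.text(source.substring(from: cursor))) }
    return tokens
}

func normalize(_ tokens: [Token]) -> String {
    var output = ""
    var stack: [String] = []   // currently open tags, outermost first
    for token in tokens {
        switch token {
        case .text(let value):
            output += value
        case .selfClosing(let name):
            output += "<\(name)/>"
        case .open(let name):
            stack.append(name)
            output += "<\(name)>"
        case .close(let name):
            // A closing tag with no matching open tag is dropped.
            guard let index = stack.lastIndex(of: name) else { continue }
            // Close the tags opened after `name`, close `name`, then reopen them.
            let reopened = Array(stack[(index + 1)...])
            for inner in reopened.reversed() { output += "</\(inner)>" }
            output += "</\(name)>"
            stack.removeSubrange(index...)
            for inner in reopened {
                stack.append(inner)
                output += "<\(inner)>"
            }
        }
    }
    // Close whatever is still open at the end of the input.
    for name in stack.reversed() { output += "</\(name)>" }
    return output
}
```

With this sketch, `normalize(tokenize("<b>Bold<i>Bold+Italic</b>Italic</i>"))` yields the corrected form shown in the list. A stack of open tags is one simple way to detect misalignment; the real parser may use a different strategy.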

Performance Benchmark

[Performance Benchmark](https://quickchart.io/chart-maker/view/zm-73887470-e667-4ca3-8df0-fe3563832b0b)

  • Test Environment: 2022/M2/24GB Memory/macOS 13.2/Xcode 14.1
  • X-axis: Number of HTML characters
  • Y-axis: Time taken to render (seconds)

*Additionally, NSAttributedString.DocumentType.html crashes (EXC_BAD_ACCESS) on strings longer than 54,600 characters.

Demo

You can directly download the project, open ZMarkupParser.xcworkspace, select the ZMarkupParser-Demo target, and Build & Run to test the effects.

Installation

Supports SPM and CocoaPods; see the README for details.

Usage

Style Declaration

MarkupStyle, MarkupStyleColor, and MarkupStyleParagraphStyle wrap the corresponding NSAttributedString.Key attributes.

var font: MarkupStyleFont
var paragraphStyle: MarkupStyleParagraphStyle
var foregroundColor: MarkupStyleColor? = nil
var backgroundColor: MarkupStyleColor? = nil
var ligature: NSNumber? = nil
var kern: NSNumber? = nil
var tracking: NSNumber? = nil
var strikethroughStyle: NSUnderlineStyle? = nil
var underlineStyle: NSUnderlineStyle? = nil
var strokeColor: MarkupStyleColor? = nil
var strokeWidth: NSNumber? = nil
var shadow: NSShadow? = nil
var textEffect: String? = nil
var attachment: NSTextAttachment? = nil
var link: URL? = nil
var baselineOffset: NSNumber? = nil
var underlineColor: MarkupStyleColor? = nil
var strikethroughColor: MarkupStyleColor? = nil
var obliqueness: NSNumber? = nil
var expansion: NSNumber? = nil
var writingDirection: NSNumber? = nil
var verticalGlyphForm: NSNumber? = nil
...

You can declare the styles you want to apply to the corresponding HTML tags:

let myStyle = MarkupStyle(font: MarkupStyleFont(size: 13), backgroundColor: MarkupStyleColor(name: .aquamarine))

HTML Tag

Declare the HTML tags to be rendered and the corresponding Markup Style. The currently predefined HTML tag names are as follows:

A_HTMLTagName(), // <a></a>
B_HTMLTagName(), // <b></b>
BR_HTMLTagName(), // <br></br>
DIV_HTMLTagName(), // <div></div>
HR_HTMLTagName(), // <hr></hr>
I_HTMLTagName(), // <i></i>
LI_HTMLTagName(), // <li></li>
OL_HTMLTagName(), // <ol></ol>
P_HTMLTagName(), // <p></p>
SPAN_HTMLTagName(), // <span></span>
STRONG_HTMLTagName(), // <strong></strong>
U_HTMLTagName(), // <u></u>
UL_HTMLTagName(), // <ul></ul>
DEL_HTMLTagName(), // <del></del>
IMG_HTMLTagName(handler: ZNSTextAttachmentHandler), // <img> and image downloader
TR_HTMLTagName(), // <tr>
TD_HTMLTagName(), // <td>
TH_HTMLTagName(), // <th>
...and more
...

This way, when parsing the <a> Tag, it will apply the specified MarkupStyle.
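
The feature list says tags are mapped to styles with the Visitor pattern. The following is a minimal sketch of that pattern, not the library's code: `SketchStyle`, `BoldTag`, `ItalicTag`, `UnderlineTag`, `TagVisitor`, `Tag`, `StyleVisitor`, and `NameVisitor` are all hypothetical names chosen for illustration.

```swift
// Hypothetical names throughout; these are not ZMarkupParser's real types.
struct SketchStyle: Equatable {
    var bold = false
    var italic = false
    var underline = false
}

struct BoldTag {}
struct ItalicTag {}
struct UnderlineTag {}

// One visit method per tag type; each visitor decides what a tag maps to.
protocol TagVisitor {
    associatedtype Result
    func visit(_ tag: BoldTag) -> Result
    func visit(_ tag: ItalicTag) -> Result
    func visit(_ tag: UnderlineTag) -> Result
}

// Each tag forwards itself to the matching visit overload (double dispatch).
protocol Tag {
    func accept<V: TagVisitor>(_ visitor: V) -> V.Result
}

extension BoldTag: Tag {
    func accept<V: TagVisitor>(_ visitor: V) -> V.Result { visitor.visit(self) }
}
extension ItalicTag: Tag {
    func accept<V: TagVisitor>(_ visitor: V) -> V.Result { visitor.visit(self) }
}
extension UnderlineTag: Tag {
    func accept<V: TagVisitor>(_ visitor: V) -> V.Result { visitor.visit(self) }
}

// Maps each tag to a style.
struct StyleVisitor: TagVisitor {
    typealias Result = SketchStyle
    func visit(_ tag: BoldTag) -> SketchStyle { SketchStyle(bold: true) }
    func visit(_ tag: ItalicTag) -> SketchStyle { SketchStyle(italic: true) }
    func visit(_ tag: UnderlineTag) -> SketchStyle { SketchStyle(underline: true) }
}

// A second visitor over the same tags, mapping each to its HTML name.
struct NameVisitor: TagVisitor {
    typealias Result = String
    func visit(_ tag: BoldTag) -> String { "b" }
    func visit(_ tag: ItalicTag) -> String { "i" }
    func visit(_ tag: UnderlineTag) -> String { "u" }
}
```

The appeal of this layout is that a new operation over tags (styling, naming, stripping) becomes a new visitor type without touching the tag types themselves.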

Extend HTMLTagName:

let zhgchgli = ExtendTagName("zhgchgli")

HTML Style Attribute

As mentioned earlier, HTML can specify styles through the style attribute. ZMarkupParser abstracts this so that you can declare which style attributes are supported and add your own. The currently predefined HTML Style Attributes are as follows:

ColorHTMLTagStyleAttribute(), // color
BackgroundColorHTMLTagStyleAttribute(), // background-color
FontSizeHTMLTagStyleAttribute(), // font-size
FontWeightHTMLTagStyleAttribute(), // font-weight
LineHeightHTMLTagStyleAttribute(), // line-height
WordSpacingHTMLTagStyleAttribute(), // word-spacing
...

Extend Style Attribute:

ExtendHTMLTagStyleAttribute(styleName: "text-decoration", render: { value in
  var newStyle = MarkupStyle()
  if value == "underline" {
    newStyle.underlineStyle = NSUnderlineStyle.single
  } else {
    // ...  
  }
  return newStyle
})
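
Before a style attribute handler can run, the raw `style="..."` value has to be split into property names and values. The sketch below shows one simple way to do that with plain string splitting. `parseStyleAttribute` and `pixelSize` are hypothetical helpers written for this example and are not part of ZMarkupParser's API.

```swift
import Foundation

// Splits a raw style attribute value such as "font-size: 20px; color: red"
// into lowercased property names and trimmed values.
func parseStyleAttribute(_ raw: String) -> [String: String] {
    var result: [String: String] = [:]
    for declaration in raw.split(separator: ";") {
        // Split on the first colon only, so values may contain colons themselves.
        let parts = declaration.split(separator: ":", maxSplits: 1)
        guard parts.count == 2 else { continue }
        let name = parts[0].trimmingCharacters(in: .whitespaces).lowercased()
        let value = parts[1].trimmingCharacters(in: .whitespaces)
        if !name.isEmpty && !value.isEmpty { result[name] = value }
    }
    return result
}

// Reads a pixel size like "20px" as a Double; returns nil for anything else.
func pixelSize(_ value: String) -> Double? {
    guard value.hasSuffix("px") else { return nil }
    return Double(value.dropLast(2))
}
```

Each resulting name could then be matched against a registered style attribute (for example `font-size`), with the value passed to that attribute's render closure.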

Usage

import ZMarkupParser

let parser = ZHTMLParserBuilder.initWithDefault().set(rootStyle: MarkupStyle(font: MarkupStyleFont(size: 13))).build()

initWithDefault will automatically add predefined HTML Tag Names & default corresponding MarkupStyles as well as predefined Style Attributes.

set(rootStyle:) can specify the default style for the entire string, or it can be left unspecified.
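
One way to picture a root style acting as the default is as a fallback for properties that a tag's own style leaves unset. The sketch below assumes that behavior for illustration only. `InheritableStyle` and `filled(from:)` are hypothetical and far simpler than MarkupStyle.

```swift
// Hypothetical style type for this sketch; MarkupStyle has many more properties.
struct InheritableStyle: Equatable {
    var fontSize: Double? = nil
    var isBold: Bool? = nil
    var colorName: String? = nil

    // Returns a copy in which every unset (nil) property is taken from `parent`.
    func filled(from parent: InheritableStyle) -> InheritableStyle {
        InheritableStyle(
            fontSize: fontSize ?? parent.fontSize,
            isBold: isBold ?? parent.isBold,
            colorName: colorName ?? parent.colorName
        )
    }
}

let rootStyle = InheritableStyle(fontSize: 13)
let boldStyle = InheritableStyle(isBold: true)

// The bold style keeps its own weight and picks up the root font size.
let resolved = boldStyle.filled(from: rootStyle)
```

Here `resolved` carries the root font size together with the bold flag, while `colorName` stays unset because neither style defines it.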

Customization

// Renders the extended HTML tag <zhgchgli></zhgchgli> with the MarkupStyle you specify
let parser = ZHTMLParserBuilder.initWithDefault()
    .add(ExtendTagName("zhgchgli"), withCustomStyle: MarkupStyle(backgroundColor: MarkupStyleColor(name: .aquamarine)))
    .build()

// Renders <b></b> with the MarkupStyle you specify instead of the default bold style
let boldParser = ZHTMLParserBuilder.initWithDefault()
    .add(B_HTMLTagName(), withCustomStyle: MarkupStyle(font: MarkupStyleFont(size: 18, weight: .style(.semibold))))
    .build()

HTML Render

let attributedString = parser.render(htmlString) // NSAttributedString

// work with UITextView
textView.setHtmlString(htmlString)
// work with UILabel
label.setHtmlString(htmlString)

HTML Stripper

parser.stripper(htmlString)
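
For intuition, tag stripping can be approximated with a single regex replacement. This is only a rough sketch and not how the library's stripper works: `stripTags` is a hypothetical helper, and the pattern is an assumption that removes only things shaped like real tags, so text such as <Congratulation!> is left alone.

```swift
import Foundation

// Removes anything shaped like <name ...>, </name>, or <name .../>.
// Text that merely contains angle brackets is kept as-is.
func stripTags(_ html: String) -> String {
    let tagPattern = "</?[A-Za-z][A-Za-z0-9]*(\\s[^<>]*)?/?>"
    return html.replacingOccurrences(of: tagPattern, with: "", options: .regularExpression)
}
```

A regex-only approach like this does not decode HTML entities or insert line breaks for block tags, which is one reason a tokenizing parser is the more robust choice.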

Selector HTML String

let selector = parser.selector(htmlString) // HTMLSelector e.g. input: <a><b>Test</b>Link</a>
selector.first("a")?.first("b")?.attributedString // will return Test
selector.filter("a").attributedString // will return Test Link

// render from the selector result, reusing `selector` from above
parser.render(selector.first("a")?.first("b"))
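
The selector calls above behave like lookups on a tree of tags. The sketch below models that idea with a tiny node type. `Node`, `element`, `text`, and `plainText` are hypothetical names for this example and do not mirror HTMLSelector's actual implementation.

```swift
// Hypothetical tree node for this sketch; not ZMarkupParser's HTMLSelector.
struct Node {
    let tag: String?       // nil marks a text node
    let content: String    // only meaningful for text nodes
    let children: [Node]

    static func element(_ tag: String, _ children: [Node]) -> Node {
        Node(tag: tag, content: "", children: children)
    }

    static func text(_ value: String) -> Node {
        Node(tag: nil, content: value, children: [])
    }

    // First direct child with the given tag name, if any.
    func first(_ name: String) -> Node? {
        children.first(where: { $0.tag == name })
    }

    // All direct children with the given tag name.
    func filter(_ name: String) -> [Node] {
        children.filter { $0.tag == name }
    }

    // Concatenated text of this node and all of its descendants.
    var plainText: String {
        tag == nil ? content : children.map { $0.plainText }.joined()
    }
}

// Tree for the input <a><b>Test</b>Link</a>, wrapped in a synthetic root.
let root = Node.element("root", [
    .element("a", [
        .element("b", [.text("Test")]),
        .text("Link")
    ])
])

// Optional chaining yields nil as soon as one lookup finds nothing.
let innerText = root.first("a")?.first("b")?.plainText
```

Because `first` returns an optional, every further step in the chain needs `?.`, which is why a missing tag produces `nil` rather than a crash.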

Async

Additionally, if you need to render long strings, you can use the async method to prevent UI blocking.

parser.render(String) { _ in }...
parser.stripper(String) { _ in }...
parser.selector(String) { _ in }...

Know-how

  • The hyperlink style in UITextView depends on linkTextAttributes, so there may be cases where an NSAttributedString.Key is set but has no effect.
  • UILabel does not support specifying URL styles, so there may be cases where an NSAttributedString.Key is set but has no effect.
  • If you need to render complex HTML, you still need to use WKWebView (including JS/tables rendering).

Technical principles and development story: “The Story of Handcrafting an HTML Parser”

Contributions and issues are welcome and will be promptly addressed.

For any questions or feedback, feel free to contact me.

===

Chinese version of this article

===

This article was first published in Traditional Chinese on Medium ➡️ View Here


This post is licensed under CC BY 4.0 by the author.
