Home Implementing iOS NSAttributedString HTML Render Yourself
Post
Cancel

Implementing iOS NSAttributedString HTML Render Yourself

Implementing iOS NSAttributedString HTML Render Yourself

An alternative to iOS NSAttributedString DocumentType.html

Photo by [Florian Olivo](https://unsplash.com/@florianolv?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText){:target="_blank"}

Photo by Florian Olivo

[TL;DR] 2023/03/12

Re-developed using another method ZMarkupParser HTML String to NSAttributedString Tool , for technical details and development stories, please visit 「 The Story of Handcrafting an HTML Parser

Origin

Since the release of iOS 15 last year, the app has been plagued by a crash issue that has topped the charts for a long time. According to the data, in the past 90 days (2022/03/11~2022/06/08), it caused over 2.4K crashes, affecting over 1.4K users.

From the data, it appears that this massive crash issue has been fixed (or the occurrence rate has been reduced) in subsequent versions of iOS ≥ 15.2, as the trend is showing a decline.

Most affected versions: iOS 15.0.X ~ iOS 15.X.X

Additionally, there were sporadic crashes found in iOS 12 and iOS 13, indicating that this issue has existed for a long time, but the occurrence rate in the early versions of iOS 15 was almost 100%.

Crash Cause:

1
<compiler-generated> line 2147483647 specialized @nonobjc NSAttributedString.init(data:options:documentAttributes:)

NSAttributedString crashes during init with Crashed: com.apple.main-thread EXC_BREAKPOINT 0x00000001de9d4e44.

It is also possible that the operation was not on the Main Thread.

Reproduction Method:

When this issue first appeared massively, it puzzled the development team; re-testing the crash points in the Crash Log showed no problems, and it was unclear under what circumstances the users encountered the issue. Until one day, by chance, I switched to “Low Power Mode” and triggered the issue! WTF!!!

Solution

After some searching, I found many similar cases online and also found the earliest similar crash issue question on the App Developer Forums, with an official response:

  • This is a known iOS Foundation Bug: It has existed since iOS 12
  • To render complex HTML without rendering constraints: use WKWebView
  • With rendering constraints: you can write your own HTML Parser & Renderer
  • Directly use Markdown as rendering constraints: iOS ≥ 15 NSAttributedString can directly render text using Markdown format

Rendering constraints means limiting the rendering formats that the app can support, such as only supporting bold, italic, hyperlinks.

Supplement. Rendering complex HTML — aiming to create text wrapping effects

You can coordinate with the backend to create an interface:

1
2
3
4
5
6
7
8
9
10
{
  "content":[
    {"type":"text","value":"Paragraph 1 plain text"},
    {"type":"text","value":"Paragraph 2 plain text"},
    {"type":"text","value":"Paragraph 3 plain text"},
    {"type":"text","value":"Paragraph 4 plain text"},
    {"type":"image","src":"https://zhgchg.li/logo.png","title":"ZhgChgLi"},
    {"type":"text","value":"Paragraph 5 plain text"}
  ]
}

You can combine it with Markdown to support text rendering, or refer to Medium’s approach:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
"Paragraph": {
    "text": "code in text, and link in text, and ZhgChgLi, and bold, and I, only i",
    "markups": [
      {
        "type": "CODE",
        "start": 5,
        "end": 7
      },
      {
        "start": 18,
        "end": 22,
        "href": "http://zhgchg.li",
        "type": "LINK"
      },
      {
        "type": "STRONG",
        "start": 50,
        "end": 63
      },
      {
        "type": "EM",
        "start": 55,
        "end": 69
      }
    ]
}

This means that for the text code in text, and link in text, and ZhgChgLi, and bold, and I, only i:

1
2
3
4
- Characters 5 to 7 should be marked as code (wrapped in `Text` format)
- Characters 18 to 22 should be marked as a link (wrapped in [Text](URL) format)
- Characters 50 to 63 should be marked as bold (wrapped in *Text* format)
- Characters 55 to 69 should be marked as italic (wrapped in _Text_ format)

With a standardized and describable structure, the app can use native methods to render, achieving optimal performance and user experience.

For the pitfalls of using UITextView for text wrapping, you can refer to my previous article: iOS UITextView Text Wrapping Editor (Swift)

Why?

Before implementing the solution, let’s first explore the problem itself. Personally, I believe the main cause of this issue is not from Apple; the official bug is just the trigger point.

The main problem comes from treating the app as a web renderer. The advantage is that web development is fast, the same API endpoint can provide HTML to all clients without distinction, and it can flexibly render any content. The disadvantage is that HTML is not a common interface for apps, you can’t expect app engineers to understand HTML, performance is extremely poor, it can only run on the main thread, the development stage cannot predict the result, and it is difficult to confirm the supported specifications.

Looking further into the problem, it often stems from unclear original requirements, uncertainty about which specifications the app needs to support, and the pursuit of speed, leading to the direct use of HTML as the interface between the app and the web.

Extremely poor performance

Supplementing the performance part, actual tests show that directly using NSAttributedString DocumentType.html and implementing the rendering method yourself has a speed difference of 5 to 20 times.

Better

Since it is for App use, a better approach should be based on App development methods. For Apps, the cost of adjusting requirements is much higher than for the Web; effective App development should be based on iterative adjustments with specifications. At the moment, we need to confirm the specifications that can be supported. If we need to change them later, we will schedule time to expand the specifications. We cannot quickly change them as we wish, which can reduce communication costs and increase work efficiency.

  • Confirm the scope of requirements
  • Confirm the supported specifications
  • Confirm the interface specifications (Markdown/BBCode/… can continue to use HTML, but it must be constrained, such as only using <b>/<i>/<a>/<u>, and it must be explicitly informed to the developers in the program)
  • Implement the rendering mechanism yourself
  • Maintain and iteratively support the specifications

[2023/02/27 Updated] [TL;DR]:

Updated approach, no longer using XMLParser, due to zero tolerance for errors:

<br> / <Congratulation!> / <b>Bold<i>Bold+Italic</b>Italic</i> The above three possible scenarios will all cause XMLParser to throw an error and display blank. Using XMLParser, the HTML string must fully comply with XML rules, unlike browsers or NSAttributedString.DocumentType.html which can tolerate errors and display normally.

Switch to pure Swift development, parsing HTML tags through Regex and Tokenization, analyzing and correcting tag correctness (correcting tags without end & misplaced tags), then converting them into an abstract syntax tree, and finally using the Visitor Pattern to map HTML tags to abstract styles, obtaining the final NSAttributedString result; without relying on any Parser Lib.

— —

How?

The die is cast, back to the main topic. Currently, we are using HTML to render NSAttributedString, so how do we solve the above crash and performance issues?

Inspired by

Strip HTML

Before talking about HTML Render, let’s talk about Strip HTML again. As mentioned in the Why? section, where the App will get HTML and what kind of HTML it will get should be specified in the specifications; rather than the App “ possibly “ getting HTML and needing to strip it.

As a former supervisor said: Isn’t this too crazy?

Option 1. NSAttributedString

1
2
3
let data = "<div>Text</div>".data(using: .unicode)!
let attributed = try NSAttributedString(data: data, options: [.documentType: NSAttributedString.DocumentType.html, .characterEncoding: String.Encoding.utf8.rawValue], documentAttributes: nil)
let string = attributed.string
  • Use NSAttributedString to render HTML and then extract the string to get a clean String
  • The same issues as in this chapter, iOS 15 is prone to crashes, poor performance, and can only be operated on the Main Thread

Option 2. Regex

1
2
htmlString = "<div>Test</div>"
htmlString.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression, range: nil)
  • The simplest and most effective way
  • Regex cannot guarantee complete correctness e.g. <p foo=">now what?">Paragraph</p> is valid HTML but will be stripped incorrectly

Option 3. XMLParser

Refer to the approach of SwiftRichString, using XMLParser from Foundation to parse HTML as XML and implement HTML Parser & Strip functionality.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
import UIKit
// Ref: https://github.com/malcommac/SwiftRichString
final class HTMLStripper: NSObject, XMLParserDelegate {

    private static let topTag = "source"
    private var xmlParser: XMLParser
    
    private(set) var storedString: String
    
    // The XML parser sometimes splits strings, which can break localization-sensitive
    // string transforms. Work around this by using the currentString variable to
    // accumulate partial strings, and then reading them back out as a single string
    // when the current element ends, or when a new one is started.
    private var currentString: String?
    
    // MARK: - Initialization

    init(string: String) throws {
        let xmlString = HTMLStripper.escapeWithUnicodeEntities(string)
        let xml = "<\(HTMLStripper.topTag)>\(xmlString)</\(HTMLStripper.topTag)>"
        guard let data = xml.data(using: String.Encoding.utf8) else {
            throw XMLParserInitError("Unable to convert to UTF8")
        }
        
        self.xmlParser = XMLParser(data: data)
        self.storedString = ""
        
        super.init()
        
        xmlParser.shouldProcessNamespaces = false
        xmlParser.shouldReportNamespacePrefixes = false
        xmlParser.shouldResolveExternalEntities = false
        xmlParser.delegate = self
    }
    
    /// Parse and generate attributed string.
    func parse() throws -> String {
        guard xmlParser.parse() else {
            let line = xmlParser.lineNumber
            let shiftColumn = (line == 1)
            let shiftSize = HTMLStripper.topTag.lengthOfBytes(using: String.Encoding.utf8) + 2
            let column = xmlParser.columnNumber - (shiftColumn ? shiftSize : 0)
            
            throw XMLParserError(parserError: xmlParser.parserError, line: line, column: column)
        }
        
        return storedString
    }
    
    // MARK: XMLParserDelegate
    
    @objc func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String: String]) {
        foundNewString()
    }
    
    @objc func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        foundNewString()
    }
    
    @objc func parser(_ parser: XMLParser, foundCharacters string: String) {
        currentString = (currentString ?? "").appending(string)
    }
    
    // MARK: Support Private Methods
    
    func foundNewString() {
        if let currentString = currentString {
            storedString.append(currentString)
            self.currentString = nil
        }
    }
    
    // handle html entity / html hex
    // Perform string escaping to replace all characters which is not supported by NSXMLParser
    // into the specified encoding with decimal entity.
    // For example if your string contains '&' character parser will break the style.
    // This option is active by default.
    // ref: https://github.com/malcommac/SwiftRichString/blob/e0b72d5c96968d7802856d2be096202c9798e8d1/Sources/SwiftRichString/Support/XMLStringBuilder.swift
    static func escapeWithUnicodeEntities(_ string: String) -> String {
        guard let escapeAmpRegExp = try? NSRegularExpression(pattern: "&(?!(#[0-9]{2,4}|[A-z]{2,6});)", options: NSRegularExpression.Options(rawValue: 0)) else {
            return string
        }
        
        let range = NSRange(location: 0, length: string.count)
        return escapeAmpRegExp.stringByReplacingMatches(in: string,
                                                        options: NSRegularExpression.MatchingOptions(rawValue: 0),
                                                        range: range,
                                                        withTemplate: "&amp;")
    }
}


let test = "我<br/><a href=\"http://google.com\">同意</a>提供<b><i>個</i>人</b>身分證字號/護照/居留<span style=\"color:#FF0000;font-size:20px;word-spacing:10px;line-height:10px\">證號碼</span>,以供<i>跨境物流</i>方通關<span style=\"background-color:#00FF00;\">使用</span>,並已<img src=\"g.png\"/>了解跨境<br/>商品之物<p>流需</p>求"

let stripper = try HTMLStripper(string: test)
print(try! stripper.parse())

// I agree to provide personal ID number/passport/residence permit number for cross-border logistics customs clearance, and have understood the logistics requirements of cross-border goods.

Using Foundation XML Parser to handle String, implement XMLParserDelegate using currentString to store String, since String may sometimes be split into multiple Strings, foundCharacters might be called repeatedly. didStartElement and didEndElement are used to find the start and end of the string, storing the current result and clearing currentString.

  • The advantage is that it also converts HTML Entity to actual characters e.g. &#103; -> g
  • The disadvantage is that it is complex to implement and will fail with XMLParser when encountering non-compliant HTML e.g. <br> should be written as <br/>

Personally, I think Option 2 is a better method for simply stripping HTML. This method is introduced because rendering HTML also uses the same principle. Let’s use this as a simple example :)

HTML Render w/XMLParser

Using XMLParser to implement it yourself, following the same principle as stripping, we can add corresponding rendering methods when parsing certain tags.

Requirements:

  • Support for extending the tags to be parsed
  • Support for setting Tag Default Style e.g. applying link style to <a> Tag
  • Support for parsing style attributes, as HTML will explicitly indicate the style to be displayed in style="color:red"
  • Style support for changing text weight, size, underline, line spacing, letter spacing, background color, text color
  • Does not support Image Tag, Table Tag, etc., more complex TAGs

You can reduce functionality according to your own requirements, for example, if you don’t need to support background color adjustment, you don’t need to open the setting for background color.

This article is just a conceptual implementation, not the best practice in architecture; if you have clear specifications and usage, you can consider applying some Design Patterns to achieve good maintainability and extensibility.

⚠️⚠️⚠️ Attention ⚠️⚠️⚠️

Again, if your App is new or has the opportunity to switch entirely to Markdown format, it is recommended to adopt the above method. Writing your own renderer is too complex and will not perform better than Markdown.

Even if you are on iOS < 15 and do not support native Markdown, you can still find a great Markdown Parser solution on Github.

HTMLTagParser

1
2
3
4
5
6
7
protocol HTMLTagParser {
    static var tag: String { get } // Declare the Tag Name to be parsed, e.g. a
    var storedHTMLAttributes: [String: String]? { get set } // The parsed attributes will be stored here, e.g. href, style
    var style: AttributedStringStyle? { get } // The style to be applied to this Tag
    
    func render(attributedString: inout NSMutableAttributedString) // Implement the logic to render HTML to attributedString
}

Declare the analyzable HTML Tag entity for easy extension and management.

AttributedStringStyle

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
protocol AttributedStringStyle {
    var font: UIFont? { get set }
    var color: UIColor? { get set }
    var backgroundColor: UIColor? { get set }
    var wordSpacing: CGFloat? { get set }
    var paragraphStyle: NSParagraphStyle? { get set }
    var customs: [NSAttributedString.Key: Any]? { get set } // Universal setting, it is recommended to abstract it out after confirming the supported specifications and close this opening
    func render(attributedString: inout NSMutableAttributedString)
}

// abstract implement
extension AttributedStringStyle {
    func render(attributedString: inout NSMutableAttributedString) {
        let range = NSMakeRange(0, attributedString.length)
        if let font = font {
            attributedString.addAttribute(NSAttributedString.Key.font, value: font, range: range)
        }
        if let color = color {
            attributedString.addAttribute(NSAttributedString.Key.foregroundColor, value: color, range: range)
        }
        if let backgroundColor = backgroundColor {
            attributedString.addAttribute(NSAttributedString.Key.backgroundColor, value: backgroundColor, range: range)
        }
        if let wordSpacing = wordSpacing {
            attributedString.addAttribute(NSAttributedString.Key.kern, value: wordSpacing as Any, range: range)
        }
        if let paragraphStyle = paragraphStyle {
            attributedString.addAttribute(NSAttributedString.Key.paragraphStyle, value: paragraphStyle, range: range)
        }
        if let customAttributes = customs {
            attributedString.addAttributes(customAttributes, range: range)
        }
    }
}

Declare the styles that can be set for the Tag.

HTMLStyleAttributedParser

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
// only support tag attributed down below
// can set color,font size,line height,word spacing,background color

enum HTMLStyleAttributedParser: String {
    case color = "color"
    case fontSize = "font-size"
    case lineHeight = "line-height"
    case wordSpacing = "word-spacing"
    case backgroundColor = "background-color"
    
    func render(attributedString: inout NSMutableAttributedString, value: String) -> Bool {
        let range = NSMakeRange(0, attributedString.length)
        switch self {
        case .color:
            if let color = convertToiOSColor(value) {
                attributedString.addAttribute(NSAttributedString.Key.foregroundColor, value: color, range: range)
                return true
            }
        case .backgroundColor:
            if let color = convertToiOSColor(value) {
                attributedString.addAttribute(NSAttributedString.Key.backgroundColor, value: color, range: range)
                return true
            }
        case .fontSize:
            if let size = convertToiOSSize(value) {
                attributedString.addAttribute(NSAttributedString.Key.font, value: UIFont.systemFont(ofSize: CGFloat(size)), range: range)
                return true
            }
        case .lineHeight:
            if let size = convertToiOSSize(value) {
                let paragraphStyle = NSMutableParagraphStyle()
                paragraphStyle.lineSpacing = size
                attributedString.addAttribute(NSAttributedString.Key.paragraphStyle, value: paragraphStyle, range: range)
                return true
            }
        case .wordSpacing:
            if let size = convertToiOSSize(value) {
                attributedString.addAttribute(NSAttributedString.Key.kern, value: size, range: range)
                return true
            }
        }
        
        return false
    }
    
    // convert 36px -> 36
    private func convertToiOSSize(_ string: String) -> CGFloat? {
        guard let regex = try? NSRegularExpression(pattern: "^([0-9]+)"),
              let firstMatch = regex.firstMatch(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count)),
              let range = Range(firstMatch.range, in: string),
              let size = Float(String(string[range])) else {
            return nil
        }
        return CGFloat(size)
    }
    
    // convert html hex color #ffffff to UIKit Color
    private func convertToiOSColor(_ hexString: String) -> UIColor? {
        var cString: String = hexString.trimmingCharacters(in: .whitespacesAndNewlines).uppercased()

        if cString.hasPrefix("#") {
            cString.remove(at: cString.startIndex)
        }

        if (cString.count) != 6 {
            return nil
        }

        var rgbValue: UInt64 = 0
        Scanner(string: cString).scanHexInt64(&rgbValue)

        return UIColor(
            red: CGFloat((rgbValue & 0xFF0000) >> 16) / 255.0,
            green: CGFloat((rgbValue & 0x00FF00) >> 8) / 255.0,
            blue: CGFloat(rgbValue & 0x0000FF) / 255.0,
            alpha: CGFloat(1.0)
        )
    }
}

Implement Style Attributed Parser to parse style="color:red;font-size:16px" but CSS Style has many configurable styles, so it is necessary to enumerate the supported range.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
extension HTMLTagParser {

    func render(attributedString: inout NSMutableAttributedString) {
        defaultStyleRender(attributedString: &attributedString)
    }
    
    func defaultStyleRender(attributedString: inout NSMutableAttributedString) {
        // setup default style to NSMutableAttributedString
        style?.render(attributedString: &attributedString)
        
        // setup & override HTML style (style="color:red;background-color:black") to NSMutableAttributedString if is exists
        // any html tag can have style attribute
        if let style = storedHTMLAttributes?["style"] {
            let styles = style.split(separator: ";").map { $0.split(separator: ":") }.filter { $0.count == 2 }
            for style in styles {
                let key = String(style[0])
                let value = String(style[1])
                
                if let styleAttributed = HTMLStyleAttributedParser(rawValue: key), styleAttributed.render(attributedString: &attributedString, value: value) {
                    print("Unsupported style attributed or value[\(key):\(value)]")
                }
            }
        }
    }
}

Apply HTMLStyleAttributedParser & HTMLStyleAttributedParser abstract implementation.

Some examples of Tag Parser & AttributedStringStyle implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
struct LinkStyle: AttributedStringStyle {
   var font: UIFont? = UIFont.systemFont(ofSize: 14)
   var color: UIColor? = UIColor.blue
   var backgroundColor: UIColor? = nil
   var wordSpacing: CGFloat? = nil
   var paragraphStyle: NSParagraphStyle?
   var customs: [NSAttributedString.Key: Any]? = [.underlineStyle: NSUnderlineStyle.single.rawValue]
}

struct ATagParser: HTMLTagParser {
    // <a></a>
    static let tag: String = "a"
    var storedHTMLAttributes: [String: String]? = nil
    let style: AttributedStringStyle? = LinkStyle()
    
    func render(attributedString: inout NSMutableAttributedString) {
        defaultStyleRender(attributedString: &attributedString)
        if let href = storedHTMLAttributes?["href"], let url = URL(string: href) {
            let range = NSMakeRange(0, attributedString.length)
            attributedString.addAttribute(NSAttributedString.Key.link, value: url, range: range)
        }
    }
}
struct BoldStyle: AttributedStringStyle {
   var font: UIFont? = UIFont.systemFont(ofSize: 14, weight: .bold)
   var color: UIColor? = UIColor.black
   var backgroundColor: UIColor? = nil
   var wordSpacing: CGFloat? = nil
   var paragraphStyle: NSParagraphStyle?
   var customs: [NSAttributedString.Key: Any]? = [.underlineStyle: NSUnderlineStyle.single.rawValue]
}

struct BoldTagParser: HTMLTagParser {
    // <b></b>
    static let tag: String = "b"
    var storedHTMLAttributes: [String: String]? = nil
    let style: AttributedStringStyle? = BoldStyle()
}

HTMLToAttributedStringParser: XMLParserDelegate core implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
// Ref: https://github.com/malcommac/SwiftRichString
final class HTMLToAttributedStringParser: NSObject {
    
    private static let topTag = "source"
    private var xmlParser: XMLParser?
    
    private(set) var attributedString: NSMutableAttributedString = NSMutableAttributedString()
    private(set) var supportedTagRenders: [HTMLTagParser] = []
    private let defaultStyle: AttributedStringStyle
    
    /// Styles applied at each fragment.
    private var renderingTagRenders: [HTMLTagParser] = []

    // The XML parser sometimes splits strings, which can break localization-sensitive
    // string transforms. Work around this by using the currentString variable to
    // accumulate partial strings, and then reading them back out as a single string
    // when the current element ends, or when a new one is started.
    private var currentString: String?
    
    // MARK: - Initialization

    init(defaultStyle: AttributedStringStyle) {
        self.defaultStyle = defaultStyle
        super.init()
    }
    
    func register(_ tagRender: HTMLTagParser) {
        if let index = supportedTagRenders.firstIndex(where: { type(of: $0).tag == type(of: tagRender).tag }) {
            supportedTagRenders.remove(at: index)
        }
        supportedTagRenders.append(tagRender)
    }
    
    /// Parse and generate attributed string.
    func parse(string: String) throws -> NSAttributedString {
        var xmlString = HTMLToAttributedStringParser.escapeWithUnicodeEntities(string)
        
        // make sure <br/> format is correct XML
        // because Web may use <br> to present <br/>, but <br> is not a valid XML
        xmlString = xmlString.replacingOccurrences(of: "<br>", with: "<br/>")
        
        let xml = "<\(HTMLToAttributedStringParser.topTag)>\(xmlString)</\(HTMLToAttributedStringParser.topTag)>"
        guard let data = xml.data(using: String.Encoding.utf8) else {
            throw XMLParserInitError("Unable to convert to UTF8")
        }
        
        let xmlParser = XMLParser(data: data)
        xmlParser.shouldProcessNamespaces = false
        xmlParser.shouldReportNamespacePrefixes = false
        xmlParser.shouldResolveExternalEntities = false
        xmlParser.delegate = self
        self.xmlParser = xmlParser
        
        attributedString = NSMutableAttributedString()
        
        guard xmlParser.parse() else {
            let line = xmlParser.lineNumber
            let shiftColumn = (line == 1)
            let shiftSize = HTMLToAttributedStringParser.topTag.lengthOfBytes(using: String.Encoding.utf8) + 2
            let column = xmlParser.columnNumber - (shiftColumn ? shiftSize : 0)
            
            throw XMLParserError(parserError: xmlParser.parserError, line: line, column: column)
        }
        
        return attributedString
    }
}

// MARK: Private Method

private extension HTMLToAttributedStringParser {
    func enter(element elementName: String, attributes: [String: String]) {
        // elementName = tagName, EX: a,span,div...
        guard elementName != HTMLToAttributedStringParser.topTag else {
            return
        }
        
        if let index = supportedTagRenders.firstIndex(where: { type(of: $0).tag == elementName }) {
            var tagRender = supportedTagRenders[index]
            tagRender.storedHTMLAttributes = attributes
            renderingTagRenders.append(tagRender)
        }
    }
    
    func exit(element elementName: String) {
        if !renderingTagRenders.isEmpty {
            renderingTagRenders.removeLast()
        }
    }
    
    func foundNewString() {
        if let currentString = currentString {
            // currentString != nil ,ex: <i>currentString</i>
            var newAttributedString = NSMutableAttributedString(string: currentString)
            if !renderingTagRenders.isEmpty {
                for (key, tagRender) in renderingTagRenders.enumerated() {
                    // Render Style
                    tagRender.render(attributedString: &newAttributedString)
                    renderingTagRenders[key].storedHTMLAttributes = nil
                }
            } else {
                defaultStyle.render(attributedString: &newAttributedString)
            }
            attributedString.append(newAttributedString)
            self.currentString = nil
        } else {
            // currentString == nil ,ex: <br/>
            var newAttributedString = NSMutableAttributedString()
            for (key, tagRender) in renderingTagRenders.enumerated() {
                // Render Style
                tagRender.render(attributedString: &newAttributedString)
                renderingTagRenders[key].storedHTMLAttributes = nil
            }
            attributedString.append(newAttributedString)
        }
    }
}

// MARK: Helper

extension HTMLToAttributedStringParser {
    // handle html entity / html hex
    // Perform string escaping to replace all characters which is not supported by NSXMLParser
    // into the specified encoding with decimal entity.
    // For example if your string contains '&' character parser will break the style.
    // This option is active by default.
    // ref: https://github.com/malcommac/SwiftRichString/blob/e0b72d5c96968d7802856d2be096202c9798e8d1/Sources/SwiftRichString/Support/XMLStringBuilder.swift
    static func escapeWithUnicodeEntities(_ string: String) -> String {
        guard let escapeAmpRegExp = try? NSRegularExpression(pattern: "&(?!(#[0-9]{2,4}|[A-z]{2,6});)", options: NSRegularExpression.Options(rawValue: 0)) else {
            return string
        }
        
        let range = NSRange(location: 0, length: string.count)
        return escapeAmpRegExp.stringByReplacingMatches(in: string,
                                                        options: NSRegularExpression.MatchingOptions(rawValue: 0),
                                                        range: range,
                                                        withTemplate: "&amp;")
    }
}

// MARK: XMLParserDelegate

extension HTMLToAttributedStringParser: XMLParserDelegate {
    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String: String]) {
        foundNewString()
        enter(element: elementName, attributes: attributeDict)
    }
    
    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        foundNewString()
        guard elementName != HTMLToAttributedStringParser.topTag else {
            return
        }
        
        exit(element: elementName)
    }
    
    func parser(_ parser: XMLParser, foundCharacters string: String) {
        currentString = (currentString ?? "").appending(string)
    }
}

Applying the logic of Strip, we can combine the parsed structure by knowing the current Tag from elementName and applying the corresponding Tag Parser and defined Style.

Test Result

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
let test = "我<br/><a href=\"http://google.com\">同意</a>提供<b><i>個</i>人</b>身分證字號/護照/居留<span style=\"color:#FF0000;font-size:20px;word-spacing:10px;line-height:10px\">證號碼</span>,以供<i>跨境物流</i>方通關<span style=\"background-color:#00FF00;\">使用</span>,並已<img src=\"g.png\"/>了解跨境<br/>商品之物<p>流需</p>求"
let render = HTMLToAttributedStringParser(defaultStyle: DefaultTextStyle())
render.register(ATagParser())
render.register(BoldTagParser())
render.register(SpanTagParser())
//...
print(try! render.parse(string: test))

// Result:
// 我{
//     NSColor = "UIExtendedGrayColorSpace 0 1";
//     NSFont = "\".SFNS-Regular 14.00 pt. P [] (0x13a012970) fobj=0x13a012970, spc=3.79\"";
//     NSParagraphStyle = "Alignment 4, LineSpacing 3, ParagraphSpacing 0, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n    28L,\n    56L,\n    84L,\n    112L,\n    140L,\n    168L,\n    196L,\n    224L,\n    252L,\n    280L,\n    308L,\n    336L\n), DefaultTabInterval 0, Blocks (\n), Lists (\n), BaseWritingDirection -1, HyphenationFactor 0, TighteningForTruncation NO, HeaderLevel 0 LineBreakStrategy 0 PresentationIntents (\n) ListIntentOrdinal 0 CodeBlockIntentLanguageHint ''";
// }同意{
//     NSColor = "UIExtendedSRGBColorSpace 0 0 1 1";
//     NSFont = "\".SFNS-Regular 14.00 pt. P [] (0x13a012970) fobj=0x13a012970, spc=3.79\"";
//     NSLink = "http://google.com";
//     NSUnderline = 1;
// }提供{
//     NSColor = "UIExtendedGrayColorSpace 0 1";
//     NSFont = "\".SFNS-Regular 14.00 pt. P [] (0x13a012970) fobj=0x13a012970, spc=3.79\"";
//     NSParagraphStyle = "Alignment 4, LineSpacing 3, ParagraphSpacing 0, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n    28L,\n    56L,\n    84L,\n    112L,\n    140L,\n    168L,\n    196L,\n    224L,\n    252L,\n    280L,\n    308L,\n    336L\n), DefaultTabInterval 0, Blocks (\n), Lists (\n), BaseWritingDirection -1, HyphenationFactor 0, TighteningForTruncation NO, HeaderLevel 0 LineBreakStrategy 0 PresentationIntents (\n) ListIntentOrdinal 0 CodeBlockIntentLanguageHint ''";
// }個{
//     NSColor = "UIExtendedGrayColorSpace 0 1";
//     NSFont = "\".SFNS-Bold 14.00 pt. P [] (0x13a013870) fobj=0x13a013870, spc=3.46\"";
//     NSUnderline = 1;
// }人身分證字號/護照/居留{
//     NSColor = "UIExtendedGrayColorSpace 0 1";
//     NSFont = "\".SFNS-Regular 14.00 pt. P [] (0x13a012970) fobj=0x13a012970, spc=3.79\"";
//     NSParagraphStyle = "Alignment 4, LineSpacing 3, ParagraphSpacing 0, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n    28L,\n    56L,\n    84L,\n    112L,\n    140L,\n    168L,\n    196L,\n    224L,\n    252L,\n    280L,\n    308L,\n    336L\n), DefaultTabInterval 0, Blocks (\n), Lists (\n), BaseWritingDirection -1, HyphenationFactor 0, TighteningForTruncation NO, HeaderLevel 0 LineBreakStrategy 0 PresentationIntents (\n) ListIntentOrdinal 0 CodeBlockIntentLanguageHint ''";
// }證號碼{
//     NSColor = "UIExtendedSRGBColorSpace 1 0 0 1";
//     NSFont = "\".SFNS-Regular 20.00 pt. P [] (0x13a015fa0) fobj=0x13a015fa0, spc=4.82\"";
//     NSKern = 10;
//     NSParagraphStyle = "Alignment 4, LineSpacing 10, ParagraphSpacing 0, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n    28L,\n    56L,\n    84L,\n    112L,\n    140L,\n    168L,\n    196L,\n    224L,\n    252L,\n    280L,\n    308L,\n    336L\n), DefaultTabInterval 0, Blocks (\n), Lists (\n), BaseWritingDirection -1, HyphenationFactor 0, TighteningForTruncation NO, HeaderLevel 0 LineBreakStrategy 0 PresentationIntents (\n) ListIntentOrdinal 0 CodeBlockIntentLanguageHint ''";
// },以供跨境物流方通關{
//     NSColor = "UIExtendedGrayColorSpace 0 1";
//     NSFont = "\".SFNS-Regular 14.00 pt. P [] (0x13a012970) fobj=0x13a012970, spc=3.79\"";
//     NSParagraphStyle = "Alignment 4, LineSpacing 3, ParagraphSpacing 0, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n    28L,\n    56L,\n    84L,\n    112L,\n    140L,\n    168L,\n    196L,\n    224L,\n    252L,\n    280L,\n    308L,\n    336L\n), DefaultTabInterval 0, Blocks (\n), Lists (\n), BaseWritingDirection -1, HyphenationFactor 0, TighteningForTruncation NO, HeaderLevel 0 LineBreakStrategy 0 PresentationIntents (\n) ListIntentOrdinal 0 CodeBlockIntentLanguageHint ''";
// }使用{
//     NSBackgroundColor = "UIExtendedSRGBColorSpace 0 1 0 1";
//     NSColor = "UIExtendedGrayColorSpace 0 1";
//     NSFont = "\".SFNS-Regular 14.00 pt. P [] (0x13a012970) fobj=0x13a012970, spc=3.79\"";
//     NSParagraphStyle = "Alignment 4, LineSpacing 3, ParagraphSpacing 0, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n    28L,\n    56L,\n    84L,\n    112L,\n    140L,\n    168L,\n    196L,\n    224L,\n    252L,\n    280L,\n    308L,\n    336L\n), DefaultTabInterval 0, Blocks (\n), Lists (\n), BaseWritingDirection -1, HyphenationFactor 0, TighteningForTruncation NO, HeaderLevel 0 LineBreakStrategy 0 PresentationIntents (\n) ListIntentOrdinal 0 CodeBlockIntentLanguageHint ''";
// },並已了解跨境商品之物流需求{
//     NSColor = "UIExtendedGrayColorSpace 0 1";
//     NSFont = "\".SFNS-Regular 14.00 pt. P [] (0x13a012970) fobj=0x13a012970, spc=3.79\"";
//     NSParagraphStyle = "Alignment 4, LineSpacing 3, ParagraphSpacing 0, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n    28L,\n    56L,\n    84L,\n    112L,\n    140L,\n    168L,\n    196L,\n    224L,\n    252L,\n    280L,\n    308L,\n    336L\n), DefaultTabInterval 0, Blocks (\n), Lists (\n), BaseWritingDirection -1, HyphenationFactor 0, TighteningForTruncation NO, HeaderLevel 0 LineBreakStrategy 0 PresentationIntents (\n) ListIntentOrdinal 0 CodeBlockIntentLanguageHint ''";
// }

Display Result:

Done!

We have now completed implementing the HTML Render function through XMLParser, maintaining both extensibility and specification. This allows us to manage and understand the types of string rendering supported by the current App from the code.

Complete Github Repo as follows

This article is also published on my personal Blog: [Click here to visit].

For any questions or feedback, feel free to contact me.

===

本文中文版本

===

This article was first published in Traditional Chinese on Medium ➡️ View Here


This post is licensed under CC BY 4.0 by the author.

Converting Medium Posts to Markdown

Visitor Pattern in TableView