一尘不染

如何在Swift中解码HTML实体?

swift

我正在从网站提取JSON文件,收到的字符串之一是:

The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi

如何将类似的内容&#8216转换为正确的字符?

我创建了一个Xcode Playground来演示它:

import UIKit

var error: NSError?
let blogUrl: NSURL = NSURL.URLWithString("http://sophisticatedignorance.net/api/get_recent_summary/")
let jsonData = NSData(contentsOfURL: blogUrl)

let dataDictionary = NSJSONSerialization.JSONObjectWithData(jsonData, options: nil, error: &error) as NSDictionary

var a = dataDictionary["posts"] as NSArray

println(a[0]["title"])

阅读 300

收藏
2020-07-07

共1个答案

一尘不染

这个答案最近针对Swift 5.2和iOS 13.4 SDK进行了修订。


没有做到这一点的直接方法,但是您可以使用NSAttributedString魔术使此过程尽可能轻松(请注意,此方法也将剥离所有HTML标记)。

记住 仅从主线程 初始化NSAttributedString。它使用WebKit解析下面的HTML,因此是必需的。

// This is a[0]["title"] in your case
let encodedString = "The Weeknd <em>&#8216;King Of The Fall&#8217;</em>"

guard let data = htmlEncodedString.data(using: .utf8) else {
    return
}

let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
    .documentType: NSAttributedString.DocumentType.html,
    .characterEncoding: String.Encoding.utf8.rawValue
]

guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
    return
}

// The Weeknd ‘King Of The Fall’
let decodedString = attributedString.string



extension String {

    init?(htmlEncodedString: String) {

        guard let data = htmlEncodedString.data(using: .utf8) else {
            return nil
        }

        let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
            return nil
        }

        self.init(attributedString.string)

    }

}

let encodedString = "The Weeknd <em>&#8216;King Of The Fall&#8217;</em>"
let decodedString = String(htmlEncodedString: encodedString)
2020-07-07