一尘不染

Parsing HTTP requests and responses from a text file in Go

go

Given the following file, which holds a pipelined HTTP stream of HTTP requests and HTTP responses:

How can I parse this file into the stream variable?

type Connection struct{
   Request *http.Request
   Response *http.Response
}
stream := make([]Connection, 0)

Raw file:

GET /ubuntu/dists/trusty/InRelease HTTP/1.1
Host: archive.ubuntu.com
Cache-Control: max-age=0
Accept: text/*
User-Agent: Debian APT-HTTP/1.3 (1.0.1ubuntu2)

HTTP/1.1 404 Not Found
Date: Thu, 26 Nov 2015 18:26:36 GMT
Server: Apache/2.2.22 (Ubuntu)
Vary: Accept-Encoding
Content-Length: 311
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /ubuntu/dists/trusty/InRelease was not found on this server.</p>
<hr>
<address>Apache/2.2.22 (Ubuntu) Server at archive.ubuntu.com Port 80</address>
</body></html>
GET /ubuntu/dists/trusty-updates/InRelease HTTP/1.1
Host: archive.ubuntu.com
Cache-Control: max-age=0
Accept: text/*
User-Agent: Debian APT-HTTP/1.3 (1.0.1ubuntu2)

HTTP/1.1 200 OK
Date: Thu, 26 Nov 2015 18:26:37 GMT
Server: Apache/2.2.22 (Ubuntu)
Last-Modified: Thu, 26 Nov 2015 18:03:00 GMT
ETag: "fbb7-5257562a5fd00"
Accept-Ranges: bytes
Content-Length: 64439
Cache-Control: max-age=382, proxy-revalidate
Expires: Thu, 26 Nov 2015 18:33:00 GMT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Origin: Ubuntu
Label: Ubuntu
Suite: trusty-updates
Version: 14.04
Codename: trusty
[... truncated by author]

I know there is http.ReadRequest. What about the response? Any ideas, feedback, or thoughts are appreciated.


2020-07-02

1 answer


It's actually quite simple:

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "net/http/httputil"
    "os"
)

type Connection struct {
    Request  *http.Request
    Response *http.Response
}

func ReadHTTPFromFile(r io.Reader) ([]Connection, error) {
    buf := bufio.NewReader(r)
    stream := make([]Connection, 0)

    for {
        req, err := http.ReadRequest(buf)
        if err == io.EOF {
            break
        }
        if err != nil {
            return stream, err
        }

        resp, err := http.ReadResponse(buf, req)
        if err != nil {
            return stream, err
        }

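        // Save the response body: resp.Body reads from buf, so it has to be
        // drained before the next request can be parsed.
        b := new(bytes.Buffer)
        if _, err := io.Copy(b, resp.Body); err != nil {
            return stream, err
        }
        resp.Body.Close()
        resp.Body = ioutil.NopCloser(b)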

        stream = append(stream, Connection{Request: req, Response: resp})
    }
    return stream, nil
}

func main() {
    f, err := os.Open("/tmp/test.http")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    stream, err := ReadHTTPFromFile(f)
    if err != nil {
        log.Fatalln(err)
    }
    for _, c := range stream {
        b, err := httputil.DumpRequest(c.Request, true)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(b))
        b, err = httputil.DumpResponse(c.Response, true)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(b))
    }
}
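
The same loop can be exercised without a file on disk. The following is only a minimal sketch, not part of the original answer: `sample`, `Exchange`, and `parseExchanges` are hypothetical names, and the two tiny exchanges are made up so that each Content-Length matches its body exactly. It assumes Go 1.16 or later for `io.ReadAll`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

// sample is a hypothetical two-exchange pipeline. Each Content-Length
// matches its body exactly ("nope" is 4 bytes, "hello" is 5 bytes).
const sample = "GET /a HTTP/1.1\r\nHost: example.com\r\n\r\n" +
	"HTTP/1.1 404 Not Found\r\nContent-Length: 4\r\n\r\nnope" +
	"GET /b HTTP/1.1\r\nHost: example.com\r\n\r\n" +
	"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"

// Exchange is a hypothetical flattened view of one request/response pair.
type Exchange struct {
	Path   string
	Status int
	Body   string
}

// parseExchanges alternates http.ReadRequest and http.ReadResponse on a
// single bufio.Reader until the request read hits a clean io.EOF.
func parseExchanges(raw string) ([]Exchange, error) {
	buf := bufio.NewReader(strings.NewReader(raw))
	var out []Exchange
	for {
		req, err := http.ReadRequest(buf)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		resp, err := http.ReadResponse(buf, req)
		if err != nil {
			return out, err
		}
		// Draining the body moves buf to the start of the next request.
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return out, err
		}
		out = append(out, Exchange{
			Path:   req.URL.Path,
			Status: resp.StatusCode,
			Body:   string(body),
		})
	}
}

func main() {
	exchanges, err := parseExchanges(sample)
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range exchanges {
		fmt.Printf("%s -> %d %q\n", e.Path, e.Status, e.Body)
	}
}
```

Reading each body to the end before the next `http.ReadRequest` call is what keeps the two parsers in step on the shared reader.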

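Some notes:

  • There are http.ReadRequest and http.ReadResponse
  • http.ReadRequest and http.ReadResponse can be called over and over on the same bufio.Reader until EOF, and it "just works"
    • "Just works" depends on the Content-Length header being present and correct, so that reading the body leaves the Reader at the start of the next request/response
    • Read the code to find out exactly what will work and what won't
  • resp.Body must be Closed as per the docs, so we have to copy it into another buffer to retain it
  • Using your example data (with Content-Length modified to match your truncation), this code outputs the same requests and responses as given
  • httputil.DumpRequest and httputil.DumpResponse will not necessarily dump the HTTP headers in the same order as the input file, so don't expect a diff to be perfect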
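
Since a textual diff can be thrown off by header order, one alternative is to compare the parsed headers instead. The sketch below is only an illustration under assumptions: `sameHeaders` and the three raw responses are hypothetical, and it relies on `http.Header` being a map, so `reflect.DeepEqual` ignores the order in which header lines appeared.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"reflect"
	"strings"
)

// Hypothetical raw responses: respA and respB carry the same headers in a
// different order, respC differs in the Server value.
const (
	respA = "HTTP/1.1 200 OK\r\nServer: demo\r\nContent-Length: 0\r\n\r\n"
	respB = "HTTP/1.1 200 OK\r\nContent-Length: 0\r\nServer: demo\r\n\r\n"
	respC = "HTTP/1.1 200 OK\r\nServer: other\r\nContent-Length: 0\r\n\r\n"
)

// sameHeaders parses two raw HTTP responses and reports whether their
// header sets are equal, regardless of header line order.
func sameHeaders(rawA, rawB string) (bool, error) {
	a, err := http.ReadResponse(bufio.NewReader(strings.NewReader(rawA)), nil)
	if err != nil {
		return false, err
	}
	defer a.Body.Close()

	b, err := http.ReadResponse(bufio.NewReader(strings.NewReader(rawB)), nil)
	if err != nil {
		return false, err
	}
	defer b.Body.Close()

	return reflect.DeepEqual(a.Header, b.Header), nil
}

func main() {
	for _, other := range []string{respB, respC} {
		same, err := sameHeaders(respA, other)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println("headers equal:", same)
	}
}
```

This only compares headers; status lines and bodies would need their own checks.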
2020-07-02