Delphi: HTML source code from TWebBrowser - how to detect the stream encoding?
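If I run this code against an HTML page that uses a Unicode code page, the result is garbage, because TStringStream is not Unicode-aware in D7. The page may be UTF-8 encoded or use some other (ANSI) code page.
How can I detect whether a TStream / IPersistStreamInit holds Unicode / UTF-8 / ANSI data?
How can I always return a correct WideString result from this function?
function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;
If I replace TStringStream with TMemoryStream and save the TMemoryStream to a file, everything is fine. The file may be Unicode / UTF-8 / ANSI. But I always want to return the stream content as a WideString:
function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;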
var
  // LStream: TStringStream;
  LStream: TMemoryStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
begin
  if not Assigned(WebBrowser.Document) then exit;
  // LStream := TStringStream.Create('');
  LStream := TMemoryStream.Create;
  try
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    Stream := TStreamAdapter.Create(LStream, soReference);
    LPersistStreamInit.Save(Stream, true);
    // result := LStream.DataString;
    LStream.SaveToFile('c:\test\test.txt'); // test only - file is ok
    Result := ??? // WideString
  finally
    LStream.Free();
  end;
end;
This does exactly what I need, but it only works with Delphi Unicode compilers (D2009). Read the Conclusion section:
There is obviously a lot more we could do. A couple of things
immediately spring to mind. We retro-fit some of the Unicode
functionality and support for non-ANSI encodings to the pre-Unicode
compiler code. The present code when compiled with anything earlier
than Delphi 2009 will not save document content to strings correctly
if the document character set is not ANSI.
The magic is evidently in the TEncoding class (TEncoding.GetBufferEncoding), but D7 has no TEncoding. Any ideas?
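For context, TEncoding.GetBufferEncoding makes its decision by looking for a byte-order mark (BOM) at the start of the buffer. The unit below is a minimal sketch of the same idea for Delphi 7, not a port of the RTL code. The unit name BomDetect, the type TDetectedBom and the function DetectBom are hypothetical names introduced for illustration. It ignores UTF-32, and when no BOM is present it cannot tell UTF-8 without BOM apart from an ANSI code page.

```pascal
unit BomDetect; // hypothetical unit name, for illustration only

interface

uses
  Classes, SysUtils;

type
  // Hypothetical result type: which BOM (if any) was found.
  TDetectedBom = (bomNone, bomUtf8, bomUtf16LE, bomUtf16BE);

// Inspects the first bytes held by the memory stream for a BOM.
// Returns bomNone when no BOM is present; in that case the encoding
// cannot be determined from these bytes alone.
function DetectBom(Stream: TMemoryStream): TDetectedBom;

implementation

function DetectBom(Stream: TMemoryStream): TDetectedBom;
var
  P: PByteArray;
begin
  Result := bomNone;
  P := Stream.Memory;
  if (Stream.Size >= 3) and (P^[0] = $EF) and (P^[1] = $BB) and (P^[2] = $BF) then
    Result := bomUtf8       // EF BB BF
  else if (Stream.Size >= 2) and (P^[0] = $FF) and (P^[1] = $FE) then
    Result := bomUtf16LE    // FF FE
  else if (Stream.Size >= 2) and (P^[0] = $FE) and (P^[1] = $FF) then
    Result := bomUtf16BE;   // FE FF
end;

end.
```

A BOM check like this only covers streams that start with a BOM; for everything else some other source of information about the encoding is needed.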
Best answer: I use GpTextStream to handle the conversion (it should work with all Delphi versions):
function GetCodePageFromHTMLCharSet(Charset: WideString): Word;
const
  WIN_CHARSET = 'windows-';
  ISO_CHARSET = 'iso-';
var
  S: string;
begin
  Result := 0;
  if Charset = 'unicode' then
    Result := CP_UNICODE
  else if Charset = 'utf-8' then
    Result := CP_UTF8
  else if Pos(WIN_CHARSET, Charset) <> 0 then
  begin
    S := Copy(Charset, Length(WIN_CHARSET) + 1, Maxint);
    Result := StrToIntDef(S, 0);
  end
  else if Pos(ISO_CHARSET, Charset) <> 0 then // ISO-8859 (e.g. iso-8859-1: => 28591)
  begin
    S := Copy(Charset, Length(ISO_CHARSET) + 1, Maxint);
    S := Copy(S, Pos('-', S) + 1, 2);
    if S = '15' then // ISO-8859-15 (Latin 9)
      Result := 28605
    else
      Result := StrToIntDef('2859' + S, 0);
  end;
end;
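To make the mapping concrete, here is a small sketch of what GetCodePageFromHTMLCharSet returns for a few charset names. It assumes the function above is in scope and that CP_UTF8 comes from the Windows unit; the procedure name CheckCharsetMapping is hypothetical. The last line shows that the comparison is case-sensitive, so a caller may want to lower-case the charset name first.

```pascal
// Hypothetical check procedure; relies on GetCodePageFromHTMLCharSet
// as defined above and on CP_UTF8 from the Windows unit.
procedure CheckCharsetMapping;
begin
  Assert(GetCodePageFromHTMLCharSet('utf-8') = CP_UTF8);
  Assert(GetCodePageFromHTMLCharSet('windows-1252') = 1252);
  Assert(GetCodePageFromHTMLCharSet('iso-8859-1') = 28591);
  Assert(GetCodePageFromHTMLCharSet('iso-8859-15') = 28605);
  // Pos and '=' are case-sensitive, so an upper-case name is not recognized.
  Assert(GetCodePageFromHTMLCharSet('UTF-8') = 0);
end;
```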
function GetWebBrowserHTML(WebBrowser: TWebBrowser): WideString;
var
  LStream: TMemoryStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
  TextStream: TGpTextStream;
  Charset: WideString;
  Buf: WideString;
  CodePage: Word;
  N: Integer;
begin
  Result := '';
  if not Assigned(WebBrowser.Document) then Exit;
  LStream := TMemoryStream.Create;
  try
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    Stream := TStreamAdapter.Create(LStream, soReference);
    if Failed(LPersistStreamInit.Save(Stream, True)) then Exit;
    Charset := (WebBrowser.Document as IHTMLDocument2).charset;
    CodePage := GetCodePageFromHTMLCharSet(Charset);
    N := LStream.Size;
    SetLength(Buf, N);
    TextStream := TGpTextStream.Create(LStream, tsaccRead, [], CodePage);
    try
      N := TextStream.Read(Buf[1], N * SizeOf(WideChar)) div SizeOf(WideChar);
      SetLength(Buf, N);
      Result := Buf;
    finally
      TextStream.Free;
    end;
  finally
    LStream.Free();
  end;
end;
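If adding GpTextStream as a dependency is not an option, the same byte-to-WideString step can probably be done with the Win32 function MultiByteToWideChar, which is available in Delphi 7 through the Windows unit. The unit below is a sketch under that assumption and is not part of the answer above; the unit name CodePageConv and the function BytesToWideString are hypothetical. It does not handle UTF-16 input (code page 1200), which MultiByteToWideChar does not accept, and it does not strip a leading BOM.

```pascal
unit CodePageConv; // hypothetical unit name, for illustration only

interface

uses
  Windows;

// Converts Count bytes at Data, encoded in CodePage, to a WideString.
// Returns an empty string when there is no input or the conversion fails.
function BytesToWideString(Data: PAnsiChar; Count: Integer;
  CodePage: UINT): WideString;

implementation

function BytesToWideString(Data: PAnsiChar; Count: Integer;
  CodePage: UINT): WideString;
var
  Len: Integer;
begin
  Result := '';
  if (Data = nil) or (Count <= 0) then Exit;
  // First call: ask how many WideChars the converted text needs.
  Len := MultiByteToWideChar(CodePage, 0, Data, Count, nil, 0);
  if Len <= 0 then Exit;
  SetLength(Result, Len);
  // Second call: convert into the result buffer.
  MultiByteToWideChar(CodePage, 0, Data, Count, PWideChar(Result), Len);
end;

end.
```

Inside GetWebBrowserHTML this could replace the TGpTextStream block with a call such as BytesToWideString(LStream.Memory, Integer(LStream.Size), CodePage), provided CodePage is not 0 and not CP_UNICODE.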