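If I run this code against an HTML page that uses a Unicode code page, the result is garbled, because TStringStream in D7 is not Unicode-aware. The page may be UTF-8 encoded or use some other (ANSI) code page.

How can I detect whether a TStream / IPersistStreamInit holds Unicode / UTF-8 / ANSI data?

How do I always return the correct WideString result from this function?

function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;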

If I replace TStringStream with TMemoryStream and save the TMemoryStream to a file, the file is fine, whether the page is Unicode, UTF-8, or ANSI. But I always want to return the stream as a WideString:

function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;
var
  // LStream: TStringStream;
  LStream: TMemoryStream;
  Stream : IStream;
  LPersistStreamInit : IPersistStreamInit;
begin
  if not Assigned(WebBrowser.Document) then exit;
  // LStream := TStringStream.Create('');
  LStream := TMemoryStream.Create;
  try
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    Stream := TStreamAdapter.Create(LStream,soReference);
    LPersistStreamInit.Save(Stream,true);
    // result := LStream.DataString;
    LStream.SaveToFile('c:\test\test.txt'); // test only - file is ok
    Result := ??? // WideString
  finally
    LStream.Free();
  end;
end;

This article does exactly what I need, but it only works with the Delphi Unicode compilers (D2009). Read its Conclusion section:

"There is obviously a lot more we could do. A couple of things immediately spring to mind. We retro-fit some of the Unicode functionality and support for non-ANSI encodings to the pre-Unicode compiler code. The present code when compiled with anything earlier than Delphi 2009 will not save document content to strings correctly if the document character set is not ANSI."

The magic is obviously in the TEncoding class (TEncoding.GetBufferEncoding). But D7 has no TEncoding. Any ideas?
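For context, as far as I know TEncoding.GetBufferEncoding mostly works by looking for a byte-order mark (BOM) at the start of the buffer and falling back to the default ANSI encoding when none is found. The same check can be approximated on a pre-Unicode compiler. The following is only a minimal sketch under that assumption: TBomKind and SniffBom are hypothetical names invented for this example, and a page saved without a BOM (common for UTF-8 HTML) will be reported as bkNone, so this cannot replace asking the document for its charset.

```pascal
program BomSniff;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Hypothetical enum for this sketch; not part of the RTL.
  TBomKind = (bkNone, bkUtf8, bkUtf16LE, bkUtf16BE);

// Inspects the first bytes of the stream for a BOM and then
// restores the stream position it had on entry.
function SniffBom(Stream: TStream): TBomKind;
var
  B: array[0..2] of Byte;
  N: Integer;
  SavedPos: Int64;
begin
  Result := bkNone;
  SavedPos := Stream.Position;
  try
    Stream.Position := 0;
    N := Stream.Read(B, SizeOf(B));
    if (N >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
      Result := bkUtf8
    else if (N >= 2) and (B[0] = $FF) and (B[1] = $FE) then
      Result := bkUtf16LE
    else if (N >= 2) and (B[0] = $FE) and (B[1] = $FF) then
      Result := bkUtf16BE;
  finally
    Stream.Position := SavedPos;
  end;
end;

const
  // UTF-8 BOM followed by the letter 'A'
  Utf8Sample: array[0..3] of Byte = ($EF, $BB, $BF, $41);
var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(Utf8Sample, SizeOf(Utf8Sample));
    if SniffBom(MS) = bkUtf8 then
      Writeln('UTF-8 BOM found')
    else
      Writeln('no UTF-8 BOM');
  finally
    MS.Free;
  end;
end.
```

In the question's function, SniffBom(LStream) could be called right after LPersistStreamInit.Save to decide how the bytes should be decoded.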

Best answer: I used GpTextStream to handle the conversion (it should work for all Delphi versions):

// Maps an HTML charset name (e.g. 'utf-8', 'windows-1252', 'iso-8859-1')
// to a Windows code page number. Returns 0 if the name is not recognized.
function GetCodePageFromHTMLCharSet(Charset: WideString): Word;
const
  WIN_CHARSET = 'windows-';
  ISO_CHARSET = 'iso-';
var
  S: string;
begin
  Result := 0;
  if Charset = 'unicode' then
    Result := CP_UNICODE
  else if Charset = 'utf-8' then
    Result := CP_UTF8
  else if Pos(WIN_CHARSET, Charset) <> 0 then
  begin
    // 'windows-1252' => 1252
    S := Copy(Charset, Length(WIN_CHARSET) + 1, Maxint);
    Result := StrToIntDef(S, 0);
  end
  else if Pos(ISO_CHARSET, Charset) <> 0 then // ISO-8859 (e.g. iso-8859-1: => 28591)
  begin
    S := Copy(Charset, Length(ISO_CHARSET) + 1, Maxint); // '8859-1'
    S := Copy(S, Pos('-', S) + 1, 2);                    // '1'
    if S = '15' then // ISO-8859-15 (Latin 9)
      Result := 28605
    else
      Result := StrToIntDef('2859' + S, 0);
  end;
end;

function GetWebBrowserHTML(WebBrowser: TWebBrowser): WideString;
var
  LStream: TMemoryStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
  TextStream: TGpTextStream;
  Charset: WideString;
  Buf: WideString;
  CodePage: Word;
  N: Integer;
begin
  Result := '';
  if not Assigned(WebBrowser.Document) then Exit;
  LStream := TMemoryStream.Create;
  try
    // Save the document's raw bytes into the memory stream.
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    Stream := TStreamAdapter.Create(LStream, soReference);
    if Failed(LPersistStreamInit.Save(Stream, True)) then Exit;
    // Ask the document which charset it uses and map it to a code page.
    Charset := (WebBrowser.Document as IHTMLDocument2).charset;
    CodePage := GetCodePageFromHTMLCharSet(Charset);
    N := LStream.Size;
    if N = 0 then Exit; // nothing to decode; also avoids indexing Buf[1] on an empty string
    // One WideChar per source byte is enough room for the decoded text.
    SetLength(Buf, N);
    TextStream := TGpTextStream.Create(LStream, tsaccRead, [], CodePage);
    try
      // Read returns a byte count; convert it to a WideChar count.
      N := TextStream.Read(Buf[1], N * SizeOf(WideChar)) div SizeOf(WideChar);
      SetLength(Buf, N);
      Result := Buf;
    finally
      TextStream.Free;
    end;
  finally
    LStream.Free();
  end;
end;
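If adding GpTextStream to the project is not an option, the decoding step alone can probably be done with the Win32 MultiByteToWideChar function, which the Delphi 7 Windows unit declares. This is a different technique from the one in the answer, shown only as a rough sketch: DecodeBytes is a hypothetical helper name, the sketch does not strip a BOM, and it does not cover UTF-16 content (code page 1200), whose bytes would have to be copied into the WideString directly instead.

```pascal
program AnsiToWideSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper for this sketch: decodes Len bytes starting at Buf,
// interpreting them in the given Windows code page.
function DecodeBytes(Buf: PAnsiChar; Len: Integer; CodePage: UINT): WideString;
var
  N: Integer;
begin
  Result := '';
  if (Buf = nil) or (Len <= 0) then Exit;
  // First call: ask how many WideChars the decoded text needs.
  N := MultiByteToWideChar(CodePage, 0, Buf, Len, nil, 0);
  if N <= 0 then Exit;
  SetLength(Result, N);
  // Second call: decode into the result buffer.
  N := MultiByteToWideChar(CodePage, 0, Buf, Len, PWideChar(Result), N);
  SetLength(Result, N);
end;

const
  // The letter e with acute accent encoded as UTF-8, followed by 'A'
  Sample: array[0..2] of AnsiChar = (#$C3, #$A9, 'A');
var
  W: WideString;
begin
  W := DecodeBytes(@Sample[0], Length(Sample), CP_UTF8);
  Writeln('Decoded length in WideChars: ', Length(W));
end.
```

With the answer's variables, a call along the lines of DecodeBytes(PAnsiChar(LStream.Memory), LStream.Size, CodePage) would take the place of the TGpTextStream read, assuming CodePage is a multi-byte or single-byte code page rather than UTF-16.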
