Wednesday, February 24, 2010
How to isolate text between two HTML tags
Problem/Question/Abstract:
I have a TRichEdit.Lines (TStrings) from which I want to extract a string and copy it to another string. I use ScanF to find the beginning of the string, which is '<a href', and almost the end of the string, which is '</a>'. Then I need to find either the next '<' or the end of the line. Once I have done all this, how do I extract this string and copy it to another string?
Answer:
See the Copy function. Perhaps the following routine can be of use to you. It uses the PChar-based string functions instead of the standard string functions Pos and Copy, mainly because working with pointers is a bit easier in this case.
procedure IsolateTextBetweenTags(const S: string; Tag1, Tag2: string; List: TStrings);
var
  pScan, pEnd, pTag1, pTag2: PChar;
  foundText: string;
  searchText: string;
begin
  { Set up the pointers we need for the search. HTML tags are not case
    sensitive, so the search runs on an uppercased copy of S. }
  searchText := UpperCase(S);
  Tag1 := UpperCase(Tag1);
  Tag2 := UpperCase(Tag2);
  pTag1 := PChar(Tag1);
  pTag2 := PChar(Tag2);
  pScan := PChar(searchText);
  repeat
    { Search for the next occurrence of Tag1 }
    pScan := StrPos(pScan, pTag1);
    if pScan <> nil then
    begin
      { Found one: hop over it, then search from that position forward
        for the next occurrence of Tag2 }
      Inc(pScan, Length(Tag1));
      pEnd := StrPos(pScan, pTag2);
      if pEnd <> nil then
      begin
        { Found both the start and the end tag. Isolate the text between them
          and add it to the list. The text is taken from the original S at the
          same offset, since we want the un-uppercased version. }
        SetString(foundText, PChar(S) + (pScan - PChar(searchText)), pEnd - pScan);
        List.Add(foundText);
        { Continue the next search after the end tag just found }
        pScan := pEnd + Length(Tag2);
      end
      else
        { Error: no end tag found for the start tag, abort }
        pScan := nil;
    end;
  until pScan = nil;
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
  with OpenDialog1 do
  begin
    { Patterns in a filter are separated by semicolons without spaces }
    Filter := 'HTML files|*.HTM;*.HTML';
    if Execute then
    begin
      { Load the file as plain text and list the content of every <H1> element }
      RichEdit1.PlainText := True;
      RichEdit1.Lines.LoadFromFile(FileName);
      Memo2.Clear;
      IsolateTextBetweenTags(RichEdit1.Text, '<H1>', '</H1>', Memo2.Lines);
    end;
  end;
end;
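The answer points to Copy and the standard Pos function as the alternative to pointer arithmetic. For comparison, here is a minimal sketch of the same search loop written with index-based calls. It assumes PosEx from the StrUtils unit is available (Delphi 7 or later, or Free Pascal). The procedure name IsolateTextWithCopy and the small console program around it are made up for this example and are not part of the original tip.

```pascal
program IsolateWithCopy;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, StrUtils;

{ Hypothetical counterpart of IsolateTextBetweenTags that uses PosEx and Copy
  instead of PChar arithmetic. The name is an assumption for this sketch. }
procedure IsolateTextWithCopy(const S, Tag1, Tag2: string; List: TStrings);
var
  SearchText, UTag1, UTag2: string;
  StartPos, EndPos: Integer;
begin
  { Search an uppercased copy so that the match ignores case }
  SearchText := UpperCase(S);
  UTag1 := UpperCase(Tag1);
  UTag2 := UpperCase(Tag2);
  StartPos := PosEx(UTag1, SearchText, 1);
  while StartPos > 0 do
  begin
    { Skip the start tag, then look for the end tag from there }
    Inc(StartPos, Length(UTag1));
    EndPos := PosEx(UTag2, SearchText, StartPos);
    if EndPos = 0 then
      Break; { no end tag for this start tag, stop searching }
    { Copy from the original S to keep the original letter case }
    List.Add(Copy(S, StartPos, EndPos - StartPos));
    { Continue after the end tag just found }
    StartPos := PosEx(UTag1, SearchText, EndPos + Length(UTag2));
  end;
end;

var
  Found: TStringList;
  i: Integer;
begin
  Found := TStringList.Create;
  try
    IsolateTextWithCopy('<h1>First</h1> text <H1>Second</H1>',
      '<H1>', '</H1>', Found);
    { Prints one isolated text per line }
    for i := 0 to Found.Count - 1 do
      Writeln(Found[i]);
  finally
    Found.Free;
  end;
end.
```

Both versions rely on UpperCase leaving the string length unchanged, so a position found in the uppercased copy is also valid in the original S. The index-based form trades the pointer arithmetic for 1-based positions, which some readers may find easier to follow.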