|
||
Bmh.PAS |
||
{* Поиск строки по Бойеру-Муру *}Unit BMH;Interface Implementation Begin writeln('Поиск строки по Бойеру-Муру'); End. Program pas; Uses Bmh.pas; {* Эту строку можно удалить *} unit Bmh; { BMH 1.11a, Copyright (c) 1995, by Jody R. Cairns (jodyc@cs.mun.ca) This unit implements the Boyer - Moore - Horspool pattern searching algorithm as taken from the 'Handbook Of Algorithms And Data Structures In Pascal And C', Second Edition, by G.H Gonnet And R. Baeza - Yates. The algorithm searches For a pattern within a buffer. I have added functions To implement the searches with files Of any Type. {* Используемые типы *} Implementation : I designed this unit For two types Of users - those who want complete control Of their code (Non - lazy), And those who don't (Lazy). "Non - lazy" users have To manually Set some variables that are essential For the algorithm. "Lazy" users don't have To Do anything except call a Function . 1) Lazy : the following Function returns the offset In which the specified String is found with the specified File. A value Of - 1 is returned If the String is Not found: Function GetStringOffset (StrToFind : String; Const {* Предопределенные *} FileName : TFileName; Const {* Константы *} IgnoreCase : boolean) : longint; - StrToFind : the String you are looking For . - FileName : the name Of the File To search For StrToFind. - IgnoreCase : indicates whether a Case - sensitive search is done Or Not. The global variable IgnoreCase is true by default. Examples : a) IgnoreCase := true; If GetStringOffset('path', 'C:AUTOEXEC.BAT', IgnoreCase) - 1 Then Text1.Caption := 'Found expression' Else Test1.Caption := 'Expression Not found'; b) If GetStringOffset('PROMPT', 'C:AUTOEXEC.BAT', false) - 1 Then Text1.Caption := 'Found expression' Else Test1.Caption := 'Expression Not found'; 2) Non - lazy : you should see the Function GetStringOffset For everything that needs To be done before FindString is called, which is the main Function that opens the File For searching. GetStringOffset creates a buffer each time it is called. However, you need only they Do this once. Also, the Procedure MakeBMHTable must be called For each DIFFERENT String To be searched For . You needn't call MakeBMTable everytime the same String is searched For . REQUIREMENTS : a) Call CreateBuffer which allocates memory For the buffer To be used whe reading files. The buffer is a global variable called Buffer Of Type {* Создание новых типов данных *} TSearchBuffer. CreateBuffer returns the amount Of memory allocated For Buffer. b) If you Do Not want To Do a Case - insensitive search, Set the global variable IgnoreCase To FALSE. By default, IgnoreCase is TRUE, which means all searches are Case - insensitive. If you choose a Case - insensitive search, make sure your String is converted To uppercase! That is, Do this : MyString := uppercase(MyString); c) Call MakeBMHTable(MyString) To create the index table For the String To be searched For (MyString). This MUST be called For every DIFFERENT String To be searched For . However, it need only be called ONCE For each different String. The index table is a global variable called BMHTable Of Type {* Типы переменных *} TBMHTable. If you're doing a Case - insensitive search, MyString MUST be uppercased BEFORE MakeBMHTable is called. d) Call FindString(MyString, MyFile) To search For MyString within the File MyFile. FindString returns the offset position Of MyString within MyFile If it is found; otherwise it returns - 1. A REMINDER: If you're doing a Case - insensitive search, make sure MyString is converted To uppercase! e) Call DestroyBuffer To free the memory that Buffer points To. Note that I only Do (a), (b) And (c) ONCE. Once I search For a different String, Then I MUST Do (c) again. And (e) need only be called once after ALL searching is completed. Summary : a) Allocate memory For Buffer by calling CreateBuffer. b) Convert MyString (String I want To find) To uppercase If I'm doing a Case - insensitive search, which is the default. Otherwise, Set IgnoreCase := false And leave MyString alone. c) Call MakeBMHTable with MyString. d) Call FindString with MyString And name Of File. e) Remember To release memory Buffer is pointing To by calling DestroyBuffer. VERSION CHANGES : 1.11a - removed second boolean condition from Repeat - Until loop In Function DOBMHSearch, which increases execution speed. I should have done this In version 1.10. 1.11 - added an important comment about Case - insensitive searches that was Not mentioned In previous versions: For "non - lazy" users, If you are doing a Case - insensitive search, make sure the String you are searching For (i.e. that is passed In Function FindString) is converted To uppercase; otherwise, the search may fail. 1.10 - improved Function DOBMHSearch execution speed by replacing inner While statement with a For - loop And a Goto statement. - improved Function FindString execution speed by adding BREAK statement If pattern was found. - added Case - insensitive search option. The global variable IgnoreCase was added To indicate the search Type {* Типы переменных *} To be performed, And Procedure UpCaseBuffer was added. 1.01 - added additional explanatory comments - added a couple more error strings To the Function GetError 1.00 - original release NOTES : - If you have ANY questions, problems Or suggestions, please feel free To contact me at jodyc@cs.mun.ca - various code optimizations can be made To improve speed. - minimal error - checking is performed. I would add more To suit your own particular needs. - all the routines In this unit could be gathered into an object Of some sort. I may Do that later. - To search Read - Only files, you should Set system.filemode := 0 before FindString is called; otherwise, FindString will fail. - currently, the algorithm only finds the first occurrence Of a pattern within a File. I plan To extend this To search For ALL occurrences. This unit is FreeWare. You may use it freely at your own risk. Being FreeWare, this unit is Not To be sold at any charge, whether it is used alone Or incorporated into any Program . Please feel free To add any enhancements Or modifications. If you Do , just add your credits along with mine. And I'd be interested In any modifications you Do make. Any enhancement / modification must also be released as Freeware. Jody R. Cairns jodyc@cs.mun.ca } {$Q - , I + , R - , S - , B - , V - , D - , L - } interface Uses SysUtils; {* Подключаем внешние файлы *} Uses Bmh.pas; {* Эту строку можно удалить *} {* Вызов внешних функций *} {* Подключаем внешние файлы *} {* Подключение модулей *} {* Подключаем внешние файлы *} Const {* Предопределенные *} MaxBufferSize = 1024 * 63; { Maximum size Of buffer } Type {* Используемые типы *} TBMHTable = Array[0..255] Of byte; TSearchBuffer = ^TSearchBufferArray; TSearchBufferArray = Array[1..MaxBufferSize] Of char; Function CreateBuffer : word; Procedure DestroyBuffer; Procedure MakeBMHTable ( Const {* Константы *} StrToFind : String); Function FindString ( Const {* Константы *} StrToFind : String; Const {* Предопределенные *} FileName : TFileName) : longint; Function GetStringOffset (StrToFind : String; Const {* Постоянные значения *} FileName : TFileName; Const {* Константы *} IgnoreCase : boolean) : longint; Const {* Константы *} IgnoreCase : boolean = true; { determines whether To Do Case - insensitive search Or Not } Var {* Объявление переменных *} BMHTable : TBMHTable; { index table required For B - M - H algorithm } Buffer : TSearchBuffer; { buffer used when reading File } { NOTES : - I use no local variables within procedures And functions. Local variables tend To slow execution too much For my taste, since most local variables have To be created on the system stack each time a Function is called. } Var {* Объявление переменных *} FileToSearch : File; { File To search For given String } OldFileMode : byte; { saves the File mode access code } K : integer; { number Of bytes read into buffer For blockread } OldErrorCode : word; { saves Windows critical error - handling mode } Var {* Необходимые переменные *} Buffer : TSearchBufferArray; Const {* Константы *} Size : word); assembler; { Converts all lower - Case characters within Buffer To upper - Case } asm mov cx, Size { Load size Of Buffer } jcxz @3 { Exit If size = 0 } les di, Buffer { Load Buffer } @1 : mov al, es : [di] { Check current byte Of Buffer } cmp al, 'a' { Skip If Not 'a'..'z' } jb @2 cmp al, 'z' ja @2 sub al, 20h { Convert To uppercase } mov es : [di], al { Put converted byte back In Buffer } @2 : inc di { Get next byte In Buffer } loop @1 { Continue To size Of Buffer } @3: End; Function GetError ( Const {* Постоянные значения *} ErrorCode : integer) : String; { Returns a String pertaining To the Type {* Используемые типы *} Of error. ErrorCode can be taken from IOResult If IO - checking is off, Or from an exception handler. The strings listed below are taken from Borland's 'Object Pascal Language Guide' For Delphi Version 1.0, pages 273 - 275. } Begin Case ErrorCode Of 2 : Result := 'File Not found'; 3 : Result := 'Path Not found'; 4 : Result := 'Too many open files'; 5 : Result := 'File access denied'; 6 : Result := 'Invalid File handle'; 12 : Result := 'Invalid File access code'; 15 : Result := 'Invalid drive'; 100 : Result := 'Disk read error'; 101 : Result := 'Disk write error'; 102 : Result := 'File Not assigned'; 103 : Result := 'File Not open'; Else Result := '' End End; Function DoBMHSearch( Const {* Предопределенные *} StrToFind : String) : longint; { Performs the Boyer - Moore - Horspool String searching algorithm, returning the offset In buffer where the String was found. If Not found, Then - 1 is returned. Adapted from the 'Handbook Of Algorithms And Data Structures In Pascal And C', Second Edition, by G.H Gonnet And R. Baeza - Yates. } label NotFound; { using a Goto statement improves speed } Begin Result := - 1; J := length(StrToFind); While (J = MaxBufferSize Then Result := MaxBufferSize Else Result := (MaxAvail Div 1024) * 1024; try { allocate memory } getmem (Buffer, Result) except Result := 0 End { allocate memory } End; Procedure DestroyBuffer; { Free the memory that Buffer points To } Begin freemem(Buffer, sizeof(Buffer^)) End; Function FindString ( Const {* Константы *} StrToFind : String; Const {* Постоянные значения *} FileName : TFileName) : longint; { Opens File To initiate Boyer - Moore - Horspool search algorithm, reading blocks Of data from File Until String is found Or all bytes are read. The offset within FileName is returned If StrToFind is found; otherwise, - 1 is returned. Note that the offset is zero - based. The first byte In a File is at offset 0. The second byte is at offset 1. Etc. * * * * BEFORE Function IS CALLED * * * * : 1) a variable called Buffer Of Type {* Создание новых типов данных *} TSearchBuffer MUST exist with a size greater than zero (0). NO error - checking is done on this. 2) a variable called BMHTable Of Type {* Используемые типы *} TBMHTable must exist And be initialized For the String StrToFind using Procedure MakeBMHTable. 3) If IgnoreCase is true (i.e. you are doing a Case - insensitive search), make sure StrToFind is converted To uppercase. } Begin Result := - 1; assignfile (FileToSearch, FileName); try { To open File To search } reset(FileToSearch, 1); {* Чтение из файла через переменную FileToSearch, 1 *} try { searching For String } Repeat blockread(FileToSearch, Buffer^, sizeof(Buffer^), BytesRead); { Convert all appropiate chars To uppercase If search is Case - insensitive. } If IgnoreCase Then UpCaseBuffer(Buffer^, BytesRead); { Search For String within buffer } Result := DoBMHSearch(StrToFind); { Calculate offset position In File If found } If Result - 1 Then Begin Result := filepos(FileToSearch) - Result; { Adding the following statement improves speed because the Until condition is Not evaluated. } break End; { If Buffer is full, skip back length(StrToFind) bytes In File. This ensures any pattern isn't "cut - off" during readblock. } If BytesRead = sizeof(Buffer^) Then seek(FileToSearch, filepos(FileToSearch) - length(StrToFind)) Until (BytesRead = 0); finally closefile(FileToSearch) End; { searching For String } except on E : EInOutError Do Begin MessageDlg('Cannot scan ' + uppercase(FileName) + '.'#13 + GetError(E.ErrorCode) + '.', mterror,[mbOK], 0); Result := - 1 End End { opening File To search } End; Function GetStringOffset (StrToFind : String; Const {* Предопределенные *} FileName : TFileName; Const {* Предопределенные *} IgnoreCase : boolean) : longint; { This is For you "lazy" programmers. This Function does all initialization routines To find StrToFind within FileName. If StrToFind is found, the offset location within FileName is returned; otherwise, - 1 is returned, indicating an unsuccessful search. } Begin { try To create buffer For blockread Procedure } If CreateBuffer = 0 Then Begin MessageDlg('Not enough memory To perform search.', mtWarning,[mbOK], 0); Result := - 1; exit End; { Convert To uppercase For Case - insensitive searching } If IgnoreCase Then StrToFind := uppercase(StrToFind); { This must be done For every String } MakeBMHTable(StrToFind); { Enable reading Of read - only files } OldFileMode := system.filemode; system.filemode := 0; { Turn off critical windows handling } olderrorcode := SetErrorMode(SEM_FAILCRITICALERRORS); try { To search File For String } Result := FindString (StrToFind, FileName) finally { clean - up } DestroyBuffer; system.filemode := OldFileMode; SetErrorMode(OldErrorCode) End { searching } End; End. { bmh } ..... 07.08.2014 - Breakpoint - точка остановки 07.08.2014 - Топ-10 самых популярных приложений в Facebook 18.11.2013 - Панель управления и персонализация системы. Windows Vista 18.11.2013 - Логические и физические диски. Windows Vista 18.11.2013 - Удаление файлов и папок 01.11.2013 - Программирование дисковых подсистем: возможности 31.10.2013 - Windows8 или Windows7? 31.10.2013 - Windows8, игры |
||
Non-commercial fansite
|