Full-text search in SQL Azure
What is Full-text search in SQL?
Full-Text Search in SQL Server lets users and applications run full-text queries against character-based data in SQL Server tables. Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase. A full-text query returns any documents that contain at least one match (also known as a hit). A match occurs when a target document contains all the terms specified in the full-text query, and meets any other search conditions, such as the distance between the matching terms.
LIKE vs Full-text-Search
Full-text search can perform a linguistic search whereas SQL LIKE character patterns only. Also, like predicate cannot be used to query formatted binary data unless converted into the character format. A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.
For example, if you want to find words that are forms of “mouse” like mouse, mice, mouses then FT search can do that easily. Using LIKE you have to code all forms of the word & this will differ based on the language. Run the below query to see all forms of mice.
select * from sys.dm_fts_parser(‘formsof(freetext, mice)’, 1033, 0, 0)
The above query returns the following display terms. mice’s, mouse, mouse’s, moused, mouses, mousing,mice. That means by searching the word mice using full-text search returns all the documents that matches the above 7 words. With just like one need to issue 7 search queries
What should I do to enable full-text search in SQL Azure?
Absolutely nothing. SQL Azure configures the full-text search in Azure be default in all the databases irrespective of the tiers with no additional cost. The configuration parameters can’t be changed too.
What documents types are supported?
Full-text search in Azure supports 100+ document types and one can query the list of supported document types in Azure as:
select * from sys.fulltext_document_types where version != ”
Full-text search vs Azure search:
Microsoft has both full-text search with SQL Azure and Azure search service, which one should I use would be a very common question for the application developers. This section will briefly explain what works best in what scenarios.
If one needs both relational data and search at one location, SQL Azure full-text search is an obvious choice.
Scenario | Full-text search | Azure search service |
I have relational data and needs search | Consider | |
I want to offload search outside my OLTP | Consider | |
I don’t want to change the existing Application | Consider | |
I don’t want to spend more money on search service | Consider | |
I don’t want a database | Consider | |
I have more than 15 concurrent queries | Consider | |
I store my files in blob storage | Extra work needed | Not very different than storing in SQL db and indexing |
Geo replication | Supported | Supported |
Scale | Supported | Supported |
Example:
CREATE FULLTEXT CATALOG DocumentCatalog;
create table DocumentRepository(DocumentID int not null, DocTextData varbinary(max), DocType char(4))
— Create unique index on the document id
CREATE UNIQUE INDEX ui_ukDoc ON DocumentRepository(DocumentID);
CREATE FULLTEXT INDEX ON DocumentRepository
(
DocTextData –Full-text index column name
TYPE COLUMN DocType –Name of column that contains file type information
Language 2057 –2057 is the LCID for British English
)
KEY INDEX ui_ukDoc ON DocumentCatalog WITH CHANGE_TRACKING AUTO
insert into DocumentRepository values(1, convert(varbinary(max), ‘mouse is running’), ‘txt’)
insert into DocumentRepository values(2, convert(varbinary(max), ‘mice ran away’), ‘txt’)
insert into DocumentRepository values(3, convert(varbinary(max), ‘This is SQL full-text introduction’), ‘txt’)
SELECT * FROM DocumentRepository WHERE FREETEXT (DocTextData, ‘run’ );
SELECT * FROM DocumentRepository WHERE FREETEXT (DocTextData, ‘mice’ );
Limitations:
Not all document types are supported (most common like pdf, fox and docx aren’t supported).
No semantic search support
No ability to change configuration options
Unable to see crawl errors
How to index docx / pdf documents?
One can use IFilters to grab the text in the application and then store it as varbinary in SQL table as document type txt. One such example is here