In SharePoint Portal Server 2003 (and WSS 2.0), discovering the ghosted status of pages in your SharePoint environment was a simple task. You simply ran "select * from docs where content is not null and leafname like '%.aspx' and listid is null" against the content database. And to reghost them, all you had to do was set the content to null again.
In Microsoft Office SharePoint Server 2007 (and WSS 3.0), things have become substantially more complicated. To reduce the performance impact of customizing (unghosting) pages, and to ease the design burden of modifying SharePoint's look and feel, Microsoft now uses Master Pages and Page Layouts to handle the design of a single page in SharePoint. Unfortunately, this means that the content field is almost never null for a page in the database. So how do you figure out if a page is customized?
There are three ways. I will start with the worst and work my way up:
1. The SPFile object has a property called CustomizedPageStatus, which maps to the SPCustomizedPageStatus enumeration. The values of the enumeration are Customized, Uncustomized and None. (I really wonder what "None" means in this context. It seems to me customized and uncustomized are mutually exclusive, as well as comprehensive.) In the case of a Publishing Page object, i.e. an ASPX page in a "Pages" library with the publishing feature activated, this property always returns Customized, so it cannot tell the pages apart. I check it like this:
foreach (SPListItem item in List.Items)
{
    if (PublishingPage.IsPublishingPage(item))
    {
        NumberOfPages++;
        PublishingPage pPage = PublishingPage.GetPublishingPage(item);
        if (pPage.ListItem.File.CustomizedPageStatus == SPCustomizedPageStatus.Customized)
        {
            Console.WriteLine(pPage.Url + " is customized.");
        }
    }
}
2. The second way is to use the PublishingPage.IsDisconnected property. This approach works much better than the first, except in one (admittedly extreme) circumstance: when you have a publishing page object that does not have any assigned page layout. (Even customized (unghosted) pages have a page layout to fall back on.) The only time I have seen this is when migrating a SharePoint Portal Server 2003 portal with a custom site definition for areas, where a "Page Template Upgrade Definition" file failed. The upgrade still works (no errors, no warnings), but the pages it creates have no page layout and are "broken".
3. The last method, and by far the most reliable, is to use SharePoint Designer. SharePoint Designer seems to infallibly detect the customization status of any page. The trouble here, of course, is that SharePoint Designer cannot be put into a script like the first two methods, which limits automation.
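Since the second approach is the best one that can be scripted, here is a minimal sketch of how it might look as a console application run on the SharePoint server. The class name, the default URL and the try/catch are my own assumptions for illustration: I have not confirmed how IsDisconnected behaves for the "broken" pages with no page layout, so the sketch reports any exception instead of guessing.

```csharp
using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Publishing;

// Hypothetical helper: lists publishing pages that are disconnected
// from their page layout in a single publishing web.
class DisconnectedPageReport
{
    static void Main(string[] args)
    {
        // Placeholder URL (an assumption); pass the real web URL as the first argument.
        string webUrl = args.Length > 0 ? args[0] : "http://localhost";

        using (SPSite site = new SPSite(webUrl))
        using (SPWeb web = site.OpenWeb())
        {
            if (!PublishingWeb.IsPublishingWeb(web))
            {
                Console.WriteLine(web.Url + " is not a publishing web.");
                return;
            }

            PublishingWeb pubWeb = PublishingWeb.GetPublishingWeb(web);
            int disconnected = 0;

            foreach (PublishingPage page in pubWeb.GetPublishingPages())
            {
                try
                {
                    if (page.IsDisconnected)
                    {
                        disconnected++;
                        Console.WriteLine(page.Url + " is disconnected from its page layout.");
                    }
                }
                catch (Exception ex)
                {
                    // Assumption: a page with no page layout at all may throw here
                    // rather than return a value, so report it and keep going.
                    Console.WriteLine(page.Url + " could not be checked: " + ex.Message);
                }
            }

            Console.WriteLine(disconnected + " disconnected page(s) found.");
        }
    }
}
```

This only walks one web. To cover a whole site collection you would loop over site.AllWebs and dispose each SPWeb as you go. Any page reported as "could not be checked" is worth opening in SharePoint Designer to confirm its status.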