Thread: Sitemap.xml problems

  1. #1

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16

    Sitemap.xml problems

    From Google Webmaster Tools:

    Unsupported file format
    Your Sitemap does not appear to be in a supported format

    ---------------------------------------------

    From Search engine results:

    <<link removed>>
    The XML page cannot be displayed
    Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.

    Invalid at the top level of the document. Error processing resource 'http://www.topix4u.com/sitemap.xml'. Line 1, Position...

    <?xml version="1.0" encoding="UTF-8" ?>

    ----------------------------------------------

    Sitemap (brief version)

    <?xml version="1.0" encoding="UTF-8" ?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    <!-- created with Free Online Sitemap Generator www.xml-sitemaps.com -->
    <url>
    <loc>http://www.topix4u.com/</loc>
    <priority>1.00</priority>
    <lastmod>2009-03-08T21:55:35+00:00</lastmod>
    <changefreq>daily</changefreq>
    </url>
    <url>
    <loc>http://www.topix4u.com/index.html</loc>
    <priority>0.80</priority>
    <lastmod>2009-03-08T21:55:35+00:00</lastmod>
    <changefreq>daily</changefreq>
    </url>
    <url>
    <loc>http://www.topix4u.com/bird.html</loc>
    <priority>0.80</priority>
    <lastmod>2009-03-08T21:55:06+00:00</lastmod>
    <changefreq>daily</changefreq>
    </url>
    <url>
    <loc>http://www.topix4u.com/privacy.html</loc>
    <priority>0.80</priority>
    <lastmod>2009-03-08T21:08:26+00:00</lastmod>
    <changefreq>daily</changefreq>
    </url>
    </urlset>

    -------------------------------------

    Once someone can advise on what's what, I have another strange issue. For years I've used GSiteCrawler and had no problems submitting a sitemap.xml to Google Webmaster Tools. But with this new site, the software misses several of my pages! That's why I am using a popular online sitemap generator. All pages are listed, but Google doesn't like the format.

    Also, is the space between "UTF-8" and ?> OK or not: ="UTF-8" ?>
    I've submitted without the space: ="UTF-8"?> and I'm still having problems at GWT.

    TIA! Any help appreciated very much.
    Last edited by Doc C; 4-7-09 at 09:32 PM. Reason: Edited per request from OP

  2. #2
    Webmaster, DJ, Producer
    Join Date
    Sep 2006
    Location
    UK
    Posts
    495
    Rep Power
    12
    The problem may be that you have spaces before the <?xml declaration.

    I tested this on my own sitemap by putting 2 spaces before <?xml, and it gave me the same error. In fact, the error message points this out by printing "--^" below the line: each - stands for a space that comes before <?xml.

    You should paste the sitemap generated at xml-sitemaps as-is: in your sitemap file, use "Select All" and then paste over it.
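The leading-space diagnosis above can be reproduced locally before uploading. This is a minimal sketch using Python's standard-library ElementTree parser; it is not the parser Google or IE uses (an assumption worth keeping in mind), but the rule it trips over is the XML 1.0 requirement that the <?xml ... ?> declaration be the very first thing in the file. It also shows that a space before ?> inside the declaration is legal, which is the "UTF-8" ?> question from the first post. The helper names parse_error and strip_leading_whitespace are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_error(data: bytes) -> Optional[str]:
    """Return the parser's error message, or None if the XML is well-formed."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None


def strip_leading_whitespace(data: bytes) -> bytes:
    """Drop spaces, tabs and line breaks in front of the XML declaration."""
    return data.lstrip()


# Hypothetical one-element sitemap body used only for this demonstration.
BODY = b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'

spaces_before_decl = b'  <?xml version="1.0" encoding="UTF-8"?>' + BODY
space_before_close = b'<?xml version="1.0" encoding="UTF-8" ?>' + BODY

if __name__ == "__main__":
    # Two spaces before <?xml: rejected because the declaration is not at the start.
    print(parse_error(spaces_before_decl))
    # A space before ?> is allowed by the XML declaration grammar.
    print(parse_error(space_before_close))
    # Removing the leading spaces makes the first document well-formed.
    print(parse_error(strip_leading_whitespace(spaces_before_decl)))
```

Running a saved sitemap through parse_error(Path("sitemap.xml").read_bytes()) (with pathlib imported) before submitting would catch the "Invalid at the top level of the document" problem early.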

  3. #3
    Webmaster, DJ, Producer
    Join Date
    Sep 2006
    Location
    UK
    Posts
    495
    Rep Power
    12
    BTW, I also use xml-sitemaps.com to build my sitemap, and I've never had a problem with the code it generates.

  4. #4

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16
    Thank you for your reply! I'm going to try again without the space. It's so odd, because I never had problems before with any other website. I'll post the results.

  5. #5
    entrecon
    Join Date
    Aug 2006
    Location
    Michigan
    Posts
    2,742
    Rep Power
    17
    I was wondering if there were any odd file names on your site that might be breaking it. I looked at your site real quick and did notice that a couple of the file names have two underscores in a row, but I can't believe that would do it. It is possible that the files GSiteCrawler is missing are the ones that are causing the problem.
    ________________________________
    Find me on twitter: @entrecon

  6. #6

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16
    Thanks, entrecon. Would you please clarify where you saw two underscores in a row in filenames? I'm not clear what you mean. View source? On my home page? Sorry, I'm a little confused.

  7. #7
    entrecon
    Join Date
    Aug 2006
    Location
    Michigan
    Posts
    2,742
    Rep Power
    17
    I saw this one here: http://www.topix4u.com/animal__iq.html

    I don't know if it is a problem or not; I just clicked on two or three links to see the naming structure and noticed the larger gap.

  8. #8

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16
    Yes, I found the two underscores in a row and fixed it before you posted!

    Well, I'm happy to report, this sitemap.xml was accepted by GWT.

    (edited for brevity)

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    <!--#Generated by SOFTplus GSiteCrawler v1.23 rev. 286 by SOFTplus Entwicklungen GmbH, http://gsitecrawler.com/, http://johannesmueller.com/gs/ -->
    <url><loc>http://www.topix4u.com/</loc><lastmod>2009-03-03T21:04:43+00:00</lastmod><changefreq>daily</changefreq><priority>1.00</priority></url>
    <url><loc>http://www.topix4u.com/index.html</loc><lastmod>2009-03-03T21:04:43+00:00</lastmod><changefreq>daily</changefreq><priority>0.50</priority></url>
    <url><loc>http://www.topix4u.com/bird.html</loc><lastmod>2009-03-03T21:04:17+00:00</lastmod><changefreq>daily</changefreq><priority>0.50</priority></url>
    <url><loc>http://www.topix4u.com/images/cat_spray.jpg</loc><lastmod>2009-03-03T21:06:34+00:00</lastmod><changefreq>daily</changefreq><priority>0.50</priority></url>
    <url><loc>http://www.topix4u.com/images/paypal.jpg</loc><lastmod>2009-03-03T21:06:48+00:00</lastmod><changefreq>daily</changefreq><priority>0.50</priority></url>
    <url><loc>http://www.topix4u.com/images/anal_gland.jpg</loc><lastmod>2009-03-03T21:06:56+00:00</lastmod><changefreq>daily</changefreq><priority>0.50</priority></url>
    </urlset>

    So now I'm going to run GSiteCrawler again and see if it picks up all the files after fixing the extra underscore.

  9. #9

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16
    P.S. The sitemap.xml posted above is from GSiteCrawler, and even though it was accepted, it didn't list all my files.

    I have resubmitted using the online generator with all my files and made sure there is no space here: ="UTF-8"?>

    When viewed online in IE, though, there is a space. Shaking my head.

  10. #10
    entrecon
    Join Date
    Aug 2006
    Location
    Michigan
    Posts
    2,742
    Rep Power
    17
    Have you looked at the files that GSiteCrawler isn't picking up to see if there is a reason?

  11. #11

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16
    Hi entrecon,

    Yes, I've examined the files and see no rhyme or reason. I've compared the results by printing out both the xml-sitemaps.com and the GSiteCrawler sitemaps, and there are 9 files missing from GSiteCrawler's.
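Comparing two sitemaps on paper works, but the set difference of their <loc> entries gives the same answer directly. A minimal sketch, assuming both files use the sitemaps.org 0.9 namespace (as both sitemaps posted in this thread do); the function names and the shortened sample sitemaps are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Namespace of the sitemaps.org 0.9 protocol, used by both generators in this thread.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_bytes: bytes) -> Set[str]:
    """Collect every <loc> value listed in a sitemap."""
    root = ET.fromstring(xml_bytes)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def missing_urls(reference: bytes, candidate: bytes) -> List[str]:
    """URLs present in the reference sitemap but absent from the candidate."""
    return sorted(sitemap_urls(reference) - sitemap_urls(candidate))


# Made-up, shortened sitemaps for illustration only.
FULL = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.topix4u.com/</loc></url>
  <url><loc>http://www.topix4u.com/animal_iq.html</loc></url>
  <url><loc>http://www.topix4u.com/pet_insurance.html</loc></url>
</urlset>"""

PARTIAL = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.topix4u.com/</loc></url>
</urlset>"""

if __name__ == "__main__":
    # Print each URL the partial sitemap is missing.
    for url in missing_urls(FULL, PARTIAL):
        print(url)
```

With real files, pass the bytes of each downloaded sitemap (for example via pathlib's Path.read_bytes()) instead of the inline samples.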

  12. #12
    Custom User Title entrecon's Avatar
    Join Date
    Aug 2006
    Location
    Michigan
    Posts
    2,742
    Rep Power
    17
    Have you tried posting your question over in the Google Group for GSiteCrawler? Just like folks here are great at answering PowWeb questions, they would be your better source to ask questions on GSiteCrawler.

  13. #13

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16
    No, I haven't posted in the GSiteCrawler Google Group yet. I did look through all the messages, though.

    Here is the section with the 9 files not showing up in GSiteCrawler:

    <p align="left">
    <b>Animals General Information</b><br>
    <a href="animal_careers.html">Animal Careers</a> -
    <a href="animal_iq.html">Animal Intelligence IQ</a> -
    <a href="animal_news.html">Animal News</a> -
    <a href="animal_sounds.html">Animal Sounds</a> -
    <a href="animal_terms.html">Animal Terms</a> -
    <a href="animal_training.html">Animal Training</a> -
    <a href="animal_kids.html">Animals and Kids</a> -
    <a href="clicker_training.html">Clicker Training</a> -
    <a href="pet_insurance.html">Pet Insurance</a> -

  14. #14
    Webmaster, DJ, Producer
    Join Date
    Sep 2006
    Location
    UK
    Posts
    495
    Rep Power
    12
    From an SEO point of view, never have spaces or underscores in your filenames. Search engines don't like them. I use - or + instead.

  15. #15
    YvetteKuhns
    Join Date
    Feb 2003
    Location
    Allentown, PA USA
    Posts
    15,244
    Rep Power
    34
    Quote: From an SEO point of view, never have spaces or underscores in your filenames. Search engines don't like them. I use - or +
    Though Matt Cutts mentioned a while back that dashes are better than underscores, I can tell you that we NEVER had a problem getting listed with underscores in file names. We don't make file names too long, so separating two or three words in a file name is okay. I have used either method and see no difference. A space should not be used in filenames.

    Query URLs even get listed, but sometimes they are too long or not different enough. If you have ever seen those crappy Nuke websites where the contents are in modules and all of the URLs look nearly alike, you could see why a search engine thinks every link is from the home page.

    Quote: Here is the section of the 9 files not showing up
    Try placing those URLs in an HTML page and link to it from your home page. Include those URLs and the URL of the page with the new links in your sitemap.xml file and submit it. If you have a sitemap web page, include the URLs there.
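If you end up listing those URLs yourself rather than relying on a crawler, a few lines of code can write the sitemap, which also guarantees that nothing precedes the XML declaration. A minimal sketch, assuming Python 3.8 or later (for xml_declaration=True) and emitting only <loc>, the one required child of <url>; the function name build_sitemap and the page animals.html are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str]) -> bytes:
    """Return a minimal sitemap whose first bytes are the XML declaration."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> is the only required child; ElementTree escapes characters such as &.
        ET.SubElement(entry, "loc").text = url
    # xml_declaration=True requires Python 3.8 or later.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    pages = [
        "http://www.topix4u.com/",
        "http://www.topix4u.com/animals.html",  # hypothetical page linking to the 9 files
        "http://www.topix4u.com/animal_careers.html",
        "http://www.topix4u.com/pet_insurance.html",
    ]
    print(build_sitemap(pages).decode("utf-8"))
```

Writing the returned bytes straight to sitemap.xml in binary mode avoids the copy-and-paste step where stray spaces tend to creep in.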
    Yvette Kuhns
    Power Pages Web Design
    Customized Internet Advertising Solutions

  16. #16
    Webmaster, DJ, Producer
    Join Date
    Sep 2006
    Location
    UK
    Posts
    495
    Rep Power
    12
    Yes, they get indexed, Yvette (I never said they didn't), but they don't do as well for searched keywords in some search engines.

  17. #17

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16
    Thank you everyone for your help. I am using xml-sitemaps.com all the time now.

    And all filenames have had underscores replaced with hyphens, including the ones I posted above.


    <b>Animals General Information</b><br>
    <a href="animal-careers.html">Animal Careers</a> -
    <a href="animal-iq.html">Animal Intelligence IQ</a> -
    <a href="animal-news.html">Animal News</a> -
    <a href="animal-sounds.html">Animal Sounds</a> -
    <a href="animal-terms.html">Animal Terms</a> -
    <a href="animal-training.html">Animal Training</a> -
    <a href="animal-kids.html">Animals and Kids</a> -
    <a href="clicker-training.html">Clicker Training</a> -
    <a href="pet-insurance.html">Pet Insurance</a> -

  18. #18
    YvetteKuhns
    Join Date
    Feb 2003
    Location
    Allentown, PA USA
    Posts
    15,244
    Rep Power
    34
    Quote: all filenames have underscores replaced with hyphens
    That wasn't necessary. But since you changed the file names, you should resubmit your sitemap. And if some of those files were already indexed, you should use an .htaccess redirect from the old file name to the new file name.
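For the .htaccess redirect suggested above, the redirect lines can be generated from the list of old names. A minimal sketch, assuming an Apache server where mod_alias is available and the host allows it in .htaccess, and assuming old paths are given relative to the site root; it only prints "Redirect 301 old-path new-URL" lines for you to paste in. The function name hyphen_redirects is hypothetical, and it collapses each run of underscores to one hyphen, matching the animal__iq.html case from this thread.

```python
import re
from typing import Iterable, List


def hyphen_redirects(old_paths: Iterable[str], site: str) -> List[str]:
    """Build mod_alias 'Redirect 301' lines for files renamed from _ to -.

    Each run of underscores becomes a single hyphen. Paths without
    underscores need no redirect and are skipped.
    """
    lines = []
    for old in old_paths:
        new = re.sub(r"_+", "-", old)
        if new != old:
            lines.append(f"Redirect 301 {old} {site.rstrip('/')}{new}")
    return lines


if __name__ == "__main__":
    old_files = ["/animal_careers.html", "/animal__iq.html", "/bird.html"]
    for line in hyphen_redirects(old_files, "http://www.topix4u.com"):
        print(line)
```

A 301 tells search engines the move is permanent, so any ranking the old URL had should carry over to the new one.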

  19. #19
    Webmaster, DJ, Producer
    Join Date
    Sep 2006
    Location
    UK
    Posts
    495
    Rep Power
    12
    A quote from Peter Kent's Search Engine Optimization for Dummies, third edition (which I really suggest anybody buy; it pays for itself): "You can separate keywords in a name with dashes, but not with underscores, despite what your webmaster may tell you."

    It may be that Google no longer has a problem with it since the book was published last year, but Google isn't the only search engine, and you should optimize your website for all search engines to maximise your exposure.

  20. #20
    YvetteKuhns
    Join Date
    Feb 2003
    Location
    Allentown, PA USA
    Posts
    15,244
    Rep Power
    34
    I've been doing SEO for years and know for a fact that underscores have been accepted by many search engines. "SEO: Hyphen Or Underscore?" and "Hyphens & Underscores Are Now Treated Equally in Google.com" were posted in 2007. I have used both dashes and underscores and see no difference for indexing or listings.

    You can play it safe and use hyphens, but I wouldn't tell people to rename pages already indexed just because they used underscores instead. As for hyphens in domain names, I have seen a few directories that didn't want to see more than two or three in the domain name, but I don't know their actual reasons for the rule. It could be their input validation rule.

  21. #21
    entrecon
    Join Date
    Aug 2006
    Location
    Michigan
    Posts
    2,742
    Rep Power
    17
    My reason for the original renaming was to try to diagnose why the one software package was skipping the pages when generating the XML file, and to determine why the XML file that did include these files was rejected by Google. I noted that, initially, at least one of the files had 2 underscores in a row, which may or may not have had an impact. I was strictly following my standard troubleshooting method: find what is different between what works and what does not work, change it, and see if it has an impact on the outcome.

  22. #22
    Webmaster, DJ, Producer
    Join Date
    Sep 2006
    Location
    UK
    Posts
    495
    Rep Power
    12
    TBH, I would rather take what an SEO expert has said in a book that's based entirely on the subject of SEO than some blog post on the net, regardless of the source. You might be right, Yvette, but I don't trust internet sources that I don't know. The thing is, most of them are saying it's the top search engines that are OK with it, but what about the smaller ones?

  23. #23
    entrecon
    Join Date
    Aug 2006
    Location
    Michigan
    Posts
    2,742
    Rep Power
    17
    It is amazing what a little bit of ink and paper will do for someone's credibility.

    Just because someone has managed to convince a publisher that they are an expert doesn't make it true. It seems to me that not too long ago someone convinced a publisher that his account of his time in a concentration camp was true. Turns out, he made up parts of the story.

    I am not saying that the SEO book is true, or that the blog post is true. I am just saying that the assumption that something in print carries more weight than something posted online does not always hold. I have seen many instances where published works are based on theory and not on practice.

    If you want to convince me that one way is better than another, then provide me with concrete examples of where one did work and the other did not. On similar sites, create pages that use either an underscore or a dash (or any other method proposed to be better) and then show over a period of time how these pages fare with the search engines.

  24. #24
    YvetteKuhns
    Join Date
    Feb 2003
    Location
    Allentown, PA USA
    Posts
    15,244
    Rep Power
    34
    I can't post URLs for my clients here for specific pages, but I know from experience that underscores have been okay for years. In fact, my own website articles have underscores and brought a lot of traffic to my website. For example, "How to Remove ISP Branding from Windows IE" can be typed (without quotes) in Google and my article (written in 2002) ranks #4. Try Yahoo where it ranks #1. Try Windows Live (MSN) where it ranks #2. This is one of many examples.

    entrecon, I agree with you!
    Last edited by YvetteKuhns; 3-11-09 at 12:25 PM. Reason: added Windows Live ranking

  25. #25

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16
    I submit a new sitemap after a few new pages are published. I also find that using a hyphen (just one) leaves less room for error. I'm happy with xml-sitemaps.com. That program will work nicely until my site grows to over 500 pages.

  26. #26
    YvetteKuhns
    Join Date
    Feb 2003
    Location
    Allentown, PA USA
    Posts
    15,244
    Rep Power
    34
    Quote: I submit a new sitemap after a few new pages are published.
    I do the same. Then I check Google Webmaster Tools to see if the sitemap.xml file was accepted and also check for any missing URLs.

  27. #27

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16
    Yvette, I do almost the same routine, but I first check to see if any URLs are missing before publishing, then I submit to WMT and wait a while before checking to see if it's been accepted. I had never had a problem with a sitemap being accepted (since sitemap submission was introduced), so that's why I posted here.

    So thank you all for contributing and helping. I learned not to use GSiteCrawler, and to use dashes instead of underscores. I even changed a few image filenames that had underscores. This is a great community!

  28. #28
    YvetteKuhns
    Join Date
    Feb 2003
    Location
    Allentown, PA USA
    Posts
    15,244
    Rep Power
    34
    Quote: I first check to see if any URLs are missing before publishing
    I was talking about checking to see if Google reports anything missing. Sometimes a server problem (like the one we are having today) can interfere with Google indexing pages. It is especially annoying when submitting a sitemap.

    Quote: and to use dashes instead of underscores. I even changed a few image filenames that had underscores.
    I can't believe you bothered to change the names when it makes no difference (as I have proved with my own examples). If those files were indexed, you would have to submit the new URLs to search engines to get them found again, and you should redirect the old URLs to the new ones. If they weren't indexed, no worries.

  29. #29

    Join Date
    Jan 2002
    Location
    Phoenix, Arizona US
    Posts
    289
    Rep Power
    16
    Those files were NOT indexed.

  30. #30
    YvetteKuhns
    Join Date
    Feb 2003
    Location
    Allentown, PA USA
    Posts
    15,244
    Rep Power
    34
    Good.

  31. #31
    Webmaster, DJ, Producer
    Join Date
    Sep 2006
    Location
    UK
    Posts
    495
    Rep Power
    12
    We will just have to agree to disagree, Yvette. All I'm saying now is that it's better to cater for all search engines rather than just the top ones that have moved with the times. That way you can maximize your exposure. For example, I am getting hits from search engines I've never even heard of.

    The only way (which isn't all that great) to check is to search for the terms "cat people", "cat-people" and "cat_people" (or some other phrase you wish, lol) and see what results you get. Google and Yahoo seem to be fine but show different results. But what about the smaller engines: do they treat them as two separate keywords or as a keyphrase? People may search using the first two forms instead of the underscore one. Not only that, hyphens save time for people who are typing in the URL, since you have to press the Shift key for the underscore.

    But, what the hey. When everyone understands search engines (which is never gonna happen, methinks), maybe this issue will be set aside for a more impossible mission, lol.

  32. #32
    YvetteKuhns
    Join Date
    Feb 2003
    Location
    Allentown, PA USA
    Posts
    15,244
    Rep Power
    34
    I just didn't want people to deliberately change their file names and mess up their current results based on outdated information. My articles have done well for years, even on minor search engines. The contents are more important than the file names. I do remember several years ago when search engines read those words as one word, but I haven't seen that in years. To play it safe, people can still use hyphens. I use either one; it depends on what already exists on a website, just for consistency.

    As for typing, people don't type underscores, they type spaces in a search. I never really thought about not having to press the Shift key when naming files. I simply thought of consistency. I normally use hyphens anyway, but sometimes I forget and use the underscore after working on another project where it was used. It really doesn't matter.

  33. #33
    entrecon
    Join Date
    Aug 2006
    Location
    Michigan
    Posts
    2,742
    Rep Power
    17
    Quote Originally Posted by bdw
    A quote from Peter Kent's Search Engine Optimization for Dummies, third edition (which I really suggest anybody buy; it pays for itself): "You can separate keywords in a name with dashes, but not with underscores, despite what your webmaster may tell you."

    It may be that Google no longer has a problem with it since the book was published last year, but Google isn't the only search engine, and you should optimize your website for all search engines to maximise your exposure.
    I have just read Kent's book, and while he does suggest that using a dash in a file name would be more beneficial based on some tests he has run, he admits that the file name in itself is not going to influence a search that much. There are several techniques that he mentions that are good practice and will help, but really won't impact the overall ability of your page to be indexed and found. If you are expecting your page to be found based on the file name alone, then yes, this is a HUGE issue. However, since most people also plan on having content on their pages, this is something you could let slide. Kent also indicated that there was a rumor that Google was already moving to have the underscore treated the same as dashes.

    As far as the book being published last year, keep in mind that the information had to be with the publisher even before that. I didn't visit all of the suggested resources that he provided, but of the ones I did check, 3 or 4 of the sites no longer exist; in fact, they are parked domains with ads on them.

  34. #34
    YvetteKuhns
    Join Date
    Feb 2003
    Location
    Allentown, PA USA
    Posts
    15,244
    Rep Power
    34
    I have a new client whose website has been online since 1999. He said the pages that use underscore between words got the best ranking! He tried hyphens, underscores or nothing between words. Separating words is what helps.
