I've spent a fairly big chunk of my professional career building public websites for a number of clients. One of the things that comes up again and again is search engine optimisation.
Surprisingly, there is very little coverage of this in business and marketing journals. Patrick Reid wanted to start raising awareness and needed my technical help in preparing a paper. We finished it earlier this year, and it was published in the Journal of Strategic Management Education.
Corporate Communications in the FTSE 100: Evidence using Search Engines
JSME Vol 7: Issue 1, 2011
Patrick Reid (Greenwich School of Management, UK), Brian Lyttle (LiquidHub, Inc., USA)
Search engines have become an essential tool for information seekers on the world wide web. This makes it critical that company websites are optimised to achieve the highest possible ranking across the major search engines, in particular Google, ensuring maximum exposure and subsequent awareness. This paper identifies key factors in search engine optimisation and then analyses a sample of the largest 10 firms listed on the London Stock Exchange.
The findings reveal a lack of adoption of best practices which will impact on their search rankings. The paper also discusses tradeoffs inherent in search engine optimisation and draws out key implications for managers and academics.
The full paper is available from the Senate Hall site.
Following the recent PhillyPUG meetup, I was trying to install scrapy on an old MacBook Pro running OS X 10.6 (Snow Leopard) and ran into a number of problems with the lxml dependency. lxml is the parser used to extract data from the pages that scrapy downloads, so you are not going to get very far without it.
The compilation problems experienced when installing with pip seem to result from an attempt to build a universal binary that includes the PowerPC (ppc) architecture. If you have Xcode 4 installed, you lose the ability to build for ppc (the error output reports that the ppc assembler is not installed), so you need to make sure that only the installed architectures, i386 and x86_64, are specified.
Architecture Fix
Setting the architecture is something you can do in your bash profile. Exporting it in a new bash shell, as in the commands that follow, ensures that the build script picks it up.
sudo bash
export ARCHFLAGS='-arch i386 -arch x86_64'
pip install lxml # test it
pip install scrapy --upgrade # fix the failed scrapy install
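If you would rather work out the ARCHFLAGS value than copy it, the idea can be sketched in Python. This is only a rough sketch under a few assumptions: the helper names `archs_in_flags` and `build_archflags` are made up for illustration, the list of unsupported architectures is my guess at what Xcode 4 drops, and the `sysconfig` module only exists on Python 2.7 and later (on the stock Python 2.6 you would use `distutils.sysconfig.get_config_var` instead).

```python
import shlex
import sysconfig


def archs_in_flags(flags):
    """Return the architectures named by '-arch X' pairs in a compiler flag string."""
    tokens = shlex.split(flags or "")
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-arch"]


def build_archflags(flags, unsupported=("ppc", "ppc64")):
    """Build an ARCHFLAGS value that keeps every architecture except the unsupported ones.

    The default 'unsupported' tuple is an assumption, not something reported by Xcode.
    """
    keep = [arch for arch in archs_in_flags(flags) if arch not in unsupported]
    return " ".join("-arch " + arch for arch in keep)


if __name__ == "__main__":
    # CFLAGS records the flags this Python was built with; it may be empty or
    # contain no -arch entries at all on other platforms.
    cflags = sysconfig.get_config_var("CFLAGS") or ""
    print("Python was built with:", archs_in_flags(cflags))
    print("Suggested ARCHFLAGS:", build_archflags(cflags))
```

Fed the flags from the failing gcc command in the error output, `build_archflags` drops ppc and keeps i386 and x86_64, which is the same value exported above.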
Original Error
brianly$ sudo pip install lxml --upgrade
Downloading/unpacking lxml
Downloading lxml-2.3.tar.gz (3.2Mb): 3.2Mb downloaded
Running setup.py egg_info for package lxml
Building lxml version 2.3.
Building without Cython.
Using build configuration of libxslt 1.1.24
warning: no previously-included files found matching '*.py'
Installing collected packages: lxml
Found existing installation: lxml 2.2.2
Uninstalling lxml:
Successfully uninstalled lxml
Running setup.py install for lxml
Building lxml version 2.3.
Building without Cython.
Using build configuration of libxslt 1.1.24
building 'lxml.etree' extension
gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch ppc -arch x86_64 -pipe -I/usr/include/libxml2 -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c src/lxml/lxml.etree.c -o build/temp.macosx-10.6-universal-2.6/src/lxml/lxml.etree.o -w -flat_namespace
/usr/libexec/gcc/powerpc-apple-darwin10/4.2.1/as: assembler (/usr/bin/../libexec/gcc/darwin/ppc/as or /usr/bin/../local/libexec/gcc/darwin/ppc/as) for architecture ppc not installed
Installed assemblers are:
/usr/bin/../libexec/gcc/darwin/x86_64/as for architecture x86_64
/usr/bin/../libexec/gcc/darwin/i386/as for architecture i386
src/lxml/lxml.etree.c:161594: fatal error: error writing to -: Broken pipe
compilation terminated.
lipo: can't open input file: /var/tmp//ccYr9GpX.out (No such file or directory)
error: command 'gcc-4.2' failed with exit status 1
Complete output from command /usr/bin/python -c "import setuptools;__file__='/Users/brianly/dev/github/pyconscrape/build/lxml/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-axeEA7-record/install-record.txt:
Building lxml version 2.3.
Building without Cython.
Using build configuration of libxslt 1.1.24
running install
running build
running build_py
creating build
creating build/lib.macosx-10.6-universal-2.6
creating build/lib.macosx-10.6-universal-2.6/lxml
copying src/lxml/__init__.py -> build/lib.macosx-10.6-universal-2.6/lxml
copying src/lxml/_elementpath.py -> build/lib.macosx-10.6-universal-2.6/lxml
copying src/lxml/builder.py -> build/lib.macosx-10.6-universal-2.6/lxml
copying src/lxml/cssselect.py -> build/lib.macosx-10.6-universal-2.6/lxml
copying src/lxml/doctestcompare.py -> build/lib.macosx-10.6-universal-2.6/lxml
copying src/lxml/ElementInclude.py -> build/lib.macosx-10.6-universal-2.6/lxml
copying src/lxml/pyclasslookup.py -> build/lib.macosx-10.6-universal-2.6/lxml
copying src/lxml/sax.py -> build/lib.macosx-10.6-universal-2.6/lxml
copying src/lxml/usedoctest.py -> build/lib.macosx-10.6-universal-2.6/lxml
creating build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/__init__.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/_dictmixin.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/_diffcommand.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/_html5builder.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/_setmixin.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/builder.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/clean.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/defs.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/diff.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/ElementSoup.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/formfill.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/html5parser.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/soupparser.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
copying src/lxml/html/usedoctest.py -> build/lib.macosx-10.6-universal-2.6/lxml/html
creating build/lib.macosx-10.6-universal-2.6/lxml/isoschematron
copying src/lxml/isoschematron/__init__.py -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron
creating build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources
creating build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/rng
copying src/lxml/isoschematron/resources/rng/iso-schematron.rng -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/rng
creating build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl
copying src/lxml/isoschematron/resources/xsl/RNG2Schtrn.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl
copying src/lxml/isoschematron/resources/xsl/XSD2Schtrn.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl
creating build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_abstract_expand.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_dsdl_include.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_message.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_skeleton_for_xslt1.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_svrl_for_xslt1.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/readme.txt -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
running build_ext
building 'lxml.etree' extension
creating build/temp.macosx-10.6-universal-2.6
creating build/temp.macosx-10.6-universal-2.6/src
creating build/temp.macosx-10.6-universal-2.6/src/lxml
gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch ppc -arch x86_64 -pipe -I/usr/include/libxml2 -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c src/lxml/lxml.etree.c -o build/temp.macosx-10.6-universal-2.6/src/lxml/lxml.etree.o -w -flat_namespace
/usr/libexec/gcc/powerpc-apple-darwin10/4.2.1/as: assembler (/usr/bin/../libexec/gcc/darwin/ppc/as or /usr/bin/../local/libexec/gcc/darwin/ppc/as) for architecture ppc not installed
Installed assemblers are:
/usr/bin/../libexec/gcc/darwin/x86_64/as for architecture x86_64
/usr/bin/../libexec/gcc/darwin/i386/as for architecture i386
src/lxml/lxml.etree.c:161594: fatal error: error writing to -: Broken pipe
compilation terminated.
lipo: can't open input file: /var/tmp//ccYr9GpX.out (No such file or directory)
error: command 'gcc-4.2' failed with exit status 1
----------------------------------------
Rolling back uninstall of lxml
Exception:
Traceback (most recent call last):
File "/Library/Python/2.6/site-packages/pip-1.0.1-py2.6.egg/pip/basecommand.py", line 126, in main
self.run(options, args)
File "/Library/Python/2.6/site-packages/pip-1.0.1-py2.6.egg/pip/commands/install.py", line 228, in run
requirement_set.install(install_options, global_options)
File "/Library/Python/2.6/site-packages/pip-1.0.1-py2.6.egg/pip/req.py", line 1104, in install
requirement.rollback_uninstall()
File "/Library/Python/2.6/site-packages/pip-1.0.1-py2.6.egg/pip/req.py", line 487, in rollback_uninstall
self.uninstalled.rollback()
File "/Library/Python/2.6/site-packages/pip-1.0.1-py2.6.egg/pip/req.py", line 1417, in rollback
pth.rollback()
AttributeError: 'str' object has no attribute 'rollback'
Storing complete log in /Users/brianly/.pip/pip.log