Old but interesting news

I've spent a fairly big chunk of my professional career building public websites for a number of clients. One of the things that comes up again and again is search engine optimisation.

Surprisingly, there is very little coverage of this in business and marketing journals. Patrick Reid wanted to raise awareness and needed my technical help in preparing a paper. We finished it earlier this year, and it was published in the Journal of Strategic Management Education.

Corporate Communications in the FTSE 100: Evidence using Search Engines
JSME Vol 7: Issue 1, 2011
Patrick Reid (Greenwich School of Management, UK), Brian Lyttle (LiquidHub, Inc., USA)

Search engines have become an essential tool for information seekers on the world wide web. This makes it critical that company websites are optimised to achieve the highest possible ranking across the major search engines, in particular Google, ensuring maximum exposure and subsequent awareness. This paper identifies key factors in search engine optimisation and then analyses a sample of the largest 10 firms listed on the London Stock Exchange.

The findings reveal a lack of adoption of best practices which will impact on their search rankings. The paper also discusses tradeoffs inherent in search engine optimisation and draws out key implications for managers and academics.

The full paper is available from the Senate Hall site.

Tagged with marketing, search, search-engine-optimization, sem and seo.

Scrapy dependency problems with lxml

Following the recent PhillyPUG meetup, I was trying to install scrapy on an old MacBook Pro running OS X 10.6 (Snow Leopard) and ran into a number of problems with the lxml dependency. lxml is the parser scrapy uses to extract data from the pages it downloads, so you are not going to get very far without it.

The compilation problems when installing with pip seem to result from an attempt to build a universal binary for i386, ppc and x86_64. Xcode 4 no longer installs the PowerPC (ppc) assembler, so that build fails unless you make sure only the remaining architectures are specified.
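Because the log below lists which assemblers are actually installed, you can also derive the architecture flags instead of hard-coding them. This is a minimal sketch, assuming the directory layout shown in that log (one /usr/libexec/gcc/darwin/<arch>/as per architecture); archflags_from_assemblers is a hypothetical helper, not part of pip or Xcode, and its output is intended for the ARCHFLAGS variable used in the fix.

```shell
#!/bin/sh
# Hypothetical helper. Assumes one assembler per architecture at
# <dir>/<arch>/as, the layout reported in the pip error log.
# Prints an ARCHFLAGS value that covers only the architectures that
# have an executable assembler, e.g. "-arch i386 -arch x86_64".
archflags_from_assemblers() {
    asdir="${1:-/usr/libexec/gcc/darwin}"
    flags=""
    for as_path in "$asdir"/*/as; do
        # Skip the unexpanded glob (no matches) and non-executable files.
        [ -x "$as_path" ] || continue
        arch=$(basename "$(dirname "$as_path")")
        flags="$flags -arch $arch"
    done
    # Remove the leading space added by the first match.
    printf '%s\n' "${flags# }"
}
```

On the machine from this post, export ARCHFLAGS="$(archflags_from_assemblers)" would be expected to produce the same value as the manual fix, since ppc has no assembler installed.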

Architecture Fix

You can set the architecture in your bash profile. Here I export it inside a new root shell (sudo bash) so that the build script run by pip picks it up.

sudo bash
export ARCHFLAGS='-arch i386 -arch x86_64'
pip install lxml # check that lxml now builds
pip install scrapy --upgrade # retry the scrapy install that failed
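A successful pip run does not prove that the compiled lxml.etree extension loads, so a quick import check can help confirm the fix. This is a sketch: check_import and the PYTHON override are hypothetical names, and it assumes the python on your PATH is the interpreter pip installed into.

```shell
#!/bin/sh
# Hypothetical helper: report whether a Python module can be imported.
# PYTHON defaults to "python", assumed to be the interpreter pip used.
check_import() {
    if "${PYTHON:-python}" -c "import $1" 2>/dev/null; then
        echo "$1: OK"
    else
        echo "$1: missing or broken"
        return 1
    fi
}

# Usage (not run here):
#   check_import lxml.etree
#   check_import scrapy
```

If you would rather not open a root shell, running sudo env ARCHFLAGS='-arch i386 -arch x86_64' pip install lxml should have the same effect, because env sets the variable after sudo has already reset the environment.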

Original Error

brianly$ sudo pip install lxml --upgrade
Downloading/unpacking lxml
Downloading lxml-2.3.tar.gz (3.2Mb): 3.2Mb downloaded
Running setup.py egg_info for package lxml
  Building lxml version 2.3.
  Building without Cython.
  Using build configuration of libxslt 1.1.24
  warning: no previously-included files found matching '*.py'
Installing collected packages: lxml
Found existing installation: lxml 2.2.2
  Uninstalling lxml:
    Successfully uninstalled lxml
Running setup.py install for lxml
  Building lxml version 2.3.
  Building without Cython.
  Using build configuration of libxslt 1.1.24
  building 'lxml.etree' extension
  gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch ppc -arch x86_64 -pipe -I/usr/include/libxml2 -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c src/lxml/lxml.etree.c -o build/temp.macosx-10.6-universal-2.6/src/lxml/lxml.etree.o -w -flat_namespace
  /usr/libexec/gcc/powerpc-apple-darwin10/4.2.1/as: assembler (/usr/bin/../libexec/gcc/darwin/ppc/as or /usr/bin/../local/libexec/gcc/darwin/ppc/as) for architecture ppc not installed
  Installed assemblers are:
  /usr/bin/../libexec/gcc/darwin/x86_64/as for architecture x86_64
  /usr/bin/../libexec/gcc/darwin/i386/as for architecture i386
  src/lxml/lxml.etree.c:161594: fatal error: error writing to -: Broken pipe
  compilation terminated.
  lipo: can't open input file: /var/tmp//ccYr9GpX.out (No such file or directory)
  error: command 'gcc-4.2' failed with exit status 1
  Complete output from command /usr/bin/python -c "import setuptools;__file__='/Users/brianly/dev/github/pyconscrape/build/lxml/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-axeEA7-record/install-record.txt:
  Building lxml version 2.3.

Building without Cython.

Using build configuration of libxslt 1.1.24

running install

running build

running build_py

creating build

creating build/lib.macosx-10.6-universal-2.6

creating build/lib.macosx-10.6-universal-2.6/lxml

copying src/lxml/__init__.py -> build/lib.macosx-10.6-universal-2.6/lxml

copying src/lxml/_elementpath.py -> build/lib.macosx-10.6-universal-2.6/lxml

copying src/lxml/builder.py -> build/lib.macosx-10.6-universal-2.6/lxml

copying src/lxml/cssselect.py -> build/lib.macosx-10.6-universal-2.6/lxml

copying src/lxml/doctestcompare.py -> build/lib.macosx-10.6-universal-2.6/lxml

copying src/lxml/ElementInclude.py -> build/lib.macosx-10.6-universal-2.6/lxml

copying src/lxml/pyclasslookup.py -> build/lib.macosx-10.6-universal-2.6/lxml

copying src/lxml/sax.py -> build/lib.macosx-10.6-universal-2.6/lxml

copying src/lxml/usedoctest.py -> build/lib.macosx-10.6-universal-2.6/lxml

creating build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/__init__.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/_dictmixin.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/_diffcommand.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/_html5builder.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/_setmixin.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/builder.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/clean.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/defs.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/diff.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/ElementSoup.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/formfill.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/html5parser.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/soupparser.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

copying src/lxml/html/usedoctest.py -> build/lib.macosx-10.6-universal-2.6/lxml/html

creating build/lib.macosx-10.6-universal-2.6/lxml/isoschematron

copying src/lxml/isoschematron/__init__.py -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron

creating build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources

creating build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/rng

copying src/lxml/isoschematron/resources/rng/iso-schematron.rng -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/rng

creating build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl

copying src/lxml/isoschematron/resources/xsl/RNG2Schtrn.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl

copying src/lxml/isoschematron/resources/xsl/XSD2Schtrn.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl

creating build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_abstract_expand.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_dsdl_include.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_message.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_skeleton_for_xslt1.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_svrl_for_xslt1.xsl -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/readme.txt -> build/lib.macosx-10.6-universal-2.6/lxml/isoschematron/resources/xsl/iso-schematron-xslt1

running build_ext

building 'lxml.etree' extension

creating build/temp.macosx-10.6-universal-2.6

creating build/temp.macosx-10.6-universal-2.6/src

creating build/temp.macosx-10.6-universal-2.6/src/lxml

gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch ppc -arch x86_64 -pipe -I/usr/include/libxml2 -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c src/lxml/lxml.etree.c -o build/temp.macosx-10.6-universal-2.6/src/lxml/lxml.etree.o -w -flat_namespace

/usr/libexec/gcc/powerpc-apple-darwin10/4.2.1/as: assembler (/usr/bin/../libexec/gcc/darwin/ppc/as or /usr/bin/../local/libexec/gcc/darwin/ppc/as) for architecture ppc not installed

Installed assemblers are:

/usr/bin/../libexec/gcc/darwin/x86_64/as for architecture x86_64

/usr/bin/../libexec/gcc/darwin/i386/as for architecture i386

src/lxml/lxml.etree.c:161594: fatal error: error writing to -: Broken pipe

compilation terminated.

lipo: can't open input file: /var/tmp//ccYr9GpX.out (No such file or directory)

error: command 'gcc-4.2' failed with exit status 1

----------------------------------------
  Rolling back uninstall of lxml
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.6/site-packages/pip-1.0.1-py2.6.egg/pip/basecommand.py", line 126, in main
    self.run(options, args)
  File "/Library/Python/2.6/site-packages/pip-1.0.1-py2.6.egg/pip/commands/install.py", line 228, in run
    requirement_set.install(install_options, global_options)
  File "/Library/Python/2.6/site-packages/pip-1.0.1-py2.6.egg/pip/req.py", line 1104, in install
    requirement.rollback_uninstall()
  File "/Library/Python/2.6/site-packages/pip-1.0.1-py2.6.egg/pip/req.py", line 487, in rollback_uninstall
    self.uninstalled.rollback()
  File "/Library/Python/2.6/site-packages/pip-1.0.1-py2.6.egg/pip/req.py", line 1417, in rollback
    pth.rollback()
AttributeError: 'str' object has no attribute 'rollback'

Storing complete log in /Users/brianly/.pip/pip.log

Tagged with lxml, python, scrapy and scripting.