| Steve: Developing on the Edge Thoughts on development, Web-services, technology and mountains. | |
Tue 9 Dec 2008 | The mediocre firewall of the UK meets Spinal Tap
I am fortunate in that I can now browse to a Wikipedia entry
on a 1970s
rock band album. The kind of album and album cover that Spinal
Tap made fun of. And what a gift it has proven to be, one that just
keeps giving. Now so many more people round the UK have heard of
the band, have seen the cover, and are aware that all the consumer ISPs
are filtering all your HTTP requests through a transparent
proxy.
Such is progress. The China firewall works. They've got all
suppliers to the state, from the apps -Skype- to the network
switches, to conspire to block forbidden keywords and sites. Here
our ISPs can't even censor 1970s rock band artwork reliably.
|
| |
Posted by steve at 20:11. Comments [0], trackbacks [0].
| Tue 9 Dec 2008 | Limitations of a file-driven CM world view
I am upgrading my desktop from Ubuntu 7.10 to 8.10 by way of
8.04. It was not intentional, but I made the mistake of upgrading
to SmartSVN 5 last week, which silently updated my SVN databases to
a version that is no longer compatible with the 7.10 command-line
tools. A bulk upgrade seems like the best way forward. So, I have
to use the laptop while the desktop churns away. With the Janet
network two router hops away the download is fast (1 MB/s), so
within 10 minutes of starting the update process I am presented
with the first "a configuration file you never knew existed and
whose syntax is alien to you has changed, would you like to see it"
dialog.
This shows a flaw in .deb and .rpm systems. Their world view is
that everything is a file, and that files belong to specific packages. But
really, it is file state changes that belong to packages. If
I install samba, and it creates a user samba in /etc/passwd, it is
deploying a state change to that file, "insert a new line"; only
that line should be managed as part of the package. Furthermore,
changes to the system that don't invalidate that change, "there is a
samba user in /etc/passwd", are valid, with no need to bother
me.
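To make the "a package owns a state change, not a file" idea concrete, here is a minimal Python sketch. It is not how dpkg or RPM work; the function names (ensure_line, remove_line) and the sample passwd entries are made up for illustration. The package only asserts "there is an entry for this user", so repeating the install or editing other lines causes no conflict.

```python
from typing import List


def ensure_line(lines: List[str], key: str, line: str) -> List[str]:
    """Return a copy of `lines` that contains an entry for `key`.

    An entry is identified by its first colon-separated field, as in
    /etc/passwd. If one already exists it is left alone, even if someone
    has edited it, because the desired state ("the user exists") still holds.
    """
    for existing in lines:
        if existing.split(":", 1)[0] == key:
            return list(lines)
    return list(lines) + [line]


def remove_line(lines: List[str], key: str) -> List[str]:
    """Undo the package's state change: drop only the entry it owns."""
    return [entry for entry in lines if entry.split(":", 1)[0] != key]


# Hypothetical file contents and package entry, for illustration only.
passwd = [
    "root:x:0:0:root:/root:/bin/bash",
    "steve:x:1000:1000:Steve:/home/steve:/bin/bash",
]
samba_entry = "samba:x:121:130:Samba:/var/lib/samba:/usr/sbin/nologin"

installed = ensure_line(passwd, "samba", samba_entry)
reinstalled = ensure_line(installed, "samba", samba_entry)  # no change
uninstalled = remove_line(installed, "samba")               # back to start
```

Because the operations are keyed on one line, an upgrade never needs to ask about the rest of the file.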
This doesn't mean I think the Linux package managers
aren't wonderful -they are- but that Linux itself doesn't have a
world view that is 100% compatible. Unless the package managers
move beyond file ownership, the various parts of the system need to
support aggregatable files, so that the standard file
create/overwrite/delete operations that RPM/deb installers do can
be used to manipulate the state of the machine without creating
conflicts over who owned or edited a configuration file.
|
| |
Posted by steve at 16:17. Comments [2], trackbacks [0].
| Mon 1 Dec 2008 | Presentation: My other computer is a datacentre
As other ApacheCon attendees will know, the title of this new
presentation,
"My other computer is a datacentre", is based on Fitz's Google Code slogan,
only spelt correctly for en-gb. Brian actually provided the 40+
stickers needed to hand out one to every student in the CS course
who got to see the lecture, and Tom White graciously brought them
back from ApacheCon US to Wales, from whence The Royal Mail got
them to my house, and then by hand to the university. In
compensation, Fitz gets a slide of his own. As do Yahoo!'s Hadoop
team, and our household's deployment project, who got to see a
datacentre when he visited my site. The cold air coming up from the
floor was right up there with the blinking lights as the key
features of the room.
This is a talk on the engineering aspects of datacentres,
looking at some of the implications they bring to software that
runs on them. It's the sequel to
Farms, Fabrics and Clouds, which listed what assumptions were
no longer valid, without exploring the implications.
What you don't see in the presentation is the bits where I go
off on a ramble, mainly on Power. The photo of the sunset is from a
motel in Hood River, Oregon, looking at the Columbia, which is near
where all the PNW datacentres are based: all that water running
through dams. Google's Dalles facility is a few miles upriver
behind the camera, further again comes the MSN facility, amongst
others.
I gave some working MapReduce demos in this talk. Paolo has been
teasing me about not bothering to write my own MR code, focusing on
deploying Hadoop instead. This is my response. Six months' worth of
scanned Bluetooth devices from my house turned into Erlang records,
fed through a derivative of the MapReduce engine included in the
Programming Erlang book. My derivative not only applies the
mapper for every record in the source datafile(s) (doing each file
in parallel), it correctly terminates when there is a programming
error in the mapper, forwarding the error to the shell. I have
found that useful. I can now show the breakdown of devices seen by
day, by hour, which devices get seen the most times or for the
longest duration, etc, all in a few lines of functional programming
code. Nice. This doesn't mean I am an unequivocal fan of Erlang,
only that it has some features that I appreciate. Like native list
and tuple support, and dynamic function creation. I'd need more
time with its processes before I can conclude whether that is good
or bad. "Interesting", is all I will say there for now.
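For readers who don't speak Erlang, here is a rough Python stand-in for the shape of that engine. It is not my Erlang code: the record layout (timestamp, device address), the function names and the sample data are all invented for illustration. Each source file is mapped in its own worker, and an exception raised by the mapper is re-raised in the caller instead of being swallowed.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def map_file(mapper, records):
    """Apply the mapper to every record of one file; collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def map_reduce(mapper, reducer, files):
    """Map each file in parallel, group values by key, then reduce each group."""
    grouped = defaultdict(list)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(map_file, mapper, records) for records in files]
        for future in futures:
            # result() re-raises any exception thrown inside the mapper,
            # so a programming error reaches the caller.
            for key, value in future.result():
                grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


def by_hour(record):
    """Mapper: emit (hour of day, device address) for one sighting."""
    seen_at, device = record
    return [(datetime.fromisoformat(seen_at).hour, device)]


def count_distinct(_key, values):
    """Reducer: number of distinct devices seen for this key."""
    return len(set(values))


# Two hypothetical log files of (ISO timestamp, device address) sightings.
files = [
    [("2008-11-01T08:15:00", "aa:01"), ("2008-11-01T08:40:00", "aa:02")],
    [("2008-11-02T08:05:00", "aa:01"), ("2008-11-02T21:30:00", "aa:03")],
]
devices_per_hour = map_reduce(by_hour, count_distinct, files)
```

Swapping by_hour for a mapper keyed on the date or on the device address gives the by-day and most-seen breakdowns with the same engine.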
I will play more with Erlang/my Bluetooth data and maybe write
something up on it. For now,
the Bath Bluetooth Study is the closest published paper on
street-based Bluetooth monitoring of mobile devices.
|
| |
Posted by steve at 14:28. Comments [0], trackbacks [0].
| Fri 28 Nov 2008 | Ubuntu 8.10 rollout complete
I've now pushed out Ubuntu 8.10 to two laptops, and clean
installed it on my home desktop. I've left the work desktop alone
as problems there translate into serious productivity problems, and
it's not worth the hassle.
On a laptop, the mobility and power management features make
this a really good mobile Linux. It's the first Linux that feels
mobile, rather than a Linux that fits on a laptop but doesn't like
to be moved. On a desktop, there isn't such a compelling need to
upgrade:
- I can't get multi-monitor working except with a reboot while
the second monitor is plugged in. At least reboot is fast!
- No sound on either laptop. One is a known problem; on the other,
the sound just went away.
- Gnome Network Manager is trouble. On the laptops it has quirks:
it wants to go offline on resumes, and doesn't always select the
nearest wifi network (this could be card trouble, of course).
- On the static desktop, Network Manager keeps stamping on
resolv.conf, but doesn't remember its own DNS entries across
reboots. You need to kill Network Manager.
- Network Manager/Ubuntu does handle 3G wireless USB dongles
nicely: wizard driven, up and running. Slick.
- For the desktop, wicd is a better
alternative.
- Once you run VMware, cursor keys start playing up across the
whole X session.
I initially thought the static DNS entries problem was due to me
doing an upgrade of a previous install, so reformatted the root
partition and installed the OS clean. After much grub work that
came up, but the problem remained -and at least two other people I
know have the same problem. This tells me something about the
Network Manager team: they use DHCP for everything.
This leads me to restate "Loughran's Law of Networks", which is
this:
Networked applications work
best in a network architecture which matches that of the
development team
Outlook and Windows Explorer networking work if there is a fast,
high-availability link to the servers. Java works reliably if
reverse-DNS is always fast and accurate, and assumes that machines
never change IP addresses during the life of an application. Most
open source apps have an awful time with web proxies, as it's not
something their developers have encountered. And Network Manager, well, it may
work on the move, but not on static systems. Which is progress, of
a sort.
|
| |
Posted by steve at 12:02. Comments [0], trackbacks [0].
| Wed 26 Nov 2008 | Farms, Fabrics and Clouds (slightly updated)
I've updated our Farms, Fabrics and Clouds talk, as given to the
local university students this week.
|
| |
Posted by steve at 00:01. Comments [0], trackbacks [0].
| Tue 25 Nov 2008 | What should Sun do?
Fascinating article by Tim Bray,
what should Sun do.
+1 to the back-off-from-the-client story. It's over. JVMs of all
kinds are commoditised. Unfortunately, Sun still have to spend lots
of money keeping the Windows JVM alive and staying up with MS on
features, even though MS have the Windows franchise to fund them.
As the only people who use Java client apps are us developers, why
not force everyone to move to Unix? And make Java integrate with
Unix way, way better, by not seeking a lowest common denominator of
platforms.
Server side, the future is apps running in outsourced
datacentres. Some of the power budget options of their CPUs are
interesting, and I'm sure Hadoop would run well on their clusters.
Rather than worry about GlassFish-the-open-source-alternative-to-
WebSphere, they should be thinking about
GlassFish-the-pay-per-CPU-hour version, running on Sun kit with
Hadoop running on an HA filesystem behind it. If the power story is
good, this could work.
One problem here is that the development GUI in Java land is not
NetBeans. It's Eclipse. Whether you like it or not.
| |
Posted by steve at 23:57. Comments [0], trackbacks [0].
| Sun 23 Nov 2008 | Visions of a future
Some lovely articles in the NYTimes this week, if you can get
past their cookie policy (hint: delete all their cookies). First,
one on
the NetFlix competition. There are some interesting questions
about human nature itself lurking here. I was at a university
lecture on the state of the AI community last week, where the
lecturer observed that we do have the equivalent of "skynet" up and
running -the large computer systems modelling human behaviour- but
all they were doing was a statistical hack kind of AI to recommend
other goods and place adverts better. The dreams of the AI
community from 50 years ago for self-aware machines haven't been realised,
and it's not clear that tuning the current algorithms is enough. Of
course, the NYT doesn't get into looking at the whole AI research
agenda, merely cinema recommendations. What interests me is whether
there are correlations across products: can you predict what videos
people will like from their book purchases, what they do in their
spare time, everything? Can you model people using statistics
alone? I also wonder where else such recommendation systems could
be extended to -things in the real world?
The other article is on
the ubiquity of screens. It's something to make you think.
Yes, I really do spend most of my day staring at screens. Work:
LCD. Home, TV + laptop, or both, as here I have to sit in the room
while the sprog watches the original Star Wars for the third time
this weekend and I try not to get too bored. What the article does
discuss is how online video has evolved to be short 2-10 minute
videos made by the community itself. Again, the sprog shows the way
here: he doesn't watch kids TV. He'd rather have time on a laptop
looking for stop-motion Mars-mission Lego animations on YouTube.
Why? Content that appeals to him. We have also discovered the
merits of online video in our local political campaigns. A video
whose URL you can post out is a fantastic way of getting your
message across. That's a very interesting form of democratisation
at work. Whereas before you had to get the interest of the local,
regional or national TV channels to get a message out, now you
don't. That's going to move power from the legacy media companies
to the new web hosted players, and to all the people who upload the
artifacts in the first place.
Anyway, some interesting reads. An online version of a printed
paper discussing how ubiquitous screens and community publishing are
going to change media. Hmmm. And this on the same day as an article
discussing whether irony is dead.
|
| |
Posted by steve at 18:02. Comments [0], trackbacks [0].
| Sat 22 Nov 2008 | Ooh, a petabyte
Google give their
times to sort a terabyte on GFS/MapReduce in one of their
datacentres: the record to beat is now 68s.
What is really impressive is they then went on to sort a full
petabyte in 6 hours. Which means one petabyte in, one petabyte out,
one petabyte for intermediate bits and all of this stuff replicated
3-ways: 6 petabytes all in. Spare. I guess this will become a new
way of stress testing a new datacentre: get it to sort for a
while.
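To put those two numbers side by side, here is a quick back-of-envelope calculation of the implied sort throughput. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and takes the times exactly as quoted above, 68 seconds and 6 hours; it says nothing about the replication or intermediate traffic.

```python
TB = 10**12  # assumed decimal terabyte, in bytes
PB = 10**15  # assumed decimal petabyte, in bytes


def throughput_gb_per_s(n_bytes: int, seconds: float) -> float:
    """Average rate at which sorted data is produced, in GB/s."""
    return n_bytes / seconds / 10**9


terabyte_rate = throughput_gb_per_s(TB, 68)        # 1 TB in 68 s
petabyte_rate = throughput_gb_per_s(PB, 6 * 3600)  # 1 PB in 6 hours
speedup = petabyte_rate / terabyte_rate
```

That works out at roughly 14.7 GB/s for the terabyte run and roughly 46.3 GB/s sustained for the petabyte run, so the bigger job is running about three times faster per second, not just for longer.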
There's a bit of commentary on "straggler management"; these can
be trouble in Hadoop -and in BitTorrent. The slow machines end up
as a bottleneck.
|
| |
Posted by steve at 21:39. Comments [0], trackbacks [0].
|
  | |