Steve: Developing on the Edge
Thoughts on development, Web-services, technology and mountains.
Thu 18 Jun 2009
new SmartFrog release -3.17.012

There is a new SmartFrog release out.

It's a couple of weeks since the last one. The main changes are a component to do an unzip on deploy (someone wanted it), and the Hadoop code, which is up to trunk with my patches in. You can deploy clusters of Hadoop machines and submit work to them with this.

Since then I've been making the Hadoop components better at working out the IP addresses they are listening on. I've given up on asking the Hadoop namenode/tasktracker package-private fields for this, because when you listen on 0.0.0.0, that is the address they report. Now I look at the configuration properties: if the value is a hard-coded address or hostname, return that; if it is 0.0.0.0, return our primary hostname.
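As a rough sketch of that rule, treating 0.0.0.0 (the IPv4 wildcard) as the "listen on everything" value: the class and method names below are made up for illustration and are not the SmartFrog or Hadoop code, and the fallback to "localhost" is my own assumption.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper; not part of SmartFrog or Hadoop. */
final class BindAddressResolver {
    /** The IPv4 wildcard: "listen on every interface". */
    static final String WILDCARD = "0.0.0.0";

    private BindAddressResolver() {
    }

    /**
     * Returns the host that other machines should use to reach a service.
     * A hard-coded address or hostname is returned unchanged; the wildcard
     * is replaced by the primary hostname supplied by the caller.
     */
    static String externalHost(String configuredHost, String primaryHostname) {
        String host = configuredHost.trim();
        return WILDCARD.equals(host) ? primaryHostname : host;
    }

    /** Best-effort local hostname; falls back to "localhost" (an assumption). */
    static String primaryHostname() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "localhost";
        }
    }

    public static void main(String[] args) {
        String local = primaryHostname();
        // Wildcard binding: report this machine's own hostname instead
        System.out.println(externalHost("0.0.0.0", local));
        // Hard-coded value: passed straight through
        System.out.println(externalHost("namenode.example.org", local));
    }
}
```

Passing the primary hostname in as a parameter keeps the rule itself free of network calls, which makes it easy to test.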

Next: have every node publish these values to the Anubis T-space when they go live, and have something else block the worker nodes from starting until the masters have put up their addresses.
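The publish-then-block idea can be sketched with an in-memory stand-in for the shared space. This is not the Anubis API: AddressBoard, its methods, the role names and the example address are all hypothetical, and a real version would have to work across machines rather than inside one JVM.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** In-memory stand-in for a shared tuple space; hypothetical, not Anubis. */
final class AddressBoard {
    private final ConcurrentMap<String, CompletableFuture<String>> entries =
            new ConcurrentHashMap<>();

    private CompletableFuture<String> entry(String role) {
        return entries.computeIfAbsent(role, r -> new CompletableFuture<>());
    }

    /** Called by a master once it is live. The first published value wins. */
    void publish(String role, String address) {
        entry(role).complete(address);
    }

    /**
     * Called by a worker before it starts. Blocks until the role's address
     * has been published, or fails with IllegalStateException on timeout.
     */
    String await(String role, long timeout, TimeUnit unit) {
        try {
            return entry(role).get(timeout, unit);
        } catch (TimeoutException e) {
            throw new IllegalStateException("No address published for " + role, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted waiting for " + role, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        AddressBoard board = new AddressBoard();
        // A "master" publishes from another thread after a short delay
        new Thread(() -> board.publish("namenode", "master1.example.org:8020")).start();
        // The "worker" blocks here until the address appears
        System.out.println(board.await("namenode", 5, TimeUnit.SECONDS));
    }
}
```

The timeout matters: a worker that waits forever for a master that never came up is harder to diagnose than one that fails with a clear message.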

Wed 10 Jun 2009
Hardware for the yellow elephant

Also on the topic of Hadoop: this nice new rack-mountable toy from HP, the HP ProLiant SL170z G6.

That's not really a name that slips off the tongue, but here are some features.

  1. 1-2 Xeon (Nehalem? I've lost track of the numbering) CPUs, offering 4 or 8 cores
  2. Up to 6 SATA disks
  3. Up to 128 GB RAM; works out at 16 GB/core with 8 cores
  4. Dual Gigabit network ports
  5. Two server boards per 2U chassis

If you are planning on building your own little Hadoop cluster, this is some kit to look at. No need for RAID or hot-swap, no need for enterprisey features, just the right density of CPU/RAM and HDD -including the right power density and weight.

Not had a chance to play with any yet, though I'd clearly like to. On that topic, we need better benchmarks than just Terasort: stuff more CPU- and RAM-heavy, stuff with more cross references to other snippets of fact, stuff more graph-like.

Obviously I don't have the pricings -go talk to your HP account representative or reseller.

Wed 10 Jun 2009
the Yahoo! Hadoop distro

Nigel has announced the Yahoo! Distribution of Hadoop.

This is not a set of RPMs or similar; it is the source of the version of Hadoop that Y! use in production -essentially the trunk with various patches. You can build up the source yourself if you know which patches are in use, but here everything is, ready to go.

One caveat: don't think this means your cluster will work. Your network does not look like Y!'s, your configurations will be different, and things will behave differently.

Case in point: today's discussion on the user list about Hadoop on multiple-NIC systems. Someone is trying to bring up a cluster where one network card generates traffic you are billed for and the other is free. Guess which one it always seems to bind to?
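On a multi-NIC box it helps to see what the JVM itself sees before blaming Hadoop. The following is only a diagnostic sketch using the standard java.net.NetworkInterface class; NicLister is a made-up name, and nothing here changes which interface a Hadoop service picks.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic: lists every interface and its addresses. */
final class NicLister {
    private NicLister() {
    }

    /** One "name -> address" line per address on each interface. */
    static List<String> describeInterfaces() {
        List<String> lines = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return lines; // no interfaces reported at all
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    lines.add(nic.getName() + " -> " + addr.getHostAddress());
                }
            }
        } catch (SocketException e) {
            lines.add("could not enumerate interfaces: " + e.getMessage());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeInterfaces().forEach(System.out::println);
    }
}
```

The output depends entirely on the machine; the point is to compare it with the address the service actually advertises.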

Tue 9 Jun 2009
New Town bike

I have a new bicycle. It is for round-town use. No suspension; everything is either a hand-me-down or something unused at home, other than a brake cable, a gear cable, and a front wheel I got on sale because it was flood-damaged and hence already looked scraggly.

New bike

Having built it up and taped black insulating tape over every bit of component branding, to disguise the XT and Hope parts on it, I'm left with the problem that it still looks too smart. I am contemplating making it look really cheesy by cable-tying pipe insulation round bits of the frame, or even bits of old inner tube. I want a bike that is still there when I get back from the shops, not something that has been stripped down for its parts.

New bike

While I was trying to do all of this, a small child I have never seen before kept interfering and demanding a water fight. Eventually I had to be ruthless and lock him out of the house, which upset him a bit.

Thu 21 May 2009
SmartFrog 3.17.010

There is a new SF release, version 3.17.010.

The "odd" number, .17, means an unstable release. We aren't doing stable releases until the OSGi version is the main branch. This release is Java6+ and includes a version of Hadoop from 6 weeks ago with my lifecycle stuff patched in.
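The odd-number convention is easy to check mechanically. A minimal sketch, assuming the second dotted field is the one that matters and that even values would mark stable lines (the post only states the odd case); ReleaseVersion is a hypothetical name, and the even-numbered version used below is invented purely as a counter-example.

```java
/** Hypothetical helper for the odd/even release convention. */
final class ReleaseVersion {
    private ReleaseVersion() {
    }

    /** True if the second field of a version such as "3.17.010" is odd. */
    static boolean isUnstable(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.build]: " + version);
        }
        int minor = Integer.parseInt(parts[1]);
        return minor % 2 != 0;
    }

    public static void main(String[] args) {
        System.out.println(isUnstable("3.17.010")); // odd minor field
        System.out.println(isUnstable("3.16.000")); // invented even example
    }
}
```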

This means that you can use it to bring up Hadoop clusters on real machines or VMs, but that everyone but me will be scared of the bug reports. I'm running the tests against trunk with the lifecycle patches in; if that goes ahead there will be an updated release out in a week or two.

Note also we've still got the LGPL flag set. Moving to an Apache license is primarily the task of going through all the source and docs and patching it. That will take time, though IDEA has a copyright plugin that I may be able to use.

SmartFrog 3.17.010

This is a new release of SmartFrog, the Java-based, LGPL-licensed distributed deployment framework developed by HP Laboratories. SmartFrog enables applications to be deployed across multiple machines, configuring different aspects of the system so that they are all consistently configured, and managing the life-cycle of the application as a whole. The project's home page is http://smartfrog.org/

The release artifacts are available at http://sourceforge.net/project/showfiles.php?group_id=87384&package_id=176308

This release is 3.17.010; built from revision 7598 of the SVN repository.

It was compiled with the Java version set to Java 6 -it will not work on Java 1.5 JVMs.

This release includes the following items:

  • Core smartfrog daemon, including services to manage files, start and stop Java and native programs.
  • Example components and applications.
  • Ant support: ant tasks to deploy and terminate applications from a build.
  • Ant components: the ability to execute ant tasks in a deployment.
  • Anubis: a partition aware tuple-space that can be used to implement fault tolerant systems.
  • Database: components to issue database commands, and deploy HSQLDB and MySQL databases.
  • Hadoop: a version of Apache Hadoop with SmartFrog management, and support components to interact with Hadoop: filesystem operations and Job submission. This is still experimental and must be considered unstable.
  • JMX: the ability to configure and manage JMX components, and to manage SmartFrog components over JMX.
  • Logging: integration with Apache commons-logging and Log4J, and the SLF4J framework.
  • Networking: email, FTP, SSH, DNS support.
  • Quartz: scheduled operations using Quartz libraries.
  • RpmTools: components for working with or deploying with RPMs
  • Scripting: support for BSF-hosted scripting languages
  • Testing: Distributed JUnit and component testing with SFUnit.
  • Velocity: generation/transformation of text files during deployment
  • WWW: deployment of WAR and EAR files to application servers. Deploy-by-copy is provided for all application servers that support it, and a Tomcat-specific component can communicate with Apache Tomcat.
  • Jetty: The Jetty component can configure and deploy individual servlets, eliminating much of the need for WAR files themselves.
  • XML: XML support with XOM.
  • XMPP: Presence and messaging over Jabber.

Packaging

This release is available as:

  1. RPM files inside a .tar.gz file.
  2. A JAR installer.
  3. A .tar.gz file containing everything needed to create a private set of RPM files containing JAR files signed by a private Certification Authority.
  4. The original smartfrog distribution as .zip and .tar.gz (deprecated)

The RPM installation is for RPM-based Linux systems. It comprises the following RPMs:

smartfrog             The core SmartFrog distribution.
smartfrog-daemon      The shell scripts to add the smartfrog distribution to the path, and to run the daemon on start-up.
smartfrog-javadoc     Javadocs for the project
smartfrog-ant         Ant task and build file execution
smartfrog-anubis      Distributed partition-aware tuple space
smartfrog-csvfiles    CSV file support
smartfrog-database    Database access
smartfrog-jetty       Jetty support
smartfrog-jmx         JMX integration through MX4J
smartfrog-junit       JUnit 3.8.2 test execution
smartfrog-hadoop      Hadoop support
smartfrog-logging     Logging through Log4J and commons-logging
smartfrog-networking  SSH, SCP, FTP and email
smartfrog-quartz      Scheduled operations
smartfrog-rpmtools    RPM support tools
smartfrog-scripting   Scripted components
smartfrog-velocity    Velocity template processing
smartfrog-www         Web support: Deployment and liveness pages
smartfrog-xml         XML Support
smartfrog-xmpp        XMPP/Jabber communications
smartfrog-xunit       Distributed testing and reporting

All the JAR files are also published to a repository that is compatible with Apache Maven and Ivy. Add http://smartfrog.sourceforge.net/repository to your repository list to pull SmartFrog artifacts into your Ivy- or Maven- based build.

There are also SmartFrog components to retrieve artifacts from such a repository (the Library components under /org/smartfrog/services/os/java/library.sf ), which can be used for dynamic download of SmartFrog and other artifacts.

Security warning

Unless SmartFrog is configured with security, a running daemon will listen on its configured port for incoming deployment requests, and deploy the applications with the rights of the user running the daemon. When the smartfrog-daemon RPM is installed, that means that a process running as root will be listening on an open port for incoming deployment requests. Do not deploy SmartFrog this way on any untrusted network, not without turning security on.

There is a special distribution .tar.gz file that can be used to generate a private set of secure RPM files.

Building SmartFrog

SmartFrog requires Java 1.6+ and Ant 1.7+ to build.

The distribution does not include a source tree adequate to build the entire system. Please follow the instructions at http://sourceforge.net/svn/?group_id=87384 and check out smartfrog/trunk/core from our repository.

This release was built with revision 7598 of the repository, which is available under the SVN branch https://smartfrog.svn.sourceforge.net/svnroot/smartfrog/tags/release3.17.010

We strongly encourage anyone interested in building or extending smartfrog to get involved in the smartfrog developer mailing list, which can be found from the sourceforge project page http://sourceforge.net/projects/smartfrog/

Reporting Bugs

Please file all bug reports at http://jira.smartfrog.org/

The SmartFrog Team

Changes since last release

Bug

  • [SFOS-495] - for dynamic classloading, we need a way to set the security policy in the ant tasks, and a default one if none is provided
  • [SFOS-994] - Hadoop tests failing with namenode locked. Assumption: we aren't terminating namenodes properly
  • [SFOS-1013] - Test JobTracker isn't getting its file system URL right
  • [SFOS-1031] - TaskTracker not shutting down
  • [SFOS-1043] - CI tools are picking up the wrong version of the main distribution .zip file
  • [SFOS-1051] - Ant <start> tests failing on Hudson
  • [SFOS-1053] - the deploy target in common.xml should use the tests.run.classpath for its classpath
  • [SFOS-1056] - Assert component doesnt report resolution failures that well
  • [SFOS-1057] - Assert component reference resolution logic is wrong
  • [SFOS-1060] - ManagedConfiguration effectively discards all values it is set() with
  • [SFOS-1075] - Management console won't exit
  • [SFOS-1076] - restlet JARs are in smartfrog lib under SVN
  • [SFOS-1077] - Now that exits are being trapped, wherever in the code we call it intentionally, that operation now fails
  • [SFOS-1086] - Possible deadlock in DelayedTerminator
  • [SFOS-1098] - Add better classload failure details in SFClassLoader.forName()
  • [SFOS-1101] - wrong termination semantics in the HadoopServices
  • [SFOS-1106] - Hadoop JspHelper class does not load on Hadoop-under-smartfrog
  • [SFOS-1107] - Intermittent Test Failure in HadoopConfigurationTest. Race condition?
  • [SFOS-1112] - hadoop components/extras tests should be disabled when running on java5; even if the components build, the tests will fail.
  • [SFOS-1123] - move filesystem and job tests out of hadoop/test into src/examples so the same tests can be run on live clusters
  • [SFOS-1124] - Add component to do directory copy between any two supported filesystems
  • [SFOS-1137] - Hadoop can exit the JVM with a fatal error in the namenode
  • [SFOS-1145] - Unowned RPMs found in $SFHOME/lib after upgrades -from older packages
  • [SFOS-1150] - Have AbstractTargetedCondition attempt to resolve the target on demand
  • [SFOS-1154] - security exception in management console
  • [SFOS-1164] - Ant daemon tasks do not use the same security manager as the shell scripts
  • [SFOS-1171] - RMI security problems on Java6 with Hadoop
  • [SFOS-1178] - Ivy/Maven repository on sourceforge.net is not indexed any more
  • [SFOS-1181] - ant print-proxy-settings only works if the proxy settings are correct
  • [SFOS-1188] - 2 javadoc warnings in core/smartfrog

Improvement

  • [SFOS-14] - Improve scripting component
  • [SFOS-848] - make LoadClass a Condition, so it can be used to enable/disable tests
  • [SFOS-855] - Hadoop client-side components should be able to pick up properties from a cluster CD
  • [SFOS-1001] - have all hadoop services extract the port numbers from the configuration, and fail if they are in use before deployment, and not released at the end of the lifecycle
  • [SFOS-1025] - Useful additional constraint functionality:
  • [SFOS-1054] - improve testcase selection in common.xml
  • [SFOS-1059] - assert equality tests should .toString() their parameters for testing
  • [SFOS-1065] - Tweak to notification mechanism in SF Orchestration
  • [SFOS-1069] - Modify orchestration selection of pertinent members from arrays for dependencies
  • [SFOS-1070] - Orchestration to support new dependency types: on model state and "local" deps
  • [SFOS-1074] - add hadoop bundle to the installer JAR
  • [SFOS-1089] - Better diagnostics of why a Hadoop filesystem won't load
  • [SFOS-1090] - Move Hadoop service lifecycle from "terminated" to close; update state diagrams
  • [SFOS-1092] - hard code datanode ports into the datanode test templates, add checks that these ports are closed after the runs
  • [SFOS-1093] - Hadoop nodes should push out live http and other port bindings to the components
  • [SFOS-1096] - add test for dfshealth on the filesystem
  • [SFOS-1097] - LoadClass to provide better diagnostics when a class is not found
  • [SFOS-1102] - give components/machines the ability to override the default timeout (10 minutes) for <junit> runs
  • [SFOS-1105] - add way to set JVM args (like -verbose:class) onto the daemon that is created for functional testing
  • [SFOS-1111] - make LoadClass better at diagnostics by adding a resources[] list, and producing a list of all missing resources
  • [SFOS-1113] - Move the port checking code of HadoopTestBase into a new test base class in testharness, so that other components can check that ports are closed after a test run
  • [SFOS-1117] - Move DfsUtils use of DistributedFileSystem (i.e. HDFS only), to the FileSystem interface
  • [SFOS-1119] - add more checks to DfsPathExists
  • [SFOS-1120] - modify common.xml to run sfDaemon in the system tests from the build/test dir
  • [SFOS-1121] - subclass Condition interface with one that provides a reason for a condition failing
  • [SFOS-1134] - Datanodes and job tracker nodes to offer the ability to set their hostname dynamically, based on the machine doing the deployment
  • [SFOS-1135] - move hadoop-cluster components to using dynamically determined hostnames
  • [SFOS-1138] - Provide a better error message when a remote process isnt the right type
  • [SFOS-1141] - Remove warnings about headless mode from all the daemon logs
  • [SFOS-1168] - Switch to the exit trapping security manager
  • [SFOS-1187] - Add an environment variable option to name the security manager for the scripts

New Feature

  • [SFOS-875] - Write CheckPort component to extract hostname:port information from a hadoop component, and try connecting to that port
  • [SFOS-967] - Implement workflow component to decomission a datanode, with tests
  • [SFOS-1019] - Add smartfrog-hadoop RPM that contains all the artifacts needed to bring up Hadoop under SmartFrog
  • [SFOS-1035] - Add commons-pool and commons-dbcp JARs to sf-database component and RPM
  • [SFOS-1049] - have common.xml read in a properties file from ~/.ant, outside the source tree, so that options as to which server to deploy to get retained when the directory tree gets deleted.
  • [SFOS-1064] - add component to validate configuration values.
  • [SFOS-1079] - entry points could report network connectivity problems better
  • [SFOS-1126] - Add a configuration checker which checks the configuration of a cluster -or other component that provides a configuration- against a list of expected equality values.
  • [SFOS-1131] - Ant task to determine the local hostname/address
  • [SFOS-1133] - Add functions and components to work out the local hostname
  • [SFOS-1139] - Provide a component to check that a security manager is installed, and that exits are blocked
  • [SFOS-1147] - create new TextListFile component (extending TextFile), that creates the text for a file from an inline list; one line per list entry
  • [SFOS-1167] - ant component to list nested resource collections and print them to a property or to a file
  • [SFOS-1195] - add a ReferenceResolves condition that can be use to probe or poll for a reference

Task

  • [SFOS-210] - Add java security debug properties to User manual
  • [SFOS-1073] - Move up to ivy 2.0.0 release
  • [SFOS-1085] - extend the job submission test, list input and output directories and have it do real work
  • [SFOS-1103] - Move legacy dependencies from core to extras

Sub-task

  • [SFOS-788] - write component to submit jobs to a Hadoop cluster
  • [SFOS-859] - write components to perform filesystem create/delete/move operations, and copy data to/from the filesystem
  • [SFOS-1055] - write component to load in settings from hadoop's XML files
  • [SFOS-1151] - Update service to support TRACE level that is now supported in Log4J

Tue 19 May 2009
Tip: Update the laptop and desktop on different cycles

I finally managed to get VMWare to stabilise on the Ubuntu 9.04 laptop -upgrading the (closed source) ATI display drivers fixed things. So I managed to get a release of SmartFrog out on Friday/Saturday (more details in another post).

Now I am back at work, catching up on 6 weeks' worth of Hadoop diffs, reading the svn book and understanding the full details of SVN merge. It's clear that if you accidentally commit changes to the trunk that are then rolled back, svn merge removes them from the branch too -I am thus very grateful to the --record-only option to skip over the commit/rollback, but now have lots of comparison and tests to do. I also need to pull the Backup Namenode that went in into the lifecycle. One goal of having a lifecycle for Hadoop services is that you can subclass them safely; BackupNode does this in the way every software engineering guideline says "don't" -by adding a virtual initialize() method to the base class, called from the base class's constructor, which the child class overrides. Hence its initialize() method is called partway through the constructor process -this is something most Java best-practice guidelines strongly discourage. Still, it should be easy to bring under control, and I can add more tests for it in the process.
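To show why calling an overridable method from a constructor is discouraged, here is a stripped-down illustration. BaseService and BackupService are hypothetical stand-ins, not the Hadoop classes; the only thing they share with the real code is the shape of the problem.

```java
/** Hypothetical stand-in for a service base class; not the Hadoop source. */
class BaseService {
    /** Records what initialize() saw while the constructor was still running. */
    protected String observedName;

    BaseService() {
        // Anti-pattern: an overridable method is called before any
        // subclass constructor body has run.
        initialize();
    }

    protected void initialize() {
        observedName = "base";
    }
}

/** Hypothetical subclass that overrides initialize(). */
class BackupService extends BaseService {
    private final String name;

    BackupService(String name) {
        super();          // initialize() runs in here...
        this.name = name; // ...before this assignment happens
    }

    @Override
    protected void initialize() {
        // name still holds its default value (null) at this point
        observedName = String.valueOf(name);
    }

    String name() {
        return name;
    }

    public static void main(String[] args) {
        BackupService svc = new BackupService("backup");
        System.out.println("seen during init: " + svc.observedName);
        System.out.println("after construction: " + svc.name());
    }
}
```

The override runs while the subclass's final field is still null, so it records "null" even though name() returns "backup" once construction has finished. An explicit start step, called after the constructor returns, avoids this.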

But to do that, I need a working machine, and here the fun begins.

My Exchange mailbox was silent yesterday, which was odd, until I worked out that it meant the network on the VM was down. Reboot the VMWare XP image -and no VM. Down to safe mode. That works. Unsafe mode -bugcheck in a driver. Back to safe mode, pull off any drivers I put in while trying to debug the VPN-under-VMWare issues on the laptop. Of course, you can't uninstall things in Safe Mode, as the uninstaller refuses to work, but you can at least disable drivers from the device manager. This is enough to enable the VM to boot. At which point VMWare crashes with an "unexpected signal 6" message and some instructions to save a log, plus a note that your issue may be looked at depending on your support contract, except I can't follow the instructions as X has lost keyboard and mouse control.

This is a hint: reboot time

Except the desktop doesn't come back up. Weird. I have not upgraded its Ubuntu version, other than the usual fortnightly apt-get update, and yet it doesn't want to play. Hardware problems? Something on the motherboard that is causing XP/VMWare to crash as a symptom? Possibly. But before I start pushing for a new i7 box, let's see what I can do. Turn the box off, leave it unplugged. This force-resets all PCI cards and the RAM more reliably than a reboot, and yes, this time the HDD spins up and the host OS is live. But I'm still worried about VMWare, so over to VMWare.com to get VMWare Workstation 6.5.2 on this box.

Except today, the database behind VMWare.com is toast

Internal Error

The VMware download portal encountered an error while processing your request. 
We're sorry for the inconvenience. If you choose to report this error, please
provide the following error information:
Error code: QR04
Error date: Tue May 19 03:11:25 PDT 2009
Err 4 user: steve_loughran
Exception: JDBC exception on Hibernate data access; nested exception is
org.hibernate.exception.GenericJDBCException: Cannot open connection 

This is amusing. Hibernate error messages wrapping JDBC connection failures. One hopes VMWare have support contracts with JBoss and their database supplier. Perhaps also they could consider some HA hosting options for the database.

What it means for me is that I'm going to hold off bringing up any VMs on the desktop until VMWare.com is functional. That's a job for the laptop. I am therefore glad that VMWare is now working happily there.

This brings me back to the main point of the post. Having a desktop and a laptop not only means that I can work on the move, and in meetings, but I can view the failure of the desktop to work as an inconvenience, rather than a crisis. That is provided I upgrade the boxes on separate release cycles, and synchronise everything via SCM, Groove and such-like. Which I normally do, except when I'm trying to merge 6 weeks' worth of Hadoop changes into a branch...

Thu 14 May 2009
Ubuntu 9.04: you will get IPv6; you will be grateful

I don't like IPv6; its aim in life is to cause problems. I really don't like it when Hadoop hits IPv6 and then the jsp pages come up on a machine called 0:0:0:0:0:0:0:1. Getting rid of IPv6 is part of setting up Hadoop VMs, and it is the source of bugs like SFOS-1182 and SFOS-1163, leading to ideas to catch this early on, such as SFOS-1194 or SFOS-1183.
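One way to catch this early is to refuse any advertised hostname that is really an IPv6 literal. The sketch below is my own illustration and not what SFOS-1194 or SFOS-1183 actually propose; IPv6Check is a hypothetical name.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical check; not taken from the SmartFrog issues cited above. */
final class IPv6Check {
    private IPv6Check() {
    }

    /** True if host is an IPv6 literal such as "0:0:0:0:0:0:0:1" or "::1". */
    static boolean isIPv6Literal(String host) {
        // DNS hostnames and IPv4 literals never contain a colon, so anything
        // without one is rejected up front and never reaches the resolver.
        if (host.indexOf(':') < 0) {
            return false;
        }
        try {
            // A valid literal address is parsed directly rather than looked up
            return InetAddress.getByName(host) instanceof Inet6Address;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String advertised = "0:0:0:0:0:0:0:1";
        if (isIPv6Literal(advertised)) {
            System.out.println("Refusing IPv6 address: " + advertised);
        }
    }
}
```

A deployment could run a check like this against every address a node reports and fail fast, rather than discovering the problem when the web UI links stop working.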

Datacentres don't use IPv6 if they can help it

It's a shame, then, that the only way to get rid of IPv6 in Ubuntu 9.04 is to recompile the kernel.

I've done all the old modprobe tricks, and it is still there:

# ip a | grep inet6
    inet6 ::1/128 scope host 
    inet6 fe80::216:eaff:fec1:72f2/64 scope link 
    inet6 fe80::250:56ff:fec0:1/64 scope link 
    inet6 fe80::250:56ff:fec0:8/64 scope link 

Not good, not good at all. And I thought it was only RHEL/CentOS that forced IPv6 on me.

Thu 14 May 2009
Ubuntu Linux 9.04

I've also updated some of my Linux machines to Ubuntu 9.04. The best feature so far is that OpenOffice.org now seems to autosave its dictionary, so if it crashes, new words you added still get remembered. Before, you had to close OOo every time you added a new word to the dictionary, unless you believed the app would actually stay up long enough to shut down cleanly. Someone has been reading about crash-only software, perhaps.

Crashes: all VMWare-related so far, so it is hard to blame the OS. Trying to bring up Windows Server 2003 and Linux VMs takes down the system; nothing in the log. Ouch. I am looking at VirtualBox everywhere as a desktop alternative to VMWare, while downloading VMWare 6.5.2. Incidentally, Firefox restarts downloads when it recovers from a crash. Nice.

Other than that, no obvious changes. I'm still hoping to get the built in WWAN module to work, but not in this release.