CS462 Planet

August 21, 2008

Sam's Amazon Web Services Notes

Missing Pieces of Cloud Computing

Between Google's AppEngine and Amazon's Web Services platform, you can build almost everything you need to run a scalable system. The 'almost' is what is currently preventing these platforms from supporting truly beautiful cloud systems. I believe that the two most important missing pieces of Cloud Computing are scheduled processes and load balancing.

Scheduled Processes in a cloud computing application are necessary for tasks both simple and complex. On the simple side, a task can be used to send reminder emails to system users, or generate periodic reports based on system data. On the complex side, scheduled processes might communicate with a processing grid via asynchronous queues, or perform orchestration functions to manage a scalable computing cluster. The best provider of this service is likely to be Google's AppEngine.

During an AppEngine session at Google IO, the presenter did mention that offline processing was a planned (and often requested) feature, but they did not give details. Their offline processing may only be the ability to launch threads and return rapidly to the user, but I hope it also contains some mechanism for the timing of script execution.

In the meantime, it is possible to invoke processing scripts from an automated system, either running on hardware you own or a service such as WebCron, but it will be nice to have a more elegant solution.

Load Balancing allows a group of computers to be joined into a cluster that can service more requests than a single machine can handle. While AppEngine does not require load balancing to scale, clusters within EC2 need this functionality if they service public requests. Load balancers exist in both hardware and software form, and generally provide several methods of load distribution, system failure handling, and sometimes SSL termination for secure connections. Currently, software load balancers must be used to load balance within the EC2 environment, and at least 2 of these machines are required to gracefully handle machine failures.

Providing this service in a scalable way would greatly simplify the design of computing clusters within the EC2 environment. It would also make managing public IP addresses a little easier, as only the load balancing service needs a public address. Amazon is currently testing a form of persistent storage within their EC2 environment, and I imagine that a load balancing solution could be offered in a similar form. Amazon may decide not to offer this service, preferring 3rd party solutions.

In summary, I look forward to the future of cloud infrastructure. I'm excited to be able to develop for these platforms, and I expect great things in the future from both of these companies and others as well.

August 21, 2008 04:14 PM

June 18, 2008

Sam's Amazon Web Services Notes

Needed: Mini EC2 instance sizes

Amazon Web Services Elastic Compute Cloud (EC2) currently has three instance sizes: Small, Large, and Extra Large, priced at 10 cents, 40 cents, and 80 cents per hour. Allowing different instance sizes is pretty powerful, as it allows some scale-up in addition to scale-out. Scale-up ends up being pretty useful when running things like database servers.

What's missing is a mini instance. To run a scalable application on Amazon's cloud, an orchestration server is required to keep things running the way they ought to. The orchestration server is the system that keeps an eye on SQS queue lengths, launches or terminates EC2 instances, and provides uptime and availability checking. The processing requirements are minimal.
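
The scaling decision such an orchestration server makes can be sketched in a few lines. This is only an illustration: the function names, the `messages_per_instance` target, and the minimum and maximum bounds are assumptions made for the sketch, and the actual calls to read an SQS queue length or to launch and terminate EC2 instances are left out.

```python
import math


def desired_instance_count(queue_length, messages_per_instance=100,
                           min_instances=1, max_instances=20):
    """Return how many worker instances to run for a given backlog.

    All thresholds are hypothetical tuning knobs for this sketch.
    """
    if queue_length < 0:
        raise ValueError("queue_length must be non-negative")
    # One instance per `messages_per_instance` queued messages, rounded up.
    wanted = math.ceil(queue_length / messages_per_instance)
    # Clamp to the allowed range so the cluster never vanishes or runs away.
    return max(min_instances, min(max_instances, wanted))


def scaling_action(current_instances, queue_length, **limits):
    """Positive result: launch that many instances; negative: terminate that many."""
    return desired_instance_count(queue_length, **limits) - current_instances
```

A monitoring loop would call `scaling_action` every so often with the current queue length, which is why the processing requirements stay so small.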

Running an EC2 instance 24x7 costs ~ $72 per month. If you want 2 servers (in different zones, of course) for redundancy, it's going to cost $144 per month just to monitor your system. Because of this situation, most people run their orchestration server on their own hardware.

If they allowed a mini instance with one quarter of a Small instance's resources and charged 2.5 cents per hour, it would cut the cost down to $36 per month for redundant orchestration services.
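
The arithmetic behind these figures can be checked with a short sketch. It assumes a 30-day month of round-the-clock use, as the numbers above do; the mini instance and its 2.5 cent rate are the hypothetical ones proposed here, not an actual EC2 offering.

```python
from decimal import Decimal

HOURS_PER_MONTH = 24 * 30  # a 24x7 month of 30 days


def monthly_cost(hourly_rate, instances=1):
    """Monthly cost in dollars for instances running around the clock."""
    return Decimal(hourly_rate) * HOURS_PER_MONTH * instances


small_pair = monthly_cost("0.10", instances=2)   # two Small instances
mini_pair = monthly_cost("0.025", instances=2)   # two hypothetical mini instances
print("Small pair:", small_pair, "Mini pair:", mini_pair)
```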

With their recent improvements to EC2, mini instance sizes would provide greater flexibility and control of systems built on the cloud.

June 18, 2008 05:14 PM

Amazon EC2 Persistent Storage

Amazon announced what appears to be a private beta of a new extension of their EC2 and S3 platform: Persistent Storage. You can create a drive in any size between 1 GB and 1 TB, and then mount it as a drive onto any single EC2 instance. Although the virtual machines already have storage available on the instance itself, this allows virtual machines to access quite a large disk space. Restricting the volume to a single EC2 instance eliminates the scale/concurrency problem, so no eventual consistency requirements are placed on this volume.

In a very Amazon 'building blocks' way, they create the volume as an unformatted drive, allowing you to use any filesystem you would like. Further, they have built the system to allow fast access between the volume and EC2 instances. This portion of their puzzle is clearly built on a high-performance SAN well connected into the EC2 network.

This type of storage solves two problems with EC2: needing more than the provided drive space, and losing data when instances vanish. They mention that this type of storage is perfect for running a database server.

I like the idea, but I need a few things cleared up first. A database server needs to be redundant. How do we configure the system to move the mounted drive to the hot spare instance in the event of a failure? Detecting a failed instance, starting up a new one and reattaching it to the existing volume seems to make sense, but will require some downtime.

It will be interesting to see how this plays out.

June 18, 2008 05:14 PM

Amazon Web Services SQS Price Changes

Amazon Web Services announced a price change for SQS today. They changed prices, but also how the prices were charged. Some are claiming huge savings. I did a little math to see who it would affect and how much.

The price changed from $0.10 per 1,000 messages to $0.01 per 10,000 requests. Charging by request means that pushing a message and asking for messages counts as two requests. You can make 100 requests (new pricing) for the cost of 1 message (old pricing). This means that if you poll 99 times for every message you pass through the queue, you will pay exactly the same amount.

If you poll once every 5 minutes, 3 messages per day will have you break about even. Polling once a minute, it takes 15 messages per day to break even. Systems of this size will not see much change. The monthly fee with 15 messages/1 minute polling is only 5 cents (not counting bandwidth) anyway.

Larger systems will see a much bigger difference. If you have 10 servers polling a queue once per second, you will need 8640 messages per day to break even. But seriously: Your monthly fee for a system like this? $26.44 (not counting bandwidth).
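
The break-even reasoning can be written out as a small calculation. This is a sketch under the simplification used above: every poll is one request and every message adds one more request. The function names are made up for the example.

```python
OLD_PRICE_PER_MESSAGE = 0.10 / 1_000     # dollars per message, old pricing
NEW_PRICE_PER_REQUEST = 0.01 / 10_000    # dollars per request, new pricing


def break_even_messages_per_day(polls_per_day):
    """Messages per day at which old and new pricing cost the same.

    Old cost: m * old_price.  New cost: (m + polls) * new_price.
    Solving for m gives polls * new_price / (old_price - new_price).
    """
    return (polls_per_day * NEW_PRICE_PER_REQUEST
            / (OLD_PRICE_PER_MESSAGE - NEW_PRICE_PER_REQUEST))


def monthly_cost_new(polls_per_day, messages_per_day, days=30):
    """Monthly request charges under the new pricing, ignoring bandwidth."""
    return (polls_per_day + messages_per_day) * days * NEW_PRICE_PER_REQUEST


for label, polls in [("one poll every 5 minutes", 24 * 12),
                     ("one poll every minute", 24 * 60),
                     ("10 servers polling every second", 10 * 24 * 3600)]:
    print(label, round(break_even_messages_per_day(polls), 1))
```

Under those assumptions the break-even points come out near the figures quoted above; small differences come from rounding and from what is counted as a request or a month.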

In either case, clever polling schemes (fallback, based on time of day, etc) could save some money. If the other end of your queue is a scalable EC2 layer, and you only run servers when there is stuff in the queue, then your savings are likely to be pretty good.
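
One reading of a "fallback" polling scheme is a simple back-off: poll quickly while messages are arriving and progressively less often while the queue stays empty. The sketch below only computes the wait time; the parameter names and default values are assumptions, and the actual SQS calls are omitted.

```python
def next_poll_interval(current, got_message, minimum=1.0, maximum=300.0, factor=2.0):
    """Return the number of seconds to wait before the next poll.

    A successful poll resets the wait to `minimum`; an empty poll
    multiplies the wait by `factor`, capped at `maximum`.
    """
    if got_message:
        return minimum
    return min(maximum, max(minimum, current * factor))


# Ten empty polls in a row stretch the wait from 1 second up to the cap.
interval = 1.0
for _ in range(10):
    interval = next_poll_interval(interval, got_message=False)
```

Every doubling of the interval halves the number of billable requests made against an idle queue.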

So in any case, it's all still pretty cheap.

June 18, 2008 05:14 PM

Teaching EC2 Instances to self-update DNS

Setting up a website on Amazon's EC2 requires modifications to DNS Entries each time the machine instance starts. On each boot, the instance is assigned a new public (and private) IP address. In order to use a web server or a load balancer, you need to re-point your DNS records at the new machine. I use Nettica as a DNS provider for reasons of both cost and flexibility. Nettica has an API that can be used to update DNS entries. I googled around and found a library for python that did the job, but it seemed like a bit of overkill. After a bit of work, I came up with the following Python script:

    import urllib

    # get this machine's public ip
    publicip = urllib.urlopen('http://169.254.169.254/2007-03-01/meta-data/public-ipv4').read()

    # register with nettica
    args = ('foo', 'bar', 'subdomain.domain.com', publicip)
    updateresponse = urllib.urlopen('https://www.nettica.com/Domain/Update.aspx?U=%s&PC=%s&FQDN=%s&N=%s' % args).read()

After importing the urllib library, I use a GET request to find my current public IP using a service that EC2 provides. In this format, it returns the IP address as a string, making it easy to insert it into my second GET request, which calls a simple web service provided by Nettica.

There are a few important things to note about this setup. In order to use the simplified web service endpoint, I can only update an A record, not a CNAME. This requires me to use the public IP assigned by EC2, not the public DNS name that they provide and recommend. It is also important to set the TTL very short, to restrict caching within the DNS system. Nettica has a 'dynamic' setting just for this purpose.

Nettica does provide a very rich SOAP interface that allows for much more than updating an A record IP address. The library that I mention above uses this rich API to allow listing, adding, updating, and deleting DNS records. I chose my solution for simplicity, not for completeness. To have the instance self-register on startup, we must invoke this script on boot, and I'll be blogging on that soon.
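
The script above relies on `urllib.urlopen`, which exists only in Python 2. A rough Python 3 equivalent might look like the sketch below. The metadata URL and the Nettica query parameters (U, PC, FQDN, N) are taken from the script above; the function names and the timeout are assumptions, and URL-encoding the values is an addition that the original does not make.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

METADATA_URL = "http://169.254.169.254/2007-03-01/meta-data/public-ipv4"
NETTICA_UPDATE_URL = "https://www.nettica.com/Domain/Update.aspx"


def build_update_url(user, password, fqdn, ip):
    """Build the update URL, escaping each value for use in a query string."""
    query = urlencode({"U": user, "PC": password, "FQDN": fqdn, "N": ip})
    return NETTICA_UPDATE_URL + "?" + query


def update_dns(user, password, fqdn, timeout=10):
    """Look up this instance's public IP and register it (needs network access)."""
    ip = urlopen(METADATA_URL, timeout=timeout).read().decode("ascii").strip()
    return urlopen(build_update_url(user, password, fqdn, ip), timeout=timeout).read()
```

Only `update_dns` touches the network, so the URL construction can be checked on any machine, not just an EC2 instance.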

June 18, 2008 05:14 PM

Building a Scalable, Distributed system with Undergraduate Students

This semester is my second as a TA for CS462 - Large Scale Distributed System Design. My graduate advisor, Phil Windley, asked me to be a TA again a month ago or so, and we started talking about things we'd like to do differently. We came up with a pretty cool plan. [More]

June 18, 2008 05:14 PM

Getting Started with Cheetah

Cheetah is a template engine for Python. Getting started with it is pretty easy, and probably best described with an example. First, the Python script:

    from Cheetah.Template import Template

    def index(name=None):
        t = Template(file='/var/www/html/main.tpl')
        t.title = "CS462 Class Lab 1"
        t.student = "Sam Curren - TA"
        if name:
            t.message = "Hello %s" % name
        else:
            t.message = "Hello World"
        return t

Next, enjoy main.tpl:

    $title

    CS462 - Web Server

    This is the page of: $student
    WS Technology: apache/python

    Message: $message

I only did some basic replacement, and Cheetah can do much more. Have a look at the main documentation.
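
For readers without Cheetah installed, the same kind of `$name` placeholder replacement can be tried with Python's standard-library `string.Template`. This is a stand-in to show the idea, not Cheetah itself, and the inline template text is a simplified assumption rather than the main.tpl file above.

```python
from string import Template

# Simplified inline template; Cheetah would load this from a .tpl file.
PAGE = Template(
    "$title\n"
    "This is the page of: $student\n"
    "Message: $message\n"
)


def index(name=None):
    """Fill the placeholders, greeting `name` when one is given."""
    message = "Hello %s" % name if name else "Hello World"
    return PAGE.substitute(title="CS462 Class Lab 1",
                           student="Sam Curren - TA",
                           message=message)
```

Cheetah adds loops, conditionals, and template inheritance on top of this basic substitution.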

June 18, 2008 05:14 PM

Designing an Interface - SimpleDB CFC

I'm working on a ColdFusion library for Amazon's SimpleDB, and I'm mulling over the options for how I want the library to work. I've completed the authorization code and the methods that deal with creating, listing, and deleting domains. They exist well as function calls, with the listing method returning a query object for easy iterating.

The two pieces that remain are the methods that deal with items (records within the domain) and queries against a domain. Here are a few options that I have, and I'd love some feedback!

Item Methods

SimpleDB allows you to add attributes to an item and remove them, as well as retrieve all (or a subset) of the attributes of an item.

There are two main approaches here: Present an Item object with methods, or present the methods within the main CFC itself.

Presenting an Item as an object would work like this:

    item = simpledb.getItem(domainname, itemname)
    item.setAttribute(attributename, attributevalue, replace=false)
    item.deleteAttribute(attributename)
    foo = item.getAttribute(attributename)
    item.persist()

Presenting methods from the main cfc would look like this:

    simpledb.setAttribute(domain, item, attribute, value, queue=false)
    simpledb.deleteAttribute(domain, item, attribute)
    foo = simpledb.getItemAttribute(domain, item, attribute)
    mystruct = simpledb.getItemAttributes(domain, item)
    simpledb.persistChanges(domain, [item])

In either case, changes could be queued in the system until persist() or persistChanges() was called. This way, multiple attribute sets could be combined together to speed processing.
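
The queue-then-persist idea can be sketched independently of ColdFusion. The Python below is only an illustration of the item-object option: `InMemoryBackend` is a hypothetical stand-in for the SimpleDB service that counts round trips, attributes are treated as single-valued for brevity, and none of these names come from the library being designed.

```python
class InMemoryBackend:
    """Hypothetical stand-in for the SimpleDB service; counts round trips."""

    def __init__(self):
        self.data = {}   # (domain, item name) -> {attribute: value}
        self.calls = 0

    def apply(self, domain, item, changes):
        self.calls += 1  # one simulated network request per batch
        attrs = self.data.setdefault((domain, item), {})
        for op, name, value in changes:
            if op == "set":
                attrs[name] = value
            elif op == "delete":
                attrs.pop(name, None)


class Item:
    """Collects attribute changes locally until persist() is called."""

    def __init__(self, backend, domain, name):
        self._backend, self._domain, self._name = backend, domain, name
        self._pending = []

    def set_attribute(self, name, value):
        self._pending.append(("set", name, value))

    def delete_attribute(self, name):
        self._pending.append(("delete", name, None))

    def persist(self):
        # Send all queued changes in a single batch, then clear the queue.
        if self._pending:
            self._backend.apply(self._domain, self._name, self._pending)
            self._pending = []
```

Three attribute changes followed by one `persist()` cost a single simulated request here, which is the speed-up that queuing is meant to buy.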

The examples I provide here are a bit rough, but I'm sure you get the idea.

Which of these options is going to be simpler and more ColdFusion-like?

Queries

The SimpleDB API returns a list of itemnames to queries. This means that to access the data, you need to query the attributes for each item. The simplest option would be to return the list of itemnames, and allow use of the other methods to get the desired attributes.

A more full-featured interface would have the option to get the attributes for each item and form it into a query. An iterator object could be another option, with either attribute access methods or simply returning an item object similar to the one mentioned in the item section.
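
The iterator option might look something like the following sketch, again in Python for illustration. `query` and `get_attributes` are hypothetical callables standing in for the two SimpleDB requests (run a query, fetch one item's attributes); they are not part of any real API.

```python
def iter_items(query, get_attributes):
    """Yield (item_name, attributes) pairs, fetching attributes lazily.

    query:          callable returning a list of item names
    get_attributes: callable taking an item name and returning a dict
    """
    for name in query():
        # The attribute request for an item is only made when the
        # caller actually advances the iterator to that item.
        yield name, get_attributes(name)
```

A caller that stops after the first few results never pays for the attribute lookups of the rest, which matters when each lookup is a separate request.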

Plea For Thoughts

So, does this rambling question make any sense? Which style (thin and rough, fat and friendly) of API interface do you prefer?

If my thoughts need to be better formed before any worthwhile answer can be given, please let me know.

June 18, 2008 05:14 PM

Rank the following: AWS Questionnaire

I just participated in a questionnaire for Amazon Web Services, and was so amazed by the elegance of one of the questions that I felt it warranted a blog post. I'm sure this method of question has been in practice for quite some time, but this is the first time I've encountered it. Have a look at the question, and response section: The question asks the user to rank the following options. Usually, this requires the use of dropdown boxes or something more complicated. In this case, they present a grid of checkboxes. Upon checking a box, they have some javascript code that disables the other checkboxes in the same row and column. The result is a very simple control for specifying a ranking preference that even my Mother would immediately understand. Well done, AWS.
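
The rule behind the grid (one check per row and one per column) is easy to model. The questionnaire did this in JavaScript in the browser; the sketch below restates the logic in Python, with a made-up class name and example options, and is not the code AWS used.

```python
class RankingGrid:
    """Rows are options, columns are ranks; one check per row and per column."""

    def __init__(self, options):
        self.options = list(options)
        self.ranks = list(range(1, len(self.options) + 1))
        self.checked = {}  # option -> rank

    def is_enabled(self, option, rank):
        if self.checked.get(option) == rank:
            return True  # a checked box stays clickable so it can be cleared
        row_taken = option in self.checked
        column_taken = rank in self.checked.values()
        return not (row_taken or column_taken)

    def check(self, option, rank):
        if not self.is_enabled(option, rank):
            raise ValueError("that box is disabled")
        self.checked[option] = rank

    def uncheck(self, option):
        self.checked.pop(option, None)

    def ranking(self):
        """Options that have been ranked so far, best rank first."""
        return sorted(self.checked, key=self.checked.get)
```

Because disabled boxes cannot be checked, any state the user can reach is already a valid partial ranking, so no separate validation step is needed.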

June 18, 2008 05:14 PM

EC2: Running scripts on startup and shutdown

Nearly every time you want to use an EC2 server, it is convenient to have a script to run on startup or shutdown. You can update DNS preferences, connect to a load balancer, or download new updates from a code repository. I use the following script, saved as /etc/init.d/ec2:

    #!/bin/bash
    #
    # ec2    Startup script for EC2 machines
    #
    # chkconfig: 345 99 02
    # description: Script used to issue startup and shutdown commands.
    #
    if [ "$1" = "start" ]; then
        python /path/dnsConnect.py
        exit
    fi
    if [ "$1" = "stop" ]; then
        python /path/to/script.py
        exit
    fi

You will need to set some execute permissions (check out the other scripts in the same directory), and also run the following commands at the prompt:

    chkconfig --add ec2
    chkconfig --levels ec2 reset

You can execute the script directly with "start" or "stop" (sans quotes) after the script name. This allows for testing, and lets you make sure you are calling your scripts correctly.

June 18, 2008 05:14 PM

DataCenter

This note is just for the 462 Students:

Barry Dixon is expecting you about 4:30 at the ViaWest data center next Wednesday (Dec 5). You're responsible for getting yourself there and back. Feel free to carpool or whatever. The data center is located in the Canopy building at 333 S 520 W, Lindon, Utah 84042. Take exit 273 from I-15 and it's just east of the freeway, north of Home Depot. Here's a map. The tour will take about an hour.

June 18, 2008 05:14 PM

Goog-411, Checkout, and Amazon Web Services

When Google announced their 1-800-GOOG-411 free business lookup service, I realized pretty quickly that their purpose was to collect a huge volume of voice data for speech-to-text (speech recognition) training. I didn't post about it at the time, so you are going to have to believe me. Or perhaps not, as Google Blogoscoped reports that Marissa Mayer claims that to be the case.

Is it unheard of for a company to set up a service for a purpose other than making money directly from it? Consider Amazon Web Services. They offer storage, computing, and other infrastructure services, and you pay for the use of these services. But does Amazon actually make any profit from these services? They do, but the real advantage for Amazon is in their cost savings by offering the services to customers.

Huh? Cost savings from offering a service?

Consider all the additional bandwidth consumed by users of the S3 and EC2 services. Since users pay for the bandwidth they consume, it doesn't cost Amazon much to move the traffic through their network. What it does do for Amazon is qualify them for cheaper bandwidth from their service providers because of the additional volume. That's right: by using S3, you are saving Amazon money on all of their own bandwidth.

I'm told that the savings caused by Amazon Web Services in such a way far outweighs the profit from the actual services themselves.

I believe that Google built their Checkout platform for precisely the same reasons. While I'm sure that AdWords also had something to do with it, I think the main purpose of Checkout was to draw more credit card transactions, allowing Google to qualify for additional discounts on their own transactions.

Amazon's Flexible Payment Service was probably built for the same purposes. Smart, I tell you. Offer an awesome service that helps other people save, and then leverage their use of the services to make (save) additional money.

June 18, 2008 05:14 PM

Amazon EC2 Welcomes Paid Platforms

Red Hat Enterprise Linux, a paid software platform, has recently announced a private beta of a service that allows users to deploy their software onto Amazon's EC2 computing cloud, paying $19 per month plus hourly usage fees. The fees are in addition to the computing time and bandwidth charges paid to Amazon.

I could not be happier to see this happen. I'm a huge ColdFusion fan for web-app development, but I'm struggling with the prospect of using ColdFusion for a system that I will need to be able to scale in a way that EC2 provides. Red Hat will lead the way, and hopefully many others will follow with similar instance/hour payment options.

If Adobe allowed such license terms, it would allow me to write web applications completely in ColdFusion, and then deploy code either to a regular licensed production box or onto a computing cluster when demand calls for extra computing power.

June 18, 2008 05:14 PM

JavaScript Driven Load Balancing

Digital Web magazine has an article on the idea of using JavaScript on the client side to manage the load balancing of a server cluster without an actual load balancer in place. This is a very cool idea, particularly for virtual server farms such as Amazon's EC2. If you introduce a load balancer (hardware if possible or software), you must use more than one in a cluster configuration. If we can eliminate the need for a traditional load balancer by writing intelligent clients, we can have a much more scalable, reliable system. What I wish for is a nicer way of doing cross domain Ajax calls. Flash uses a crossdomain.xml file, and I think something similar would benefit JS based clients in a very nice way.
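
The core of an "intelligent client" is small: pick a server at random and move on to another if it does not answer. The sketch below expresses that logic in Python rather than browser JavaScript, and it is not the article's code. The function name is invented, and `request` is a hypothetical callable that performs the actual call and raises `OSError` (or a subclass such as `ConnectionError`) on failure.

```python
import random


def call_with_failover(servers, request, rng=random):
    """Try servers in random order until one answers.

    Returns (server, response) for the first server that responds.
    Random ordering spreads load across the cluster without a balancer.
    """
    candidates = list(servers)
    rng.shuffle(candidates)
    last_error = None
    for server in candidates:
        try:
            return server, request(server)
        except OSError as err:
            last_error = err  # remember the failure and try the next server
    raise RuntimeError("no server responded") from last_error
```

The client still needs the list of servers from somewhere, which is the part that cross-domain restrictions make awkward in a browser.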

June 18, 2008 05:14 PM

Adobe's Services Play

Today at Adobe's MAX Conference in Chicago, they highlighted in a general session some new services that Adobe is getting into. Adobe has long been a software company of the install variety. They announced some services that lead them into a very different model as a company. These new services will likely never be the core of Adobe, but do help it expand into new areas of business.

The most interesting service is what they introduced as CoCoMo. The services expose all the underlying features present in their Connect application as Flex components that can be built into any Flex application. This allows integrating conferencing, audio, video, whiteboard, and screen sharing into any application. They appear to be billing this similarly to their current model. The components must connect to a Connect room, which could likely be used with their current interface as well. They also announced a different product called Pacifica that seemed to have the same set of features, but somehow was listed under a different product name. I'm very confused about the difference between these two products.

In a different space, they released SHARE, an online file sharing service with 1GB of space for free. You add files (of any type) into the system, and then you can share them with anyone or only specific users. I couldn't find the ability to update a file by uploading a new version, but did see several places list both a Created and a Last Updated date for the file. They do have REST APIs for the entire service, so you can build a front end onto this, or mash it up with a completely separate service. I expect to see both an AIR version (with drag and drop upload and download) and file system extensions to have this space mounted as a drive very soon.

I am very much a fan of Amazon Web Services. Amazon's services are very aligned with their core competencies of infrastructure and scale. In the same way, these new services (with the exception of the SHARE service) play very well to Adobe's strengths. I'm excited to see more companies entering the service/infrastructure business.

June 18, 2008 05:14 PM

May 06, 2008

John Dusbabek

December 14, 2007

Phil's 462 Stuff

Amazon's SimpleDB

I just posted a piece at Between the Lines on Amazon's latest announcement: SimpleDB, a database service in the cloud. I gave it the title "Economics that are impossible to stop" because that's what I think Amazon's doing: changing the whole economic model of how people build large scale distributed applications.

December 14, 2007 11:28 PM

Scott Bong-Soo Chun

Final Notes

Since my last post, I discovered a few errors in my Lab 3 & 4. The error reporting from the submit server (lab4) was not according to specs. Therefore, lab3 did not function for error conditions. Once I fixed that, I discovered that the format of the actual message was also not according to specs.

The lesson to be learned here is 1) to read the lab specs often and 2) to stick to the specs even if you see something better. The mindset which is needed in enterprise computing is that you are building components that will function in a collective. Therefore, there is an intrinsic coupling which cannot be broken if you intend to function with your neighbors. This coupling is the interface.

My errors weren't because I ignored the requirements, but that I am so used to being autonomous that it did not occur to me that I am dependent on other systems or that I support other systems. Once you understand the collaborative nature of the pieces of the enterprise solution then your approach to solutions will reflect the entirety of the solution and not only the piece you are working on at the moment.

I still believe Java was great to use because it taught me skills and methodologies specific to my job. But if you do not have this requirement, PHP and Python may be better solutions because 1) the Python libraries are more useful and 2) they are more web-centric, whereas Java is more processing-centric.

by ChunSB (noreply@blogger.com) at December 14, 2007 11:12 PM

Mike Heath

CS 462 - Lab 5

My Lab 5 is finally done. Apparently I’m the last one to be getting it passed off. I guess that’s what happens when you have deadlines at work, a thesis proposal, a family…

I really enjoyed Lab 5. It took far longer than it should have for me but that was due mostly to playing around with all the cool GUI features of NetBeans 6. I really like the support in NetBeans for running background tasks. It’s a really powerful framework. And the GUI designer in NetBeans is, of course, second-to-none.

Building the WHOIS web-service was trivial. NetBeans did all the work.

Interacting with Amazon’s SQS service was a bit challenging though. I’m using the Typica library from Google Code. It sucks. I was originally using version 0.8 which insisted on using Base64 to decode SQS messages. Fortunately version 0.9 came out not too long ago and that version seems to work well. It still tries to Base64 decode SQS messages by default, but fortunately you can turn it off. Typica has a number of dependencies from Apache Commons and other places that it doesn’t seem to document at all.

Typica has potential I think but it’s certainly not there yet. I should follow Amazon Jeff’s advice and go do a write up about it at the AWS web site.

Other than that, this class was fun. I can’t say that I learned that much new about distributed systems but playing with EC2 was a lot of fun and made the class worth it.

by Mike Heath at December 14, 2007 03:08 AM

December 12, 2007

Richard Duncan

Lab 5 Finished

Now I hope I have everything working (it seems to be). I would have to say that this lab gave me the most stress. It might be due to the end of the semester as well, but I had a lot of difficulty getting this lab to work and being able to port it to another machine (more on this later). I'm mainly posting most of this information for my own benefit. So, if I ever do something like this again, I know where to look.

First of all, I chose Java to implement the approval client. I started writing it out in Swing, and thought there's got to be a better way. I noticed that some of you used NetBeans, so I gave it a try. I love NetBeans 6.0; it is extremely easy to create the layout for the UI.

After I built the UI, I tried tackling communicating with Amazon SQS. It was tricky finding a library that would work for me, but I found typica quite useful. This site tells you the jar files needed and gives an example of accessing the SQS queue. I incorporated the jar files into my project, and typica worked wonders.

Then I needed to parse and create XML for the ideas. I used the DOM feature. It might not be the best way to do it, but recently I have been programming in Javascript to open an XML file (I wanted an offline search feature in HTML/Javascript). I used DOM plenty of times in Javascript. The javax.xml and DOM libraries acted very similarly. So, it was quite easy to implement.

Lastly, I tried to implement the Whois feature. This caused me the most pain. I downloaded Eclipse and the Apache Axis 2.0 WSDL2java generator. I was able to generate java from the wsdl file, but I could not figure out how to use those java files. I tried some other wsdl/soap libraries, but had trouble using those as well. Then I tried implementing my own version of soap, by creating an http connection and sending XML. That failed as well. Then I looked for ways to use wsdl in NetBeans. And I found one. Right click on your project, create new web service client, and there's an option for WSDL. Create a package name for the Java files, click finish and BAM! you have compiled Java files.

To get whois info, I used:
Whois wi = new Whois();
WhoisSoap wisoap = wi.getWhoisSoap();
whoisData = wisoap.getWhoIS(domain);

And then I had my client working. . . WRONG! It only worked on my own computer through NetBeans. I got the error: "java.lang.LinkageError: JAXB 2.0 API is being loaded from the bootstrap classloader, but this RI (..) needs 2.1 API," when running from the command prompt or on another machine. It took me forever to solve this problem, and I couldn't bring my computer in to pass off since it's a desktop. Anyway, here is the solution:
  1. Download jaxws 2.1 if you haven't done so and it isn't already installed. If installed, it would be under java1/modules/ext in the NetBeans directory.
  2. Copy the folder api in jaxws21 to your project's directory.
  3. Then add the following to the java -jar command:
    1. -Djava.endorsed.dirs="[project_path]/api"
  4. What that does is override what the bootstrap classloader provides, so that JAXB 2.1 is loaded instead of 2.0.
And there you have it.

by Richard Duncan (noreply@blogger.com) at December 12, 2007 10:05 PM

December 07, 2007

Jay Liu

Figured it all out, and just in time too


Now I realize how much I didn’t know about the labs in this class.

Before, there was so much I didn’t know that I didn’t even know I didn’t know many things.

Especially after putting all this time into these couple of labs, I hope that I actually get to implement something like this again in the future, for some greater benefit. Amazon has some great technologies, and it was fun to play around with these things. After successfully using SQS and EC2, I have come to a greater appreciation for the power behind these technologies. I’m glad that I have had experience building the components of a multi-tiered system. It’s just something that I haven’t done before. It’s cool to play around with configurations (install other people’s stuff and enslave that stuff to make it do what I want!)

Anyway, enough on that; moving on.

I’m glad I had a couple of the following tools to help me out. I suppose the lesson I want to communicate is that one should find technologies that not only do the job, but also have the supporting code and frameworks that shorten development time (in this case, shortened from many many many hours to just many many hours ;) )

Here are some examples:

CodeIgniter for PHP. This helped with forms, validation, application properties, and directory structure. The only place that this bit me was something very specific to this technology, and I will save telling it, so that those who don’t care about learning the specifics of this framework can keep reading.

HTTP PECL for PHP made anything http-related turn into easy one-liners in code. Any HTTP_POST and HTTP_GET calls that I had to make in Labs 3 and 4 were simple as pie. Also, in Lab 3 I had to parse the response for the submit service, right? I just used HTTP_Parse_Response to take the output from the HTTP_POST to get the “OK” out of there. Awesome.

NetBeans 6.0 for Java. I decided that I would do the Lab 5 stand-alone application in Java. At first, I was attracted to C# because I already knew how to manipulate XPaths and XML in there. Also, there is the whole .NET thing to hold my hand throughout the development process. However, the age-old conflict between the Microsoft empire and the Open Source world made things difficult. There needed to be some sort of certificate conversion to access AWS, I think. Rather than bother with all that (I did attempt it once, but ditched the effort when I could see it being trouble), I aligned the Open Source technology of Amazon with the Open Source-friendly language of Java. I was familiar with NetBeans 5.0, but had no clue that 6.0 would be that helpful! It had a visual GUI creation environment. In fact, there was a template desktop form with a status bar and menu all set to go. I also found out there was a wizard to add a WSDL. I pointed it to the one given by the lab spec, and it automatically integrated it with the rest of my app, and even put in some basic code to help me interface with it. After working on Lab 5 for about half a day, I was good.

AWS and accompanying classes to help interface. I think because of the fact that it was Amazon, luckily for me, there were already classes written in Java and PHP that interfaced with SQS: Polar Rose in Google Code for Java, and Test Utility for Amazon SQS for PHP. I didn’t even realize at first that I would need something to interface with the SQS, but then I figured it out, installed the classes and some of the dependencies (some required stuff on Apache commons, a place I didn’t even know existed), and finished the coding. Well, now here I am with all my labs done.

So, this has been quite the journey. Just yesterday, I didn’t even think that I would be able to finish all my labs, because for a long time, I didn’t even know where to start. My head would hurt just knowing there was too much that I didn’t know, and that it would take a chunk of time to figure things out bit by bit. I think my appreciation for the programming community as a whole has grown. I couldn’t imagine trying to figure things out and THEN write these things from scratch. Just understanding what a solution would look like was resource-consuming enough alone. Implementing the solution from scratch for one lab might have taken a semester, given the coding, testing, and debugging.

by cyanos at December 07, 2007 04:29 AM

December 06, 2007

John Dusbabek

Final Thoughts - The Labs

I'd just like to document some of the final thoughts I've had about these labs that we've done this semester. This is more for the benefit of those taking the class in the future, specifically those who are doing the Amazon EC2 labs.

One thing I did, which has saved me who-knows-how-much time, was using a dynamic DNS service for my servers. I had one hostname for each of my three major servers (web server, listing server, submit server), which allowed me to hard code my calls during testing. Not only did I not have to refresh as much to get my server, but it also saved me when the submit service load balancer started having problems during the past two (or more?) weeks. Scott Chun has a good description of how to set up your server to register with dyndns.com's dynamic hostname service. Read about it here.

One thing I would have done differently would be to store those URLs in a config file (elementary 240 stuff) so I'd only have to change it in one place to make the switch from hard-coded to load-balanced.
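For anyone who wants to try this, here is a minimal sketch of the idea in Python. It assumes an INI-style config; the `[servers]` section name and the example.org hostnames are made up for illustration, not taken from the labs.

```python
import configparser

# Hypothetical hostnames; swap in the dynamic DNS names or the load balancers.
SAMPLE_CONFIG = """
[servers]
web = http://web.example.org/
listing = http://listing.example.org/
submit = http://submit.example.org:8080/submit
"""


def load_server_urls(config_text):
    """Parse INI-style text and return the [servers] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return dict(parser["servers"])


urls = load_server_urls(SAMPLE_CONFIG)
print(urls["submit"])
```

Switching from hard-coded servers to the load balancers then means editing three lines in one place, and the rest of the code only ever asks for `urls["submit"]` and friends.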

Something that was mentioned several times in class, and that I would still love to see a tutorial on, was an alternative to frequent image persisting, especially for minor script changes I didn't realize I needed until I had already started the persist process (heaven knows I've had more of those than I needed). The solution is to have your server automatically check out the files it needs from a CVS/SVN repository on startup. I'm only an amateur shell scripter, but I assume there are two things the script would need to do: 1) check out the files, and 2) make sure they have the correct permissions. Anyway, this would have been a real time saver.
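The two steps could be sketched roughly like this in Python (rather than shell), assuming Subversion is installed on the image. The function names, the blanket 0755 mode, and the repository URL in the usage note are all assumptions for illustration; the post does not say which files need which permissions.

```python
import os
import subprocess


def checkout_command(repo_url, dest_dir):
    """Step 1: build the svn command line (URL and directory are caller-supplied)."""
    return ["svn", "checkout", "--non-interactive", repo_url, dest_dir]


def fix_permissions(dest_dir, mode=0o755):
    """Step 2: give every checked-out file the same mode (an assumed policy)."""
    changed = []
    for root, dirs, files in os.walk(dest_dir):
        # Skip Subversion's own bookkeeping directories.
        dirs[:] = [d for d in dirs if d != ".svn"]
        for name in files:
            path = os.path.join(root, name)
            os.chmod(path, mode)
            changed.append(path)
    return changed


def refresh_from_repository(repo_url, dest_dir, run=subprocess.check_call):
    """Run step 1 then step 2; `run` is injectable so the flow can be dry-run."""
    run(checkout_command(repo_url, dest_dir))
    return fix_permissions(dest_dir)
```

A boot script would call something like `refresh_from_repository("http://svn.example.org/labs/trunk", "/var/www/cgi-bin")` (both values hypothetical) before the web server starts, so a small script fix only needs a commit and a reboot instead of a new image.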

It would have been fun to get a little experience with Pound in the labs. That's the only thing in the entire process that I don't feel I could go off and do right now. Sam says it's pretty simple to figure out, I guess I'll be finding out in a couple weeks when I'm off on my own to try it.

This class has really been an enjoyable experience. Earlier in the semester when I gave up on Python, I thought I'd feel some remorse at the end for not having stuck with it. Well, I don't. The architectural concepts the labs have illustrated really do transcend the languages used, and I'm glad I didn't get so bogged down in the language that I missed the point (that's the reason for scrapping the EJB labs, right?). One thing I do regret is that Sam isn't making us use a template for our demonstration for Jeff Barr. Speaking of that, Nathan, you'd have gotten my vote for best design. Did anyone else have a design for consideration?

by J Dusbabek (noreply@blogger.com) at December 06, 2007 04:54 AM

December 05, 2007

Scott Bong-Soo Chun

Lab 5 Postmortem

Well, my first mistake was electing to write the client in Java+Swing. That took a bit longer than perhaps a Flex or Web approach. But, with a nice Eclipse plugin from Instantiations called WindowBuilderPro, things didn't get as messy as writing pure Swing code.

Once I had the shell of the client done, it was time to retrieve the submissions from the SQS queues. Amazon recommends using Typica, a Java-based lib for manipulating SQS. All went well, except for the fact that nobody else could read my submissions. It turned out that there was a bug in Typica rev. 0.8 which caused the submission to be encoded in base64. I fixed that bug and now have a custom copy of Typica!

With that done, I next tackled the WhoIS web service. Eclipse also has a plugin called SoapUI that works beautifully to generate Java classes from WSDL files. Actually, Apache Axis does all the work (WSDL2Java), but the SoapUI Eclipse perspective makes the work easier. All this took about 3 days.

All looked ok except for the fact that I was always hitting my own list, webapp and submit servers. When I checked, to my chagrin, I had all my URLs hardcoded to my servers. When I pointed them to the load-balancers, all heck broke loose. Now the real fun began. Testing became difficult because others were taking my submissions. When I made a change, there was no guarantee that I would hit my server. So, I externalized all the links to properties files so that I could update them at runtime.

It took me an extra day to get things in order. Now I am at the state where when I am hardcoded, all works, but when I am connected to the collective, the submissions do not enter the queue. When I examine the output of the sslb loadbalancer, I get a screen-full of Python errors. Will update y'all when I finally get things working end-to-end.

by ChunSB (noreply@blogger.com) at December 05, 2007 07:13 PM

Chris Ellsworth

One more problem...

One other problem I had with lab 5 was with the network I was attached to. I noticed that when I would ask for new content from SQS, I would get the same content back again and again. I remembered that the CS network has a very aggressive cache. I switched off the CS network onto the BYU network and it magically started working again.

by Chris (noreply@blogger.com) at December 05, 2007 04:23 AM

Hilton Campbell

Lab 5

Doing a thick client application for lab 5 was a breath of fresh air. I did mine as an Eclipse Rich Client Platform (RCP) application in Java 5. I used polarrose-amazon for accessing SQS and Apache Axis for the whois SOAP service.

I didn't encounter any major problems writing the application, but I was unable to pass off the lab with Sam the first time. Both times he logged into the application it complained that the SQS queue had expired. Rather than waste his time debugging it then and there, I left planning to fix at home and bring it back later. When I tried it at home it worked perfectly though. The next day I brought it back to Sam to try passing off again, at which time I realized why it hadn't worked the first time: I hadn't signed in to the BYU wireless network. Dumb. I signed in before attempting to pass it off and it happily worked.

by Hilton (noreply@blogger.com) at December 05, 2007 12:21 AM

December 04, 2007

Chris Ellsworth

Lab 5 oh my

So this lab was especially frustrating for me. To do the desktop, I figured I would use C# since it is so quick and easy to make an application. The GUI was simple and fast to make. After that is when it started to get bad.

I tried to use C# to access the Amazon SQS queues. That was less than pleasant using C#. I had to install multiple packages, get additional dll's, and install certificates. Somewhere in all that, I couldn't get it working.

The SOAP interface for C# didn't seem too friendly either. After the previous defeat, I gave up on this one even quicker.

But alas, I did get it done. For the complicated SQS and SOAP, I run PHP scripts. I figured out how to do basic things in PHP for SQS last lab and the same library I used then worked perfectly here also.

To do the soap, I had to install an additional package:

yum install php-soap

After that, the basic calls work like John described. One problem I had, though, was that after making the function calls I didn't know which methods or attributes the resulting object had, which I needed in order to print it out. Two snippets that I found useful in figuring out what the vars were are as follows:

To get the methods

$arr = get_class_methods(get_class($obj));
foreach ($arr as $method) {
  echo "\tfunction $method()\n";
}


To get the object variables


foreach (get_object_vars($obj) as $prop => $val) {
  echo "\t$prop = $val\n";
}



The rest of the lab was pretty straightforward.

by Chris (noreply@blogger.com) at December 04, 2007 05:18 AM

John Dusbabek

Lab 5: Finishing Touches

After nearly a month (of scattered work) I've finally put the finishing touches on my approval client. If anyone wants to get more experience dealing with asynchronous web applications, I'd recommend Flex. You could experience asynchrony with Ajax, but as someone who's used both considerably... I just find I have more time for fun when I'm programming in Flex.

Anyway, the link to the online version of my approval client is here: http://wishlist.dusbabek.net (same link as before).

I don't have any final thoughts to share about this lab, per se. In the future I would like to explore the scalability of Flex applications in a little more depth. Flex apps compile into SWF files, which can get reasonably large depending on the application (several hundred K to a couple megabytes).

A couple thoughts I've had on this:
1. Decrease the file size: don't embed. It's a common practice to embed all necessary resources (including some images), some of which may not be required immediately.
2. Decrease the file size: Break into smaller SWFs. Flex makes it possible to load other SWFs at run time. Rather than compile all functionality into a single SWF, it could be broken into smaller functional applications that could be loaded lazily.
3. Reduce bandwidth on data transfer. Flex has no means of accessing a relational database directly; all data comes from either static XML files or web services (using the broad sense of the word). The amount of bandwidth needed could be reduced by using a lighter data format like JSON for RESTful services; using a binary format (like AMF); or by serving data from static XML files where appropriate.

These were just the first couple of more obvious things to occur to me. I've got 2 medium to large scale Flex applications I'm working on at the moment, and it'll be interesting to see what I can come up with.

by J Dusbabek (noreply@blogger.com) at December 04, 2007 04:05 AM

December 01, 2007

John Dusbabek

Lab 5 : PHP and SOAP

The only SOAP requests I've ever made were made on the .NET platform. They're not that much of a beast on .NET, but it wasn't exactly a cake walk either. So I had been bracing myself for the worst trying to implement it in PHP.

I should explain that my lab 5 client connects to a PHP service that in turn makes the SQS requests, etc. I initially wanted to implement an SQS library in Actionscript (and probably will in the future when I'm not pressed by deadlines) but I decided it was too ambitious for the amount of time I wanted to spend on this lab. So alas, a PHP service also handles my SOAP request to WHOIS.

Anyway, I was expecting SOAP on PHP to be a seriously complex affair. Here's my code that makes the request:


$client = new SOAPClient("http://www.webservicex.net/whois.asmx?WSDL");
$params = array('HostName' => $_GET['url']);
$whois = $client->GetWhoIS($params);


Granted, it would have required about 2 more lines if there wasn't a URL to the WSDL, but it doesn't get much simpler than that. I should mention that this requires that PHP SOAP be enabled (uncomment a line in your php.ini if you're running Windows; recompile from source using '--enable-soap' if you're running Linux). I didn't have to recompile, thanks once again to Remi Collet (the French guy who has yum rpms for all this stuff, see my previous post).

Well, the SQS library I'm using is pretty old and doesn't have a means of querying the queue for the number of messages. So, I thought I'd try sending a SOAP message to Amazon to get it. Amazon's WSDL is a little more complex, and I probably could have gotten it to work if I wanted to play around with the messages for another hour or so. It turned out to be a miserable failure, and I resorted to my old tricks: (file_get_contents()) which worked perfectly. Here's the code I used, which shows the query string needed to get the number of messages:


$timestamp = gmdate('Y-m-d\TH:i:s\Z');

$qs = "http://queue.amazonaws.com/A3N3IV5XJH079S/processing" .
  "?Action=GetQueueAttributes" .
  "&Attribute=ApproximateNumberOfMessages" .
  "&AWSAccessKeyId=[AMAZON_ACCESS_KEY]" .
  "&Version=2007-05-01" .
  "&Timestamp=" . urlencode($timestamp) .
  "&Signature=" . urlencode(constructSig('GetQueueAttributes' . $timestamp));

$response = file_get_contents($qs);


The constructSig is the same method I listed in a previous post.

Here are a few links that were helpful:
SQS Query and SOAP API
Getting SQS Attributes
SQS WSDL

by J Dusbabek (noreply@blogger.com) at December 01, 2007 09:28 PM

Lab 5 : Web App to Desktop App using Flex 3

I'm almost finished with lab 5. On the whole it's been pretty fun, aside from some of the frustration from minute details that take an hour apiece to hammer out. I had developed most of my application as a web app before the specs came out. Fortunately I was using Flex, so let me show you how easy it was to convert it from a web app to a desktop app.

As a web app, the main page was enclosed in elements like these:

<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute" backgroundColor="#2C3552" xmlns:local="*">
.
.
.
</mx:Application>


To deploy it as a desktop app, I had to change it to:


<mx:WindowedApplication xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute" backgroundColor="#2C3552" xmlns:local="*">
.
.
.
</mx:WindowedApplication>


and then recompile. And that's it. And this may appeal to those of you with high design sensibilities-- it looks the same on the desktop as it does on the Web. Incidentally, on the web it looks identical on every browser/platform combination (any platform that has a Flash player, that is).

by J Dusbabek (noreply@blogger.com) at December 01, 2007 09:03 PM

November 27, 2007

Hilton Campbell

Lab 4 Complete!

In previous blog posts I declared that I was done with a lab, only to discover that I wasn't. Well this time I waited to pass the lab off before claiming it was so. With a sigh of relief I can now exclaim, "fin!"

For the most part this lab wasn't too complex. I used the Voidspace Python module for Akismet and Python-Amazon for SQS. The tricky part was making sure that the SQS code was actually working. To do that I just quickly prototyped some code for lab 5. Also, I didn't know how to "properly" install Python packages. I still don't, but I did manage to hack it until it started working.

by Hilton (noreply@blogger.com) at November 27, 2007 10:17 PM

Chris Ellsworth

and to think i started this lab a few weeks ago

So it took me awhile (I was lacking motivation) to finally find a good akismet library for php. I couldn't figure out how to use the one on the akismet home page, but I found another that worked great and was easy to use.

http://www.achingbrain.net/stuff/akismet/

I liked the example he has on his website. That made it all the easier to use.

I followed in the footsteps of John with the pear library for sqs. I didn't have to make the fix either. And that library made submitting to the service almost trivial (the installation took some time--and I had to increase the memory in the php.ini from 8MB to 32MB).

by Chris (noreply@blogger.com) at November 27, 2007 05:55 AM

November 22, 2007

Jay Liu

Lab 4


I would say this is complete, but I have always been wrong in the past, so who knows whether I am actually done.  Chances are, I’m not.

As far as I know, I can call Akismet with no problem.  I kept on using PHP, and the PECL_HTTP library that took me ages to install in the last lab served me very well.  I was able to make http_post_data calls, and to my knowledge everything works….

…except for the fact that I don’t really know how to verify whether my XML makes it to the Processing Queue (http://queue.amazonaws.com/A3N3IV5XJH079S/processing).  Right now, I send an HTTP POST to that endpoint, specifying the content type as text/xml.  If there are some configuration issues, then I should be able to fix them quickly.

by cyanos at November 22, 2007 12:44 AM

November 20, 2007

Richard Duncan

Finished Lab3

The submit process was my greatest difficulty for this lab. I was clueless on how Python gets post variables and a bit confused on the submit script. After talking with Sam, I was able to complete it and was surprised how easy these were.

to get post variables (var1, var2, ...):

def index(req, var1="", var2="", . . . ):

Submitting was a bit trickier, but I just had to use urllib.urlencode(data) to encode the data dictionary of variables, then call urllib.urlopen(url, urllib.urlencode(data)).
docs.python.org has a good example. What confused me was trying to use the http library that someone recommended for doing posts in Python.
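For readers on a current Python, the same two steps look slightly different because the functions moved to `urllib.parse` and `urllib.request` in Python 3. The sketch below is only an illustration: the field names and the example.org URL are placeholders, and `build_post` and `submit` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post(url, fields):
    """Encode a dict as a form body; attaching data makes the request a POST."""
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body)


def submit(url, fields, timeout=10):
    """Send the form and return the raw response body."""
    with urlopen(build_post(url, fields), timeout=timeout) as response:
        return response.read()


request = build_post("http://example.org/submit",
                     {"domain": "www.foo.com", "name": "Sam"})
print(request.get_method(), request.data)
```

Keeping the request construction separate from the sending makes it easy to inspect what would go over the wire before pointing it at a real submit server.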

by Richard Duncan (noreply@blogger.com) at November 20, 2007 10:23 PM

Lab 4: Finished

This lab really was quite simple because doing all the previous labs gave me the necessary tools to do this one. I found two libraries for akismet. Use this one as the other always returns spam="true"; which drove me nuts.

GUID worked well.

by Richard Duncan (noreply@blogger.com) at November 20, 2007 08:58 PM

Mike Heath

CS 462 - Lab 4

I submitted my Lab 4 late because I spent all of last week at ApacheCon US 2007 in Atlanta, GA. ApacheCon was awesome, BTW.

Lab 4 was very straightforward. I did this lab with Python and the Akismet module for Python works very well. I used uuidgen to generate my UUID. uuidgen is a command line tool for generating UUIDs that comes with most Linux distros. Invoking it and getting the output back took some effort. I’m using the following code:

import subprocess

uuid = subprocess.Popen(["/usr/bin/uuidgen"], stdout=subprocess.PIPE).communicate()[0].strip()

Most of Python is nice and concise and it’s unfortunate that you have to do something so ugly to do something that’s so trivial in other scripting languages.
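As an alternative that avoids the subprocess call, Python's standard library has shipped a `uuid` module since version 2.5. A minimal sketch, assuming a random (version 4) UUID is acceptable for the lab's guid; `new_guid` is just an illustrative name:

```python
import uuid


def new_guid():
    """Return a random (version 4) UUID in the usual 36-character text form."""
    return str(uuid.uuid4())


print(new_guid())
```

This also works on machines where `/usr/bin/uuidgen` is not installed.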

After playing around with Python boto, I was able to get the SQS code working.  I wasn’t able to test my implementation against other students’ lab 3 implementations but my code appears to be working just fine with my lab 3.

by Mike Heath at November 20, 2007 06:24 AM

November 15, 2007

John Dusbabek

Lab 4 Schema Clarifications

I came away from our lab 4 design session with a couple wrong impressions regarding the schema, and got them clarified by Sam yesterday. Here's a less ambiguous version of the message format we're supposed to use (would have updated it on the lab page but I'm not on campus at the moment).


<idea guid="">
  <initiated date="" technology="(wstechnology)">Name provided by user</initiated>
  <submitted date="" technology="(submit server tech)">RY name of submit server creator</submitted>
  <spam>true</spam>
  <domain>www.foo.com</domain>
  <body>Foo and gunk are better for this site than xs</body>
</idea>


You should notice that the wsuser that we POSTed from our submit script is getting thrown away. I assumed that since we went through the trouble of POSTing it, we'd definitely use it-- and I ended up putting the user provided name in the submitted element.

Looking back I don't think we got our schema right. I think we're trying to cram too much information into too few elements. Sure, it keeps us from having to add an additional element (<user> for example) to store the information; but the result is that we've lost some information-- wsuser (less important) and made it more confusing (more important).
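To make the layout concrete, here is a rough Python sketch that assembles a message in this shape with the standard library's ElementTree. The function and parameter names are invented for illustration, the date is passed through as an opaque string because the format was not pinned down, and writing "false" for non-spam is an assumption (the sample above only shows "true").

```python
import xml.etree.ElementTree as ET


def build_idea_message(guid, user_name, ws_technology, creator,
                       submit_technology, date, domain, body, spam):
    """Return the <idea> message as a string, following the layout above."""
    idea = ET.Element("idea", {"guid": guid})

    initiated = ET.SubElement(
        idea, "initiated", {"date": date, "technology": ws_technology})
    initiated.text = user_name  # name typed in by the user

    submitted = ET.SubElement(
        idea, "submitted", {"date": date, "technology": submit_technology})
    submitted.text = creator  # author of the submit server

    ET.SubElement(idea, "spam").text = "true" if spam else "false"
    ET.SubElement(idea, "domain").text = domain
    ET.SubElement(idea, "body").text = body
    return ET.tostring(idea, encoding="unicode")
```

Building the tree this way, instead of pasting strings together, also takes care of escaping characters such as `&` in the idea body.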

by J Dusbabek (noreply@blogger.com) at November 15, 2007 04:06 PM

November 14, 2007

Scott Bong-Soo Chun

Lab 4 Postmortem

To start, I grabbed the Java libraries for Akismet and SQS services. They were so simple that the submit process worked on the first shot. WOW! It took me another 2 hours to shore up the error checking.

by ChunSB (noreply@blogger.com) at November 14, 2007 05:09 AM

Lab 3 Postmortem

UI Implementation
The templating engine I chose to use is Velocity. This is great because all I have to do is:

1) determine which objects will be visible to the Velocity template and use them in the HTML (actually *.vm) file to populate the dynamic data.
2) in the servlet, inject the Java objects I used in step (1) into the template context.

Getting the listings
Here, I used HttpClient to make the "GET" calls.

Overall coding
This lab required a bit more programming than the previous labs. This was mainly because I wanted to add some smarts to the page navigation.

by ChunSB (noreply@blogger.com) at November 14, 2007 05:07 AM

Lab 2 Postmortem

Mapping the URLs
The most complex part of this lab was matching the URL signatures. I played with Tomcat for a while, but realized that using mod_rewrite with Apache would be much easier. Here are the steps:

1) mod_rewrite comes with Apache 2.2 so just uncomment the module in the httpd.conf file.
2) add the following lines to the end of httpd.conf:

# Send cs462 labs to worker named default
JkMount /cs462lab2/* default

This will send all URIs which start with "/cs462lab2" to Tomcat.
3) add the file ".htaccess" to the default docs dir of Apache. This file contains the regex rules for rewriting the URL.

4) add the following lines to .htaccess:

RewriteEngine on
RewriteRule ^index$ cs462lab2/SiteLister?option=index [NC]
RewriteRule ^domain/(([a-z0-9]([-a-z0-9]*[a-z0-9])?\.)+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]))/?$ cs462lab2/SiteLister?option=domain&domain=$1 [NC,L]

That's it!

Calling HTTP Get
Because I'm using Java and Servlets, I have the luxury of choosing from many open-source libraries.
1) To call S3 for the data, I used HttpClient which is great for Http Get and Post calls.
2) To convert the JSON data to Java types, I used json-lib.
3) To format the XML, I used JDOM.

These libraries minimized the code I had to write.

by ChunSB (noreply@blogger.com) at November 14, 2007 04:57 AM

November 13, 2007

Chris Ellsworth

Which date format

Is there a specific date format that we are supposed to use in lab 4 in the xml?

Thanks

by Chris (noreply@blogger.com) at November 13, 2007 03:47 AM

November 10, 2007

John Dusbabek

SQS: Queue Length / Auth Signature

To get the queue length, as well as the visibility timeout, you make a request using the GetQueueAttributes action. The PHP library I'm using to make my calls to SQS doesn't support this call (must have been written before the 2007-05-01 release of SQS) so my options are to find a new library, or to write my own function to do this.

I decided to try writing my own first, and while researching this I found something I was looking for while doing lab 4: how to compute the authorization header, or Signature.

The process is as follows: take the query parameters, sort them by key (ignoring case), and concatenate them end to end (key preceding value). Don't include the ?, &, or = signs. Then calculate the HMAC-SHA1 signature of that string (using your secret access key) and convert it to base64.

Here's the example Amazon gives on their site.

The following request:

?Action=CreateQueue
&QueueName=queue2
&AWSAccessKeyId=0A8BDF2G9KCB3ZNKFA82
&SignatureVersion=1
&Expires=2007-01-12T12:00:00Z
&Version=2006-04-01


translates into the following string:

ActionCreateQueueAWSAccessKeyId0A8BDF2G9KCB3ZNKFA82Expires2007-01-12T12:00:00ZQueueNamequeue2SignatureVersion1Version2006-04-01

which when hashed with the secret key (fake-secret-key, used in this example) yields:

wlv84EOcHQk800Yq6QHgX4AdJfk=
(URL encoded version: wlv84EOcHQk800Yq6QHgX4AdJfk%3D)
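The same steps can be sketched in Python with only the standard library. This is an illustration of the recipe, not Amazon's code: the helper names are made up, and the sort by key (ignoring case) is what puts the parameters in the order shown in the example string above.

```python
import base64
import hashlib
import hmac


def string_to_sign(params):
    """Concatenate key/value pairs, sorted by key without regard to case."""
    ordered = sorted(params.items(), key=lambda item: item[0].lower())
    return "".join(key + value for key, value in ordered)


def sign(params, secret_key):
    """HMAC-SHA1 the string to sign, then base64-encode the raw digest."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign(params).encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


params = {
    "Action": "CreateQueue",
    "QueueName": "queue2",
    "AWSAccessKeyId": "0A8BDF2G9KCB3ZNKFA82",
    "SignatureVersion": "1",
    "Expires": "2007-01-12T12:00:00Z",
    "Version": "2006-04-01",
}
print(string_to_sign(params))
print(sign(params, "fake-secret-key"))
```

The base64 result still has to be URL encoded (for example with `urllib.parse.quote`) before it is appended to the query string as the Signature parameter.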


I looked at my PHP library, and sure enough here are the methods that create the signature. They require the PEAR Crypt_HMAC package.


function hex2b64($str) {
  $raw = '';
  for ($i=0; $i < strlen($str); $i+=2) {
    $raw .= chr(hexdec(substr($str, $i, 2)));
  }
  return base64_encode($raw);
}

function constructSig($str) {
  $hasher =& new Crypt_HMAC($this->secretKey, "sha1");
  $signature = $this->hex2b64($hasher->hash($str));
  return($signature);
}

by J Dusbabek (noreply@blogger.com) at November 10, 2007 10:46 PM

Lab 5 : Approval Client

I'm just about finished with my approval client, the things I need to do are:

A) figure out how to get a count of the number of items in the queue (my library doesn't support that function)
B) Make my SOAP calls to the WHOIS service (is there a specific service we're supposed to be using for this?)

I've got my prototype running here. I'm using Flex for the front end, with data provided by a PHP backend. My plug for Flex follows...

Based on anecdotal evidence (i.e., conversations I've had) I think that Adobe Flex is one of the most misconstrued technologies in our department. I just wanted to take a few lines and address some of the misconceptions I've heard as I've discussed Flex with fellow students.

Things you've probably heard about Adobe Flex:
1. It's proprietary.
2. It uses Flash.
3. It costs hundreds of dollars for the compiler/IDE.
4. Flex data services costs several thousand dollars per processor to deploy.

While there is some truth to all the statements above, the bottom line in regards to cost is this: Flex is free.

You can download the free SDK (incidentally, all you need to compile and deploy Flex applications) here

If you'd like an IDE, go here to download a free academically licensed version of Flex Builder 2, with Charting. You'll need to provide some identification.

Follow links here to download Flex Data Services Express (licensed free of charge on up to 1 processor). I should mention that I've used data in all the Flex applications I've developed, and I have yet to try Flex Data Services. There is a wide variety of options for getting data to your application. I've used Java web services, PHP pages returning XML, AMF (Actionscript's serialized data format) streams, and a couple others.

And yes, it does run in a Flash player (the resemblance to Flash ends there) but that does have its advantages. As long as your platform has a Flash player, your application will look and function exactly the same on Linux, a Mac, a Windows PC, or whatever. And as far as RIA's go, that's saying something.

All things accounted for, it's cheaper to develop in Flex than it is to develop in AJAX. And the applications look better consistently across platforms. So if the "cost and Flash" are the only things holding you back from checking it out, you really ought to look into it.

by J Dusbabek (noreply@blogger.com) at November 10, 2007 07:20 PM

Hilton Campbell

Lab 3

With labs 1 and 2 under my belt, lab 3 was a walk in the park. I finished it early, but then with the changes in requirements I had to revisit it to finish up. This has been a very busy semester for me, so I've had to find ways to spend as little time on these assignments as possible, while still meeting all the requirements. The way I've chosen to do this is to only meet the requirements, doing whatever it takes to get the job done. This approach has worked well for me so far.

by Hilton (noreply@blogger.com) at November 10, 2007 03:38 PM

Richard Duncan

Still Working on Lab3

Well, I'm getting there. I pretty much have the index and domain pages displaying properly now. The trick was finding out how to use XML in python. I tried Amara, but the installation failed. So, I decided to try ElementTree. I didn't have to install anything to use it. It was a bit tricky to understand how to use it at first, but it really isn't that bad.

Here's an example:

from elementtree import ElementTree
tree = ElementTree.XML('some xml string')

# find all nodes with tag 'domain'
for node in tree.getiterator('domain'):
    # get the value of the ideacount attribute
    ideacount = node.get('ideacount')

    # get text between the tags
    domain = node.text

    # find the child node whose tag is 'submit'
    foo = node.find('submit')
    tech = foo.get('technology')  # get technology attribute
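
For a version that runs as-is on a current Python, the sketch below uses `xml.etree.ElementTree` from the standard library and `iter()` in place of the older `getiterator()`. The sample XML is invented to match the attribute and tag names used above; the real listing format may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped after the names used in the example above.
SAMPLE = """
<domains>
  <domain ideacount="3">www.foo.com<submit technology="php"/></domain>
  <domain ideacount="1">www.bar.org<submit technology="python"/></domain>
</domains>
""".strip()


def summarize(xml_text):
    """Return (domain, idea count, submit technology) for each <domain> node."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter("domain"):
        submit = node.find("submit")
        rows.append((node.text.strip(),
                     int(node.get("ideacount")),
                     submit.get("technology")))
    return rows


for row in summarize(SAMPLE):
    print(row)
```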

Now I just have to work on the submit process.

by Richard Duncan (noreply@blogger.com) at November 10, 2007 01:53 AM

November 09, 2007

John Dusbabek

Lab 4 - Submit Server

Once again, I got my best advice at the end of the lab. I'll share it with you-- use a library when trying to interact with the SQS.

The first approach I tried that failed was using cURL to send the PUT request. You can see the code I used for that in my previous post. Although it works in general, I wasn't having much success using it with SQS. Perusing the SQS documentation a little more closely revealed that I was not sending the correct headers. Here's a link to it.

I didn't find that link until I had given up on cURL and switched to PEAR's HTTP_Request package. As far as general purpose HTTP request packages go, the interface is much easier to use. I added most of the headers, but was having trouble formatting the Authorization header correctly. That's when I got the advice to use a library.

I checked out a couple SQS libraries for PHP, the one I ended up going with was one I found on Amazon's site. The documentation is wanting, but I was able to figure out what to do. It makes use of the PEAR extensions. I didn't have to make the changes to the PEAR code like it suggests on the site (I don't know if it's because it was correct, or because I wasn't exercising the faulty code).

Here's the code I used to make the request:

function submitToQueue($xml) {
  $q = new sqs('ACCESS-KEY', 'SECRET-KEY', 'http://queue.amazonaws.com/');
  $queueId="A3N3IV5XJH079S/processing";
  $q->putMessage($xml, $queueId, 1000);
}


And that was it.

by J Dusbabek (noreply@blogger.com) at November 09, 2007 03:21 AM

November 06, 2007

Jay Liu

Lab 3 Finally Finished


Well, it’s done.

I had a surprisingly large amount of trouble parsing XML.  Then I looked a bit more closely at the examples on the PHP docs website, and was able to crank through with it.

I suppose for the latter half of this lab, I really had to focus on “Working Smart” instead of just “Working Hard.” It just seemed like a bunch of mundane tasks (like getting POST to work, or doing an HTTP_POST, or getting XML to parse) were taking up loads of time.

by cyanos at November 06, 2007 05:00 PM

November 03, 2007

John Dusbabek

GET / POST / PUT Using PHP

The simplest way by far to do a GET in PHP (if you just want the return contents and don't care about the headers) is to use file_get_contents(). It's useful for getting the contents of a file quickly, or the contents of a web page. For example this method, from lab 2, retrieves data from S3.


function queryS3($path) {
  $contents = file_get_contents('http://s3.amazonaws.com/cs462-data/' . $path);

  return $contents;
}



Doing an HTTP POST, I use the cURL library. If you've never used curl before, there's a slight learning curve for doing a POST. I was able to figure it out after looking at a couple samples (the PHP documentation isn't much help). The key is creating an associative array from your POST data fields.

Here's an example from Lab 3, where my submit script (which is actually a Submit object) finally submits the idea to the submit server.


function sendToSubmitAppServer() {
  $url = "http://sslb-p.webappwishlist.com:8080/submit";
  $useragent="Johns Web Service Server, version 7";

  $data = array();
  $data['domain'] = $this->domain;
  $data['name'] = $this->name;
  $data['idea'] = $this->idea;

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
  curl_setopt($ch, CURLOPT_URL,$url);
  curl_setopt($ch, CURLOPT_POST, 1);
  curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
  $result = curl_exec ($ch);
  curl_close ($ch);
}


Doing HTTP PUT requests was a little more tricky, because PHP doesn't offer a straightforward way of specifying the body of a PUT request, specifically where the body comes from a local string variable. I found a solution here (it basically says all the stuff I just said, with a code example). Here's my code example from lab 4, where I'm sending the XML data file to SQS. I can't be sure that it works, as I don't know how to test if something was sent to the queue. I'll update this example if I find any errors during the next couple days.


function submitToQueue($xml) {
  $url = "http://queue.amazonaws.com/A3N3IV5XJH079S/processing";
  $useragent="Distributed Systems/v3.4 (compatible; Mozilla 7.0; MSIE 8.5; http://classes.eclab.byu.edu/462/)";

  $fh = fopen('php://memory', 'rw');
  fwrite($fh, $xml);
  rewind($fh);

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
  curl_setopt($ch, CURLOPT_INFILE, $fh);
  curl_setopt($ch, CURLOPT_INFILESIZE, strlen($xml));
  curl_setopt($ch, CURLOPT_TIMEOUT, 10);
  curl_setopt($ch, CURLOPT_PUT, 1);
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

  $result = curl_exec($ch);
  curl_close($ch);
  fclose($fh);

  // Hand the response back so the caller can inspect it.
  return $result;
}


The code opens a memory stream the same way it would open a file stream, and then uses CURLOPT_INFILE to specify the data to be sent in the body. I ran into a situation like this before (not while using PUT), which I solved by actually writing the data to a file and then passing the file back in. Talk about a hack...
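For comparison, here is a minimal sketch of the same "treat a string as a file" idea in Python rather than PHP. The function name build_put_request and the example.com URL are made up for illustration; the sketch only builds the request object and does not send anything.

```python
import io
import urllib.request


def build_put_request(url, body_text):
    """Build (but do not send) a PUT request whose body is an in-memory stream."""
    body = body_text.encode("utf-8")
    # io.BytesIO plays the role of php://memory: a file-like object backed by RAM.
    stream = io.BytesIO(body)
    req = urllib.request.Request(url, data=stream, method="PUT")
    # With a file-like body, the length has to be supplied explicitly.
    req.add_header("Content-Length", str(len(body)))
    return req
```

Passing the returned object to urllib.request.urlopen() would actually perform the request; that step is left out here because the real queue URL and its authentication are outside the scope of the sketch.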

by J Dusbabek (noreply@blogger.com) at November 03, 2007 04:14 PM

Jay Liu

Lab 3: My Struggle (with Linux)


Wow, it was quite the trip trying to get http_get() and http_post() to work in PHP. Maybe I should have stuck with Python!! :P

The following sites really helped me configure everything.

http://www.art122-5.net/index.php/How_to_install_PECL_HTTP

http://www.jellyandcustard.com/2006/01/19/installing-pecl-modules/

http://www.linuxforums.org/forum/redhat-fedora-linux-help/58407-yum-install-mythtv-suite-problems.html

http://www.linuxforums.org/forum/linux-applications/41118-how-install-gcc.html

Of course, I’m a Linux Newbie, and so pretty much any Linux site helps me :P

 

I had to roll PHP back from 5.2.4 to 5.0.4. I got the PECL_HTTP module on there (http.so), and was finally able to get those dang http_get() and http_post() functions working. Before, there was a fatal interpreter error saying it didn’t know what those functions were. (Postmortem comment: crap, I should have just asked John about getting and posting via HTTP… argh… oh well, this here is one way to do it, I suppose.)

 

Hurrah for Amazon. Wow, having multiple instances of the server versions really helped, though. I would have wasted even more time uninstalling everything to roll back the php version.

Kudos to CodeIgniter. The framework had some nice functionality that allowed me to do some URL rewriting, form validation, templating, and configuration stuff (e.g. setting global variables like URLs). It was just a light framework that I already knew the basics of. I’ll have to look more into Smarty though. I have seen many people use it and they have said good things.

Dumped a bunch of time into this lab. Was it worth it? Well, I suppose it’s one of those “I’ll find that out when I’m older”-type of things.

Finally got something to show up in Twitter. Yes, I can proudly declare with the rest of the classmates that have finished Lab 3 that the posting came straight from the application that I wrote…

Still not done, but very close.

by cyanos at November 03, 2007 04:11 AM

November 02, 2007

John Dusbabek

Updating PHP on Fedora Core 4

Using Amazon's standard EC2 images as the base image for your servers probably means you're going to be running Fedora Core 4, with outdated versions of PHP and MySQL. In my case that meant PHP 5.0.4, and I've run into several problems with it, as I've wanted to use some of PHP's more advanced functionality: functionality that is either not included by default with PHP 5.0, or not included at all.

An example of functionality not included by default is JSON, and an example of functionality not included at all is memory-based streams. Both have had, or currently have, applications in this lab. I finally decided I needed to have PHP 5.2 installed on my servers. The only problem is that none of the repositories configured by default have PHP 5.2 packages for Fedora 4 (this is what has hindered me in the past).

I finally solved my problem after stumbling upon this site: remi.collet.free.fr (I knew my 3 years of French education in high school would pay off some day) that has a repository with lots of packages including update packages for MySQL and PHP for FC4.

I'll outline the steps I followed to update PHP 5, and include the commands (which happen to be included on the above site, here, although it may take a few minutes to find them if you don't know French).

1. I downloaded (using wget) the repository configuration file to the repository configuration directory.

cd /etc/yum.repos.d/
wget http://remi.collet.free.fr/rpms/remi-fedora.repo

2. I made a yum call, enabling the repository in the process (as it's disabled by default).

yum --enablerepo=remi install php-5.2.4

3. I restarted Apache.

apachectl restart

And that's it. I now have PHP 5.2.4 running on Fedora 4. I'm sure that an expert could have accomplished it some other way, but as a relative n00b, I have to admit I quite rely on yum.

by J Dusbabek (noreply@blogger.com) at November 02, 2007 05:18 PM

Lab 3 - List App/Web Server Integration

I got off to an early start on lab 3, and had most of my code working in a couple hours. The thing that hung me up for two weeks was trying to figure out how the step "Register with the SSLB" fit into this lab. I finally asked Sam about it and he told me it was a mistake. Looking back I realize that I deserved to wallow around in confusion for not having asked the question sooner.

I used PHP again. I think I've pretty much given up on Python this semester... there's always next semester. The template engine I've been using, Smarty, is pretty powerful. Not that I need to exercise its full power for this project, but I really like it. I actually like it so much that I've switched to it from XSLT on another project I'm working on.

One of the other challenges I had during this lab was figuring out how to do the URL rewrites, as the keyword /submit had to go to the submit page, and /everything-else had to go to the idea list for the domain 'everything-else'. I'm sure there are Perl gurus out there who could have whipped up a regex in 5 seconds to handle that... unfortunately I'm not one of them. What I did was rewrite all URLs to go to a driver script that parsed the original URL, and then used PHP objects to generate the appropriate response in each case. These PHP objects were converted from the scripts I originally planned to redirect to for each action.
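The driver-script approach can be sketched roughly as follows. This is in Python rather than the PHP used for the lab, and the function name route, the action labels, and the handling of an empty path are all assumptions made for illustration, not the lab's actual code.

```python
def route(path):
    """Map a request path to an (action, domain) pair."""
    # Drop any query string, then strip leading and trailing slashes.
    segment = path.split("?", 1)[0].strip("/")
    if segment == "submit":
        # The reserved keyword goes to the submit page.
        return ("submit", None)
    if segment == "":
        # Hypothetical default for a bare "/" request.
        return ("home", None)
    # Anything else is treated as a domain whose idea list should be shown.
    return ("list", segment)
```

With every URL rewritten to one entry point, the web server needs only a single catch-all rewrite rule, and the decision about what /submit or /everything-else means lives in ordinary code instead of a regular expression.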

Testing this lab also turned up a bug in my listing app server. It was a subtle bug that manifested itself by returning only 1 idea for a given domain (even in cases where there was more than 1). Once I identified it, I feared the worst. It took me 30 minutes to track down the source, which turned out to be a missing '$' on my loop variable (which was used to index an array). I won the book in class for the PHP quiz (for identifying that MySQL wasn't enabled by default in a PHP installation)... but I don't know enough to understand why $myarray[i] (should have been $myarray[$i]) didn't cause a more visible error. I'll have to check my error reporting settings in my php.ini file.

I have to say I've been having a great experience in this class, overall. The greater emphasis on architecture and design, and lesser emphasis on KLOCs has been an effective approach. I can think of 2 other CS classes (off the top of my head) I have taken that could benefit from this model.

by J Dusbabek (noreply@blogger.com) at November 02, 2007 04:47 AM

October 30, 2007

Chris Ellsworth

Lab 3....Subtle Changes tripped me up

So the first thing that messed me up was that Apache wasn't passing POST arguments through standard input to CGI. I tried to read them in both a Python script and a bash script. So for that part I started using mod_php, which got its variables perfectly.

Another subtle change that messed me up was that there was no server loaded by default that could serve lab 2's content. And when I launched my lab 2 instance, it looked like the port used for registering had changed (I updated the wiki so it would show the right port).

So, I made the POST request through PHP, and that was kind of ugly. I had to parse the response to see if there was an error, etc. If I had to do it again, I would probably use a language that has a built-in HTTP connection class that parses it all for you.

Man, I also really like JSON after handling XML. It is simple and really easy to use with Python. There is a lot you have to do to use XML. When I can, I will choose JSON.
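To illustrate the difference, here is a small sketch that reads the same made-up data from JSON and from XML using only the Python standard library. The field names (domain, ideas, name, idea) and the sample values are hypothetical and are not the lab's actual format.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same single idea in both formats.
JSON_TEXT = '{"domain": "example", "ideas": [{"name": "Ann", "idea": "Ship it"}]}'
XML_TEXT = '<ideas domain="example"><idea name="Ann">Ship it</idea></ideas>'


def ideas_from_json(text):
    """JSON decodes straight into dicts and lists."""
    data = json.loads(text)
    return [(item["name"], item["idea"]) for item in data["ideas"]]


def ideas_from_xml(text):
    """XML needs a tree walk, and a choice between attributes and element text."""
    root = ET.fromstring(text)
    return [(elem.get("name"), elem.text) for elem in root.findall("idea")]
```

Both functions return the same list of (name, idea) pairs; the JSON version just has fewer decisions to make along the way.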

by Chris (noreply@blogger.com) at October 30, 2007 08:32 PM

Jay Liu

Lab 3 - more


Because I want to get this cranked out as soon as I can, I’m afraid that I’ll have to revert, regress, defect, apostatize back to PHP for the implementation. I know of a framework that I can use (CodeIgniter) that will support templating. I already know how to make it so that I can create forms and post, and that sort of thing. I’m just kinda sick of messing with Python and trying to get it to jump through hoops right now. I know that PHP will be able to do the current task just as well. Python, Django, I’ll be back for you… later.

by cyanos at October 30, 2007 06:30 PM

October 26, 2007

Mike Heath

CS 462 Lab 3 and Stuff

Lab 3 went pretty well. It took longer than I thought it would, but they always do. One of the things I wanted to do was parse the XML using SAX, since I’ve never actually written a SAX parser before. SAX takes more work than using DOM and XPath, but I got the experience I was looking for.

Parsing the date in the idea data took some time to get right.  I’m using Java and the Java Date API frankly sucks.  Fortunately there’s Joda Time.  Using Joda Time, I initially thought the date was in ISO 8601 format.  It’s not, but luckily putting together a custom date formatter is very easy with Joda Time.

The biggest problem I had with this lab is that when I deployed my application to EC2, the listing service that was registered with the load balancer was spitting out XML that didn’t follow the format defined for lab 2. The XML I was getting put the ‘user’ and ‘technology’ attributes in the ‘request’ element and not in the ‘server’ element. I made a few small modifications to my code to display an error message when unable to properly fetch the ‘server’ and ‘user’ data, and everything started working just fine.

For my template framework I was using Apache Velocity, but I got fed up with the poor error messages produced by Velocity. So I switched to Freemarker and I haven’t looked back. Freemarker has a much cleaner API than Velocity, the Freemarker template syntax has more features, and Freemarker produces error messages that are actually useful.

by Mike Heath at October 26, 2007 05:47 AM

October 24, 2007

Jay Liu

Lab 3 Hindrances


I’m having a bit of trouble getting the POST data (or anything from the request object) in my lab. I tried running this code in my program and in a separate file, but all I get is a segmentation fault (didn’t even know that was possible in Python). I’m pretty sure that the seg fault happens in the loop.

from cgi import escape
from urllib import unquote

# The Publisher passes the Request object to the function
def index(req):
    s = """\
<html><head>
<style type="text/css">
td {padding:0.2em 0.5em;border:1px solid black;}
table {border-collapse:collapse;}
</style>
</head><body>
<table cellspacing="0" cellpadding="0">%s</table>
</body></html>
"""
    attribs = ''
    # Loop over the Request object attributes
    for attrib in dir(req):
        attribs += '<tr><td>%s</td><td>%s</td></tr>'
        attribs %= (attrib, escape(unquote(str(req.__getattribute__(attrib)))))
    return s % (attribs)

 

by cyanos at October 24, 2007 02:50 PM

October 22, 2007

Jay Liu

Beginning Lab 3


Understanding Check:

If I am correct, for this lab we are supposed to basically extend lab1 with the functionalities listed in the lab spec.  I can see the integration process of this lab will be rather interesting.  I realize that I don’t even have a clear understanding of what we are trying to achieve with our web application.  I can’t remember what was said in class.  I can see the individual tasks that the components that we build (from lab to lab) are trying to accomplish, but I don’t understand what the user wants to be able to get out of our application.  How does an “idea” relate to a domain and name, exactly?  I suppose I can get this brief clarification during class today.

 

Issues I have Identified pertaining to this Lab:

XML Parsing:

I thought about including some library that would support XPath, but I think I’m just going to go about parsing the XML the clunkier way, by referring to element indices and names. I assume that the XML to be used as input will be the output from the server we all built in lab 2.
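As a rough comparison of the two approaches, here is a sketch using Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The sample document layout is invented for illustration and is not the actual lab 2 output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real lab 2 format may differ.
SAMPLE = """<response>
  <server user="jay" technology="python"/>
  <ideas>
    <idea><name>Ann</name><text>First</text></idea>
    <idea><name>Bob</name><text>Second</text></idea>
  </ideas>
</response>"""


def names_by_index(root):
    """The clunky way: rely on child positions within each element."""
    ideas = root[1]  # second child of the root element
    return [idea[0].text for idea in ideas]  # first child of each idea


def names_by_path(root):
    """The path way: name the elements and let the library find them."""
    return [elem.text for elem in root.findall("./ideas/idea/name")]
```

The index version breaks as soon as an element is added or reordered, while the path version keeps working as long as the element names stay the same.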

Getting the POST variables:

Well, apparently you can’t use the cgi.FieldStorage() method when using mod_python.publisher.  I’ll have to find some other way to get around that.

by cyanos at October 22, 2007 04:52 PM

October 15, 2007

Jay Liu

Lab 2 Completed, I think!


Getting the directory listing

The most recent thing that I figured out (with the help of classmates’ posts and Google) was how to modify the content-type header. Everyone knows that when using mod_python.publisher, one must do the following:

   def index(req):
     req.content_type = 'text/xml'
     return "<xml />...."

the /domain/site part of the page

I