Jerry Smith's CIS 625 Web Design Journal

Resources for Making Your Website Work For You

Abstracted Article Archive 

This page contains an archive of articles that have been abstracted on this site. The articles are ordered by date posted, with the latest posted articles appearing first. Alternatively, you can jump to entries for a specific week by clicking the links below:

Entries for Week 6: April 13, 2009 through April 19, 2009

Password Attack Discussion & Benchmarks

Alan Amesbury16 of the University of Minnesota's Office of Information Technology provides an excellent write-up on passwords and how the number of possible characters, the length of the password, and the hashing algorithm all affect how long it takes to crack a password:

  • The easiest thing to do in order to make a password encoded with any algorithm harder to crack is to increase the number of possible characters used in the password. To illustrate this, Amesbury assumes a standard length of 7 characters, but makes the password case-sensitive, meaning that one password has 26 possible characters per position while the other has 52. The resulting difference is a factor of 2^7 = 128, roughly two orders of magnitude more combinations!
  • The best password policy involves all letters (case-sensitive), all digits, all symbols (i.e. shift+digits), and the space character. This provides 69 total characters. A 7-character password drawn from 69 possible characters provides over 7 trillion combinations.
  • Increasing the length of the password by just one character (all else held constant) increases the combinations to over 513 trillion.
  • A good hashing algorithm is purposefully CPU-intensive, so that hashing a password once is no big deal, but hashing huge numbers of candidates (as in a brute-force attack) is too slow to be practical.
  • Using a 3.2 GHz Xeon, Amesbury found that it would take 95 years to brute force all possible passwords hashed with Microsoft's NTLM, given variable lengths of 1 to 8 characters and 69 possible characters. FreeBSD's MD5 hash, given the same parameters, would take more than 11,000 years! (The arithmetic is sketched after this list.)
  • Given the data he presents, Amesbury suggests that passwords be at least 6 characters long, contain mixed-case letters and at least one symbol, and that no part of the password be anything found in a dictionary. Dictionary attacks even against FreeBSD's hash can succeed in under 15 days.
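
Given the 69-character set stated above, the combination counts are easy to verify. Here is a minimal PHP sketch of the arithmetic; the hash rate used for the time estimate is a placeholder, not Amesbury's measured figure:

    <?php
    // Reproduce the combination arithmetic from the write-up.
    $charsetSize = 69;                   // letters (both cases), digits, symbols, space
    $combos7 = pow($charsetSize, 7);     // roughly 7.4 trillion
    $combos8 = pow($charsetSize, 8);     // roughly 513.8 trillion
    printf("7 characters: %.1f trillion combinations\n", $combos7 / 1e12);
    printf("8 characters: %.1f trillion combinations\n", $combos8 / 1e12);

    // Total search space for variable-length passwords of 1 to 8 characters.
    $total = 0;
    for ($length = 1; $length <= 8; $length++) {
        $total += pow($charsetSize, $length);
    }

    // Placeholder attacker speed; a deliberately CPU-hungry hash pushes this
    // number down and the years up.
    $hashesPerSecond = 1e6;
    $years = $total / $hashesPerSecond / (60 * 60 * 24 * 365);
    printf("Years to exhaust 1-8 characters at %.0e hashes/second: %.0f\n",
           $hashesPerSecond, $years);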

Biggest Mistakes in Web Design 1995-2015

Everyone's favorite cranky web design consultant Vincent Flanders17  has compiled a list of the most common things that bad webmasters do. His best suggestions are:

  • Your site is only important to potential surfers if it does something useful for them.
  • If visitors to your site can't figure out the purpose of your site within four seconds, the site is not doing its job.
  • Design should not get in the way of the purpose of the site. Even if it is pretty, if a design keeps visitors from what they came to the site for, scrap it.
  • Don't put too much stuff on one page, and certainly don't put too many different types of stuff on a page.
  • Don't think your visitors are going to care too much about web standards. While adhering to standards is good, your visitors only stick around if the site is useful to them.
  • Be careful with the use of images, Flash, and Javascript. Only use these elements if they add actual benefits to users.
  • There is nothing wrong with making your site look and behave like other successful sites.  Being totally different with navigation or design such that your site looks nothing like any other site will probably confuse many of your visitors.

Practical Tips for Government Web Sites (And Everyone Else!) To Improve Their Findability in Search

Vanessa Fox18 at O'Reilly Radar believes that government websites will only be useful if their contents can be easily found with search engines. She says these recommendations are vital for government sites, but important for non-government sites too:

  • Sites should create well-formed XML sitemaps. If a site is structured well enough that a sitemap can be created to explain its contents, then its contents are most likely logically organized.
  • While sitemaps do not give total control over what search engines do, they help steer the engines toward what is most important.
  • Make sure any public content is accessible without requiring registration or specific user input. Search engines (and oftentimes users) will abandon their quest for information if some sort of input gets in their way.
  • If file names and locations change, make sure to serve up a 301 Moved Permanently redirect (a minimal PHP sketch follows this list).
  • Make sure to include ALT text with images.
  • Make page titles unique and ensure each title actually describes its page.
  • Make sure pages remain functional and informative even when JavaScript and images are turned off or unavailable; this helps search engines (as well as users) reach the important information.
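
As a rough illustration of the 301 recommendation, here is a minimal PHP sketch; the destination URL is a hypothetical example:

    <?php
    // This page has moved; tell browsers and search engines where it went,
    // before any other output is sent.
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.gov/new-page');
    exit;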

 

Entries for Week 5: April 6, 2009 through April 12, 2009

Memo: Avoid Nested Queries in MySQL at All Costs

Scott Selikoff13 of Down Home Country Coding sheds some light on an issue that a lot of developers know to avoid but do not fully understand: nested queries and why they are bad. It turns out there are understandable reasons why developers use them, but the bad far outweighs the good:

  • The reason nested queries are so widely used makes sense. A nested query works in a sequential order that fits the way people think: it builds a list using some criteria, narrows it down by further criteria, narrows that again, ad infinitum. This step-by-step approach will get you to the answer, but at the cost of memory and CPU cycles, because the process requires multiple passes (i.e. loops), which are costly when there are a lot of simultaneous requests.
  • Joins combined with aliases can skip the grinding loop of shaving down a list of returned rows and get to the data the developer really wants in an efficient way.  The trade-off is that the developer has to keep track of the aliases and understand fully how joins work.
  • If executed properly, joins basically shorten the list as the rows are retrieved, rather than generating multiple lists and trimming each one criteria at a time.
  • The vast majority of the time, according to Selikoff, nested queries are not necessary and the same work can be achieved more efficiently with joins.
  • In the event a nested query is absolutely necessary, Selikoff recommends making multiple database calls and using the programming language to construct queries that take the output of each call and use it to continue to the next step. This sounds counter-intuitive, but Selikoff says in his experience, this method scales better and is less susceptible to slow-downs in the event the data composition of the table changes.

While the author's advice won't work 100 percent of the time, it does offer developers something to think about when they are trying to optimize a MySQL-based application. A small sketch of the two approaches follows.
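
As a rough illustration of the join-based rewrite Selikoff favors, here is a minimal PHP sketch; the customers/orders schema and the country filter are hypothetical, not his example:

    <?php
    // Hypothetical schema: customers(id, country), orders(id, customer_id, total).

    // Nested-query version: the subquery builds a list of ids that the outer
    // query then has to scan against.
    $nested = "SELECT * FROM orders
               WHERE customer_id IN
                     (SELECT id FROM customers WHERE country = 'US')";

    // Join version of the same question: rows are filtered as they are matched,
    // with the aliases o and c keeping the two tables straight.
    $joined = "SELECT o.*
               FROM orders o
               INNER JOIN customers c ON c.id = o.customer_id
               WHERE c.country = 'US'";

    // Either string would be handed to mysql_query(); the join form lets MySQL
    // narrow the result in a single pass.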

Alternatives to LAMP

While Linux, Apache, MySQL, and PHP (LAMP) work well as a web development combination, that does not mean there aren't alternatives better suited for specific purposes. David Chisnall14 at InformIT has some good recommendations that any developer may find useful to know about:

  •  The alternatives to Linux are pretty well known, but the advantages of each may not be:
    • Solaris is great for multi-threaded applications and can squeeze a ton of performance out of multiprocessor machines.
    • OpenBSD is super-secure, thanks to its development process, and runs really well on old hardware. It is not optimized for multithreaded or multiprocessor configurations, though.
    • FreeBSD is great for virtualization because of its "jail" architecture, which allows a high level of privileges for each user without the worry of affecting the entire system.
    • Finally, NetBSD can run on virtually any hardware and is really fast. It's not as fully featured, but is the best for speed.
  • The well-recommended Apache alternatives are:
    • lighttpd, which is really fast for static content (up to 200% faster than Apache in some cases). With FastCGI, Chisnall claims, lighttpd can serve PHP just as well as Apache can.
    • Yaws, which is heavily optimized for concurrency and parallelization.
    • Tux, which is very fast but if it crashes, it can kill the whole server.
  • MySQL's alternatives are:
    • PostgreSQL is very stable, protects data, and is fully featured. It can sometimes be much faster than MySQL when doing complex queries.
    • SQLite is very stripped down and each database is a single file on the server and all data is treated as a string. It's really fast, though, and good for applications that only require one process accessing it at any given time.
  • PHP alternatives are:
    • Perl is an oldie, but it can still be very useful for string-intensive web apps.
    • Java, in the form of JSP and other frameworks such as WebObjects. These frameworks are highly flexible and benefit from the powerful APIs built with Java (as well as servers like Tomcat).
    • CGI and FastCGI can harness the power of compiled languages like C to be lightning fast. The downside is the procedural nature of C and the need to recompile often.

Tuning Apache and PHP for Speed on Unix

John Lim15, writing for PHP Everywhere, provides some good tips for tuning Apache and PHP for performance. Some of these tips are common sense, but some are sure to be new to many developers:

  • Make sure to benchmark. This can be time-consuming; however, it is the only way to truly judge which changes have a positive, worthwhile impact. He recommends ApacheBench or Microsoft's Web Application Stress Tool.
  • According to Lim, PHP scripts running in Apache are 2-10 times slower than static HTML.  Therefore, try to use static pages whenever possible.
  • If CPU cycles are not at a premium, try enabling compression of your HTML in PHP. It will speed up download times for users considerably. Faster downloading = happier users.
  • In PHP, pass arrays and objects by reference. This can conserve a considerable amount of RAM (see the sketch after this list).
  • Run each service on a separate machine whenever possible (i.e. web server on one box, database on another). Lim adds that it may be worthwhile in some situations to use different server software on different boxes for different types of content.
  • Beyond these tips, Lim provides links to a myriad of specific tuning tips for many types of environments and setups.
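
Here is a minimal PHP sketch of the compression and pass-by-reference tips; the sample data and function are placeholders:

    <?php
    // Compress generated HTML before it leaves the server; ob_gzhandler checks
    // the browser's Accept-Encoding header and falls back to plain output.
    ob_start('ob_gzhandler');

    // Passing a large array by reference (&) avoids handing the function a copy.
    function total_sales(array &$rows)
    {
        $sum = 0;
        foreach ($rows as $row) {
            $sum += $row['amount'];
        }
        return $sum;
    }

    $rows = array(array('amount' => 19.99), array('amount' => 5.50));  // placeholder data
    echo '<p>Total: ' . total_sales($rows) . '</p>';
    ob_end_flush();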

 

Entries for Week 4: March 30, 2009 through April 5, 2009

20 Ways to Secure your Apache Configuration

Web developer Pete Freitag10 offers a nice list of suggestions for making an Apache server more secure. While this list is by no means exhaustive, it is a good starting point for some very basic things server admins can do to make sure their boxes are not compromised. Some of the best ideas include:

  • Use the ServerSignature Off and ServerTokens Prod directives to keep Apache from displaying too much in its headers or error pages (see the configuration sketch after this list). This is an example of security through obscurity, but it's better than telling all to anyone who wants to know.

  • Make sure Apache is running under its own exclusive user and group. If not, an attack on another service running as the same user could be used to exploit Apache too! Also, make sure root owns Apache's config and binary files.

  • Turn off all features you do not need, including directory browsing, CGI, and SSI. Every enabled feature adds attack surface, and if you're not using it, you're incurring the liability for nothing.

  • Disable support for directory-level .htaccess files by issuing AllowOverride None in your httpd.conf file.

  • Set restrictive limits on anything you can safely limit, including maximum request size, timeouts, concurrent connections, and the IP addresses allowed to access certain resources. Some of these may not be practical to limit in every situation, but the tighter the limits, the better from a security standpoint.
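
Here is a minimal httpd.conf sketch pulling the directives above together; the account name, document root, and limit values are placeholders to be tuned per server:

    # Hide version details in headers and error pages
    ServerSignature Off
    ServerTokens Prod

    # Run Apache under its own unprivileged account
    User apache
    Group apache

    # Placeholder limits -- tune for the site's actual traffic
    Timeout 45
    LimitRequestBody 1048576

    <Directory "/var/www/html">
        # No directory listings, SSI, or CGI; ignore .htaccess files
        Options -Indexes -Includes -ExecCGI
        AllowOverride None
    </Directory>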

A HOWTO on Optimizing PHP with Tips and Methodologies

Web developer and systems administrator John Lim11 of PhpLens provides some excellent information regarding PHP optimization. Not only does he list concrete things that can be done to optimize scripts, he also provides good information about the tradeoff between scalability and speed:

  • If RAM isn't an issue, scripts can be tuned for more speed (in terms of CPU seconds). PHP runs a separate copy of the same script for each request. Oftentimes the same job can be done with fewer executions (minimizing CPU time needed), but the tradeoff is that more RAM is needed to hold the data being processed. Memory-efficient scripts often use more CPU cycles: more individual executions are made and data is handled in smaller chunks, reducing the need for memory. Lim provides a nice graph and an example of two scripts performing the same task to illustrate this point.
  • Optimize for output size, since PHP can only push data back to the browser as fast as the network connection allows.
  • Watch the shared memory. Too little shared memory spread amongst the running copies brings PHP scripts to a crawl.
  • Avoid hard disk reads as much as possible. If RAM is available, consider creating RAM disk caches for flat-file data that is read frequently.
  • Optimize code up front and consider scalability, flexibility, and speed. Decide what tradeoffs the project will tolerate, as Lim says you can't achieve 100% in all three areas. Optimizing after the fact takes longer than doing it right the first time.
  • Use a PHP optimizer, such as Zend Optimizer. According to the data Lim provides, these optimizers almost always crank up performance on servers that receive moderate traffic.
  • Benchmark functions, both built-in and custom-written. This is fairly simple, requiring only a few lines involving microtime calculations (see the sketch after this list). Also, the ApacheBench tool is a handy way to stress test without needing real, live traffic.
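
Here is a minimal sketch of a microtime-based benchmark along those lines; the function being timed is just a stand-in:

    <?php
    // Time a function with microtime(); microtime(true) returns a float (PHP 5+).
    function format_prices(array $prices)
    {
        $out = array();
        foreach ($prices as $p) {
            $out[] = number_format($p, 2);
        }
        return $out;
    }

    $prices = range(1, 5000);   // placeholder workload

    $start = microtime(true);
    for ($i = 0; $i < 100; $i++) {
        format_prices($prices);
    }
    $elapsed = microtime(true) - $start;

    printf("100 runs took %.4f seconds (%.4f ms per run)\n",
           $elapsed, $elapsed * 10);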

Boosting Apache Performance by using Reverse Proxies

René Pfeiffer12 of the Linux Gazette provides a good explanation of what a reverse proxy is as well as some technical information that is invaluable to those wishing to make sure their Apache server is running without too much overhead:

  • A reverse proxy is a cache that sits in front of the web server and serves data that hasn't changed to the client (browser, web service, etc.). This leaves the actual web server free to handle content that is not static, or is static but has changed recently. The reverse proxy still talks to the web server for most requests; however, the amount of data exchanged between the two for each request is (in most cases) considerably less than what the web server would otherwise have to send to the clients.
  • Apache is slow for serving static content (images, static HTML/CSS etc.), and since that is what reverse proxies are best at, Apache really gets a boost from a well-configured reverse proxy.
  • Apache can be configured to generate response headers (information about the files being served that the end user doesn't want or need to see) automatically for certain content types. If a resource is known to be static (e.g. a JPEG banner), the headers can be configured so that the receiving client (a web browser or a reverse proxy) knows the resource can be cached for a certain period past its last-read modification date. If a reverse proxy is the receiver, it happily (and efficiently) serves the static content, only bothering Apache every so often to ask whether the resource has been modified (see the configuration sketch after this list).
  • Squid is an excellent reverse proxy. Pfeiffer provides graphs suggesting that Squid cut the load on a production Apache server handling 120 requests per second by nearly 50 percent!
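
Here is one way to configure the caching response headers described above, assuming Apache's mod_expires and mod_headers modules are loaded; the content types and lifetimes are placeholders:

    # Requires mod_expires and mod_headers to be loaded
    ExpiresActive On

    # Placeholder lifetimes -- tune per site
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png  "access plus 1 month"
    ExpiresByType text/css   "access plus 1 week"

    # A reverse proxy such as Squid may cache these responses and only
    # revalidate with Apache occasionally (If-Modified-Since).
    Header append Cache-Control "public"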

Entries for Week 3: March 23, 2009 through March 29, 2009

15 Essential Checks Before Launching Your Website

Smashing Magazine's Lee Munroe7 provides a great checklist for anyone launching a website to make sure it's totally ready for visitors. While some of the tips are commonly known, others are important but often overlooked. Even seasoned web designers should consult a list like this to make sure they haven't missed the small stuff that can make the difference:

  • Include a favicon, which is a very easy way to get branding that will actually stick if a surfer bookmarks your page.

  • Make sure to proofread. Visitors will appreciate it and search engines will pick up on the properly-spelled keywords.

  • Make sure the site maintains some functionality even if things like Flash and JavaScript are turned off. The functionality doesn't have to be one-to-one; however, visitors with these features disabled should know they are missing out on something while still getting at least a basic implementation of the functionality.

  • Create an XML sitemap, with a good structure. Search engines, as well as some live humans, can use it in case they're not entirely sure where to go for the content they want.

  • Make friendly error pages that provide links to get users back on track (this may involve pointing users to the sitemap mentioned above).

Which Are More Legible: Serif or Sans Serif Typefaces?

If you've ever wondered if serif typefaces are more readable than their sans counterparts, web developer Alex Poole8 offers a fantastic literature review spanning over 100 years of research on the subject. This information can be very useful to web designers when deciding what typeface to choose (or at least help to end holy wars about the topic). Here are his findings:

  • After it's all said and done, it appears that the majority of typeface readability studies find the differences so minuscule that typeface alone isn't enough of a factor to matter.

  • Some studies have found that serifs increase readability because the extra marks provide extra space between letters.

  • The claimed readability of serif typefaces may be a result of familiarity with a typeface more than of the readability of the typeface itself.

  • Some studies have found that sans serif fonts work better on computer screens because there is less detail (serifs) that has to be rendered on the monitor. However, this research took place when fonts were bitmapped and resolutions were much lower.

  • Poole stresses that many factors aside from typeface, including type size, background, and font color, appear in aggregate to have more influence on readability.

HTML 5 differences from HTML 4

Whether web designers like it or not, HTML 4 must eventually give way to HTML 5. The good news, according to Anne van Kesteren9 of the W3C's HTML Working Group, is that browsers supporting HTML 4 should be able to handle pages written with HTML 5's features (although those browsers will not benefit from the new features). Here is some of the more interesting information about the differences between the two versions (the document itself provides many more details than are listed here):

  • HTML 5 places a higher priority on accessibility with some new attributes. The article named hidden and progress as two of these, but did not go into detail.

  • HTML 5 will be more flexible with differing media types, with more attributes that deal with specific media types.

  • New elements are specific to certain semantic structures, such as menu, datagrid, and command.

  • One thing that disturbs me is that HTML 5 will support two syntaxes: a custom, SGML-like markup that looks very similar to current HTML, and a pure XML serialization complete with all the advantages and headaches of XML.

  • All purely presentational elements (center, font, strike, etc.) have been removed (although browsers will probably still support them for backward compatibility).

  • The DOM has been extended and is the overarching guide for how HTML 5 is being constructed.

HTML 5 is still in heavy revision, but it will be here sooner or later, so it is a good idea to know what to expect.

 

Entries for Week 2: March 9, 2009 through March 15, 2009

55 SEO Tips Even Your Mother Would Love

Richard Burckhardt4 of Search Engine Journal provides a great list of 55 things you can do to make sure search engines regard your pages with the highest priority. Many of the tips (such as making sure your page titles are descriptive) are widely known; however, a few are not so commonly preached and may actually make a big difference:

  • Getting too hung up on PageRank is short-sighted, since so many other factors matter, depending on the context of a search.

  • Don't split your backlinks between equivalent canonical names (e.g. www.domain.com and domain.com). Pick one style and be consistent with it, or use 301 permanent redirects if consistency is not possible (a sketch follows at the end of this entry).

  • Put plenty of descriptive text around links. This can be as important as the link text itself.

  • Give visitors a strong call to action, inspiring them to buy or use whatever got them to your page.

  • Use the word “image” or “photo” in your image ALT text, since a lot of searches feature keywords with one of these two words after it.

Obviously, there is a lot more to SEO than just the tips listed above, but these are some that stood out from the myriad of similar lists out there.
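
Here is a minimal PHP sketch of the canonical-host redirect mentioned above; www.example.com stands in for the real domain:

    <?php
    // Redirect bare-domain requests to the www form (or vice versa) with a 301
    // so backlinks consolidate on one canonical host name.
    $canonicalHost = 'www.example.com';   // placeholder domain

    if ($_SERVER['HTTP_HOST'] !== $canonicalHost) {
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: http://' . $canonicalHost . $_SERVER['REQUEST_URI']);
        exit;
    }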

An Open Secret

Marshall Krantz5 of CFO Magazine believes that some Open Source Software (OSS) is ready to run certain portions of an enterprise while it's not so ready to run others:

  • It's best to deploy OSS first in a sector of the organization's business that is not necessarily its core competency. Krantz cites InterContinental Hotels Group's use of SugarCRM, an open-source customer relationship management package. This is obviously an important area for InterContinental; however, it is not mission-critical.

  • If an organization has a commercial package that works well, don't go open source just to switch to something new. InterContinental sticks with its old IBM mainframe to handle its booking activities. The system is a little expensive, but it is rock-solid and is backed by IBM's world-class support.

  • One of the least scrutinized areas of OSS is that some OSS license terms make the software not free for commercial applications. Furthermore, some licenses are structured such that if the application ends up containing non-free software, the organizations using the software become just as liable for the infringement costs as the company that supplied it. A way to mitigate this, according to Krantz, is to make sure that any OSS used by an organization is OSI certified, which should ensure that the software is free to use for any purpose.

Cloud Computing Survey: IT Leaders See Big Promise, Have Big Security Questions

Laurianne McLaughlin6 of CIO.com provides insight as to what 173 high-ranking IT leaders in the US think about cloud computing by giving a rundown of a survey about the use of the technology. The consensus seems to be that cloud computing shows big promise but is too unpredictable to use in mission-critical situations.

  • The flexible and cost-effective nature of cloud computing is the most appealing factor amongst the IT professionals surveyed. Cloud architecture allows companies to pay for what they need and scale on demand (both up and down).

  • A major downside is that cloud architecture is evolving quickly and isn't mature. This makes investing a lot of resources in long-term cloud development risky, because the nature of a particular cloud implementation could change far more quickly than an organization can optimize for.

  • The biggest hurdle is that IT leaders do not regard the cloud as a secure place for data.

  • 53% of the survey respondents say that, in the next several years, cloud architecture will change the way many enterprises do things because of the power and scalability it offers.

  • Most respondents feel that the cloud is the key to rolling out successful Software as a Service (SaaS) implementations, but only after cloud architecture matures and standardizes.

  • Significantly, 42% say they would like to use cloud architecture in some way by 2012 to power ERP applications, since the cloud has the power and scalability to handle the amounts of data needed to power ERP.

Judging from these results, most companies with sufficient resources should position current data-intensive IT projects so that they can begin using the power of the cloud once the technology matures and the security concerns have been addressed.

 

Entries for Week 1: March 2, 2009 through March 8, 2009

Why a CSS Website Layout Will Make You Money

Trenton Moss1 of Webcredible gives us four good reasons why companies should use CSS to lay out web pages instead of relying on HTML tables:

  • CSS layouts require less code than equivalent HTML table layouts. The reduction in code results in less bandwidth used during site traffic.
  • CSS layouts make it easier for search engines to index a site. The search engine does not have to parse as much code and the most important content is easier to put at the top of the document.
  • The reduction in the amount of code used in a CSS layout results in faster download and rendering speeds for the end user. When tables are used for layouts, the browser must receive all table data before fully rendering the content.  CSS layouts render as they are received.
  • CSS layouts can be device-agnostic, whereby the same HTML structure can be used to generate layouts for computer screens, cell phones, and PDAs. With traditional table-based layouts, a totally separate HTML version had to be created for each device.

The Beauty of Simplicity

Linda Tischler2 of Fast Company Magazine investigates how making technology products easy to use provides a huge competitive advantage to companies who can figure out how to do it.  Tischler makes several keen observations:

  • Users say they want a ton of features; however, they are much happier with products that do fewer things but do them very simply and very well. The bottom line is that users just want the products to work, with minimum fuss.
  • Creating simplicity from technology products is difficult. The products often employ complex science that needs to be presented behind the facade of simplicity.
  • Powerful, simple-to-use products come from companies who are committed to the idea of the simple user experience at all levels of the company. Everyone from top management down to the product testers must feel that simplicity is very important.
  • A major problem with many products now is that adding features often does not add cost because the features are added via software. The iPod is successful not because of its hardware, but how the software presents the hardware's functionality. Just because you can do something does not mean you should.
  • Google's director of Web products, Marissa Mayer, only puts links to products on Google's front page that users have shown (through traffic) are the most useful to them. Less popular products are still available, but tucked away so as to not get in the way of what most users want.
  • Embracing simplicity can lead to big returns. Management at electronics giant Philips has dug itself out of a slump by making sure all of its products are easy to use. Everything from packaging, menus, remote controls, and manuals must pass multiple simplicity metrics before it can be used in a shipping product.

Web 2.0 Has Corporate America Spinning

BusinessWeek's Robert Hof3 examines the impact that Web 2.0 implementations are having on big corporations. Some of his observations are:

  •  Web 2.0 sites help users "get something done."
  • Employees of corporations are now using blogs to communicate with customers. This gives corporations a face and allows customers to develop emotional attachments to the brand.
  • Companies like Disney are using wikis to enable departments and product development groups to maintain up-to-the-minute documentation of product developments which are contributed to by the entire team. This has resulted in very responsive and agile groups.
  • Social Networking sites like LinkedIn are being tapped to find sales leads as well as help with staffing.
  • Web 2.0 sites are focused on the power of the collective, whereby information that is useful to one person may be useful to many. Within a corporation, this lets any employee using a service help other employees who use the same service.
  • Web 2.0 applications have the ability to provide free PR: Sites like Technorati can call attention to a corporation's content, all for free. All the company has to do is make sure its content is appealing and relevant to its customers.