[Solar-talk] validation - unicode?

Paul M Jones pmjones at solarphp.com
Sat Jul 22 09:20:06 PDT 2006


On Jul 22, 2006, at 10:42 AM, Jeff Surgeson wrote:

> Between the 100 - 500 and the 35 - 60 are these extended minus
> signs. I can display them and copy them, but I can't type them, as
> they do not exist on my keyboard layout. As far as I know it has
> something to do with Windows locale options. My column scheme is
> standard: it validates for length (I am learning), accepts only
> letters/numbers and punctuation, and won't allow blank.
>
> $this->_col['title'] = array(
>     'type'    => 'varchar',
>     'size'    => 64,
>     'require' => true,
>     'valid'   => array('multiple', 'VALID_TEXT_64',
>         array(
>             array('maxLength', 64),
>             array('regex', '/^[[:alpha:][:digit:][:punct:]\s]+$/', 'NOT_BLANK'),
>         )
>     ),
> );
>
> This all works fine until someone enters a strange character, which
> is going to happen.
>
> How do I deal with it as far as Solar is concerned?

Some things to try:

1. At the form level, you can add a 'filter' rule that finds
non-alpha/digit/punct/space characters and strips them. See the docs
for Solar_Filter, and the related GettingStarted page in the
Solar_Form docs.

     http://solarphp.com/index.php/docs/read/Solar_Filter
     http://solarphp.com/index.php/docs/read/Solar_Form/GettingStarted

2. At the table level, you may wish to redo your validation regex so
that it allows for those characters using \x{NNNN} notation together
with the /u modifier (based on the character code ranges; I have no
idea what they are). Alternatively, if the field is a free-form entry
line, you might want to take the regex validation off entirely and
leave only the basic checks (maxLength, notBlank).

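3. In your table-level regex, try using the undocumented /u pattern
modifier for Unicode, which may cause [:punct:] to recognize the
additional high-code punctuation marks (I don't know for sure if
that'll work, but it's a thought). If it doesn't, /u also enables
Unicode property classes such as \p{L} for letters and \p{P} for
punctuation, which do match high-code characters. See this comment
for more info: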

     http://us3.php.net/manual/en/reference.pcre.pattern.modifiers.php#54805
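For illustration, here is a minimal sketch of suggestions 1 and 3 in plain PHP, using preg_* calls instead of Solar_Filter or Solar's validators. It assumes PHP 7 or later (for the \u{...} escapes and type declarations) and UTF-8 input, and the helper names isValidText() and stripOddChars() are hypothetical. It uses Unicode property classes rather than POSIX ones, and includes \p{S} alongside \p{P} because a true minus sign (U+2212) is classified as a math symbol, while an en dash (U+2013) counts as punctuation.

```php
<?php
// Sketch only: plain PCRE calls standing in for a Solar filter/validator.
// Assumes all input strings are UTF-8 encoded.

// The ASCII-only pattern from the question. Without /u, each byte of a
// multibyte character is tested on its own and none of them match.
const ASCII_TEXT = '/^[[:alpha:][:digit:][:punct:]\s]+$/';

// Unicode-aware version: /u turns on UTF-8 mode, and the \p{..} classes
// match letters (L), combining marks (M), numbers (N), punctuation (P)
// and symbols (S) anywhere in the Unicode range.
const UNICODE_TEXT = '/^[\p{L}\p{M}\p{N}\p{P}\p{S}\s]+$/u';

// Hypothetical validator: true only for non-empty, well-formed UTF-8
// made of the character classes above.
function isValidText(string $value): bool
{
    // preg_match() returns false (not 0) if $value is not valid UTF-8.
    return preg_match(UNICODE_TEXT, $value) === 1;
}

// Hypothetical filter: strip anything that is not a letter, mark,
// number, punctuation, symbol or ASCII whitespace.
function stripOddChars(string $value): string
{
    $clean = preg_replace('/[^\p{L}\p{M}\p{N}\p{P}\p{S}\s]+/u', '', $value);
    // preg_replace() returns null on invalid UTF-8; treat that as empty.
    return $clean ?? '';
}

$title = "Range 100 \u{2013} 500";          // contains an en dash
var_dump(preg_match(ASCII_TEXT, $title));   // int(0): rejected
var_dump(isValidText($title));              // bool(true): accepted
var_dump(stripOddChars("abc\u{200B}def"));  // zero-width space removed
```

The same UNICODE_TEXT pattern could presumably be dropped into the 'regex' rule of the column definition, but that assumes Solar passes the pattern straight to preg_match(), which I have not checked.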



> What worries me is that I noticed you use very little validation
> except where the entry is predictable, like an email or IPv4 address,
> single or multiple words, or numbers only. But for a large block of
> unknown entry like a description, you just type it as "clob" with no
> validation.
>
> Is this because of charset problems like the one I am having?

Kind of.  If a field is free-form, then you can't really validate it;  
there's no pattern for it to match against.  CLOBs in particular are  
like that.
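To make "basic checks only" concrete for Unicode text, here is a sketch in plain PHP 7+ with a hypothetical isValidFreeText() helper. The detail worth noting is length: strlen() counts bytes, and an en dash takes three bytes in UTF-8, so a byte-based maxLength of 64 could reject titles well under 64 characters. This sketch does not assume anything about how Solar's maxLength counts; it simply counts characters with a /u pattern, so it needs no mbstring extension.

```php
<?php
// Sketch: basic checks for a free-form field, in plain PHP rather than
// Solar's validators. Assumes UTF-8 input.

// Count characters, not bytes. With /u each UTF-8 sequence is one match,
// and /s lets '.' match newlines too. Returns null for malformed UTF-8.
function utf8Length(string $value): ?int
{
    $count = preg_match_all('/./us', $value);
    return $count === false ? null : $count;
}

// Hypothetical helper: well-formed UTF-8, not blank, at most $max chars.
function isValidFreeText(string $value, int $max): bool
{
    $length = utf8Length($value);
    if ($length === null) {
        return false;               // not valid UTF-8
    }
    if (trim($value) === '') {
        return false;               // blank (trim() strips ASCII whitespace only)
    }
    return $length <= $max;
}

$text = str_repeat("\u{2013}", 30);     // 30 en dashes
var_dump(strlen($text));                // int(90): bytes
var_dump(utf8Length($text));            // int(30): characters
var_dump(isValidFreeText($text, 64));   // bool(true)
```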


> Rodrigo's post talks about "convert all existent locale files to
> UTF-8"; what exactly do you mean? I know this has nothing to do with
> form validation, but as far as I can make out it has to do with
> Unicode/UTF-8/charsets etc.

When I was using BBEdit, it defaulted to a charset other than UTF-8
(iso-latin-8, maybe? I don't recall). That meant some high-code
characters weren't being saved properly, and were distributed in
their corrupted forms. Rodrigo moved them all to UTF-8, and now that
I use TextMate (which is UTF-8 by default) it's no longer an issue.
Does that help?
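Converting a file to UTF-8 amounts to re-encoding its bytes. Here is a minimal PHP sketch under a loud assumption: the old files were ISO-8859-1 (Latin-1), where every byte value is its own Unicode code point, so the conversion can be written by hand. The real BBEdit default may have been something else (Mac Roman, say), which would need a proper mapping table such as iconv() provides. The helper names are hypothetical; for a whole file you would read it with file_get_contents(), pass it through ensureUtf8(), and write it back with file_put_contents().

```php
<?php
// Sketch: re-encode a legacy 8-bit string as UTF-8.
// ASSUMPTION: the source charset is ISO-8859-1 (Latin-1). Other 8-bit
// charsets map bytes 0x80-0xFF differently and need a real table.

function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);        // ASCII bytes are unchanged
        } else {
            // U+0080..U+00FF become two bytes: 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Leave the data alone if it already is valid UTF-8, otherwise convert.
// This is a heuristic: some Latin-1 byte runs happen to be valid UTF-8.
function ensureUtf8(string $bytes): string
{
    return preg_match('//u', $bytes) === 1 ? $bytes : latin1ToUtf8($bytes);
}

$legacy = "Caf\xE9";                        // "Cafe" with e-acute, saved as Latin-1
echo bin2hex(ensureUtf8($legacy)), "\n";    // 436166c3a9
```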


> My limited knowledge tells me that the change to Unicode solves all
> the different locale/character code problems, not so?

So far as I can tell, this is the case, yes.

Does this help at all, or have I missed an important point in your  
questions?



--

Paul M. Jones  <http://paul-m-jones.com>

Solar: Simple Object Library and Application Repository
for PHP5.   <http://solarphp.com>

Savant: The simple, elegant, and powerful solution for
templates in PHP.   <http://phpsavant.com>



